Experts have long predicted generative artificial intelligence would lead to a tsunami of faked photos and video. What’s emerging is an audio crisis.
By Pranshu Verma and Will Oremus
Days before a pivotal national election in Slovakia last month, a seemingly damning audio clip began circulating widely on social media. A voice that sounded like the country’s Progressive party leader, Michal Šimečka, described a scheme to rig the vote, in part by bribing members of the country’s marginalized Roma population.
Two weeks later, another apparent political scandal emerged: The leader of the United Kingdom’s Labour party was seemingly caught on tape berating a staffer in a profanity-laden tirade that was posted on X, formerly Twitter.
Both clips were soon debunked by fact-checkers as likely fakes, with the voices bearing tell-tale signs that they were generated or manipulated by artificial intelligence software. But the posts remain on platforms such as Facebook and X, generating outraged comments from users who assume they are genuine.
Rapid advances in artificial intelligence have made it easy to generate believable audio, allowing anyone from foreign actors to music fans to copy somebody’s voice – leading to a flood of faked content on the web, sowing discord, confusion and anger.
Last week, the actor Tom Hanks warned his social media followers that bad actors had used his voice to falsely imitate him hawking dental plans. Over the summer, TikTok accounts used AI narrators to create fake news reports that erroneously linked former president Barack Obama to the death of his personal chef.
On Thursday, a bipartisan group of US senators announced a draft bill, called the No Fakes Act, that would penalise people for producing or distributing an AI-generated replica of someone in an audiovisual or voice recording without their consent.
While experts have long predicted generative artificial intelligence would lead to a tsunami of faked photos and video – creating a disinformation landscape where nobody could trust anything they see – what’s emerging is an audio crisis.
“This is not hypothetical,” said Hany Farid, a professor of digital forensics at the University of California at Berkeley. “You’re talking about violence, you’re talking about stealing of elections, you’re talking about fraud – [this has] real-world consequences for individuals, for societies and for democracies.”
Voice cloning technology has advanced rapidly in the past year, and the proliferation of cheap, easily accessible tools online makes it possible for almost anyone to launch a sophisticated audio campaign from their bedroom.
It’s difficult for the average person to spot faked audio, whereas faked images and videos still tend to show notable oddities, such as deformed hands and skewed words.
“Obama still looks a little plasticky when bad actors use his face,” said Jack Brewster, a researcher at NewsGuard, which tracks online misinformation. “But the audio of his voice is pretty good – and I think that’s the big difference here.”
Social media companies also find it difficult to moderate AI-generated audio because human fact-checkers often have trouble spotting fakes. Meanwhile, few software companies have guardrails to prevent illicit use.
Previously, voice cloning software churned out robotic, unrealistic voices. But computing power has grown stronger and the software more refined. The result is technology that can analyse millions of voices, spot patterns in elemental units of speech – called phonemes – and replicate a voice within seconds.
Online tools, such as those from voice cloning software company Eleven Labs, allow almost anyone to upload a few seconds of a person’s voice, type in what they want it to say, and quickly create a deepfaked voice – all for a monthly subscription of $5.
For years, experts have warned that AI-powered “deepfake” videos could be used to make political figures appear to have said or done damaging things. And the flurry of misinformation in Slovakia offered a preview of how that is starting to play out – with AI-generated audio, rather than video or images, playing a starring role.
On Facebook, the audio clip of what sounded like Šimečka and a journalist played over a still image of their faces. Both denounced the audio as a fake, and a fact-check by the news agency Agence France-Presse determined it was likely generated wholly or in part by AI tools. Facebook placed a warning label over the video ahead of the September 30 election, noting that it had been debunked. “When content is fact-checked, we label and down-rank it in feed,” Meta spokesman Ryan Daniels said.
But the company did not remove the video, and Daniels said it was deemed not to have violated Facebook’s policies on manipulated media. Facebook’s policy specifically targets manipulated video, but in this case it wasn’t the video that had been altered, just the audio.
Research by Reset, a London-based non-profit that studies social media’s effect on democracy, turned up several other examples of faked audio on Facebook, Instagram, Telegram and TikTok in the days leading up to the election. Those included an ad for the country’s far-right Republika party in which a voice that sounds like Šimečka’s says he “used to believe in 70 genders and pregnant men” but now supports Republika. A disclaimer at the end says, “voices in this video are fictional.”
That video appears on Facebook without a fact-check and was promoted on the platform as an ad by a Republika party leader. It racked up between 50,000 and 60,000 views in the three days before the election, according to Facebook’s ad library.
About 3 million people voted in the parliamentary election, with the country’s pro-Russian populist party beating out Šimečka’s Progressive party for the most seats. Slovakia has halted military aid to Ukraine in the election’s wake.
What effect, if any, the AI-generated voice fakes had on the outcome is unclear, said Rolf Fredheim, a data scientist and expert on Russian disinformation who worked with Reset on its research. But the fact that they “spread like wildfire” in Slovakia means the technique is likely to be tried more in future elections across Europe and elsewhere.
Meanwhile, the allegedly faked audio clip of UK Labour leader Keir Starmer, who has a chance to become the next prime minister, remains on X, without any fact check or warning label.
Fears of AI-generated content misleading voters aren’t limited to Europe. On October 5, US Sen. Amy Klobuchar (D-Minn.) and Rep. Yvette D. Clarke (D-N.Y.) sent an open letter to the CEOs of Meta and X, expressing “serious concerns about the emerging use” of AI-generated content in political ads on their platforms. In May, the two lawmakers introduced a bill that would require a disclaimer on political ads that use AI-generated images or video.
European Union Commissioner Thierry Breton pressed Meta chief executive Mark Zuckerberg in a letter on Wednesday to outline what steps his company will take to prevent the proliferation of deepfakes, as countries such as Poland, the Netherlands and Lithuania head to the ballot box in the coming months.
Conspiracy theories narrated with AI-generated audio are also spreading widely on social media platforms. In September, NewsGuard identified 17 accounts on TikTok that use AI text-to-speech software to generate videos that advance misinformation, and that have garnered more than 336 million views and 14.5 million likes.
In recent months, these accounts used AI narrators to create fake news reports claiming that Obama was connected to the death of his personal chef, Tafari Campbell; that TV personality Oprah Winfrey is a “sex trader”; and that actor Jamie Foxx was left paralysed and blind by the coronavirus vaccine. TikTok took some of these videos down only after it was made aware of them, according to NewsGuard.
Ariane de Selliers, a spokeswoman for TikTok, said in a statement that the company “requires creators to label realistic AI-generated content and was the first platform to develop tools to help creators do this, recognising how AI can enhance creativity.”
Brewster, whose company conducted the study and tracks misinformation, said voice deepfakes present a unique challenge. They don’t show their “glitches” as easily as AI-generated videos or images, which often contain oddities such as people with eight fingers.
Though companies that create AI text-to-voice tools have software to identify whether a voice sample is AI-generated, these systems aren’t widely used by the public.
Voice software has also improved at replicating foreign languages, thanks to a growing number of data sets with non-English-language audio.
The result is more AI voice deepfake campaigns in countries that may be experiencing war or instability, the experts added. For example, in Sudan, purported leaked voice recordings of the country’s former leader, Omar al-Bashir, circulated widely on social media platforms, causing confusion among citizens because Bashir is thought to be gravely ill, according to the BBC.
In countries where social media platforms may essentially stand in for the internet, there isn’t a robust network of fact-checkers operating to ensure people know a viral sound clip is a fake, making these foreign language deepfakes particularly harmful.
“We are definitely seeing these audio recordings hitting around the world,” Farid said. “And in those worlds, fact-checking is a much harder business.”
Harry Styles fans have also been thrust into confusion. In June, supposed “leaked” snippets of songs by Styles and One Direction surfaced on the messaging platform Discord, sold to eager fans for sometimes hundreds of dollars each. But several “super fans” quickly dissected the music and argued the songs were AI-generated audio.
The outlet 404 Media conducted its own investigation into the audio and found some samples sounded legitimate and others “sketchy.” Representatives for Harry Styles did not return a request for comment on whether the leaked audio is real or an AI-generated fake.
Farid, of UC Berkeley, said the ultimate responsibility lies with social media companies, because they are responsible for the distribution and amplification of the content.
Though millions of posts are uploaded to their sites daily, the most sophisticated disinformation traces back to a handful of profiles with large followings. It’s not in the companies’ interest to remove them, Farid added.
“They could turn the spigot off right now if they wanted to,” he said. “But it’s bad for business.”
– THE WASHINGTON POST