A few weeks ago, a New Orleans-based magician was found to have created AI-synthesized Joe Biden sound bites, perhaps unwittingly, for an unsavory political operative doing sketchy political things. It was a pretty weird story but, unfortunately, not one that I think will be all that rare in the near future. The line between AI-made and human-made creations is rapidly blurring. Naturally, some people are going to do nefarious things with their ability to easily fabricate sounds and images that are indistinguishable from the real thing to the human eye or ear.
The company that allows anyone to make anyone say anything does have some restrictions in place barring content made from politicians' sound bites, buuuuut they're easily circumvented. Actually, it's about as easy to get past ElevenLabs' restrictions as it was to get past TurnItIn's plagiarism checker in grade school. Just put quotation marks around your essay, turn them white, and hit send! Congrats, you now have a B- average. Or a political attack ad urging voters not to go to the polls, as was the case with the weird magician story. After the bad press from that write-up, ElevenLabs introduced a "no-go" voices protocol for public figures at high risk of misuse. But the no-go voice restrictions were bypassed with a middle school-style loophole, as 404 Media demonstrated.
I clipped about a minute of audio from an interview Biden did with "60 Minutes" that was on YouTube, added about a minute of silence to the start of the file, and uploaded it to ElevenLabs, which instantly created a Biden voice clone. I then was able to make incredibly convincing audio of Biden saying whatever I want, including some pretty heinous things. Adding about a minute of silence to the start of the recording bypassed ElevenLabs' safeguards to instantly create voice clones of Biden, Donald Trump, Taylor Swift, and Gavin Newsom.
Emanuel Maiberg, 404 Media
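To appreciate just how low the bar is here: the "loophole" Maiberg describes amounts to a one-line audio edit. Here's a minimal sketch of that prepend-silence step using the pydub library (the filenames are hypothetical, pydub is my choice rather than anything 404 Media names, and you'd need ffmpeg installed for MP3 handling):

```python
from pydub import AudioSegment

# Load the roughly one-minute interview clip (hypothetical filename).
clip = AudioSegment.from_file("biden_60_minutes_clip.mp3")

# Prepend about 60 seconds of silence; pydub durations are in milliseconds.
padded = AudioSegment.silent(duration=60_000) + clip

# Export the padded file (hypothetical output name).
padded.export("padded_clip.mp3", format="mp3")
```

That's the whole trick: no adversarial ML, no audio forensics, just padding. Which is exactly why it reads less like a hack and more like white quotation marks in a TurnItIn submission.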
I wonder, though, where's the line for this? What makes some high-profile figures more likely candidates for synthetic misuse than others? Why limit the no-go list to just a few heavy hitters? Politicians make sense, to prevent election tomfoolery, sure, but wouldn't this apprehension apply to any public figure, or really anyone at all? If you can make anyone say anything flawlessly, surely that's a threat to the integrity of all recorded media? But perhaps I'm getting ahead of myself. Wanna discuss this in the hallway after the next time our AI Ethics 101 class meets?