Audio deepfakes are going to wreak havoc on the recording industry

By Luke Dormehl May 17, 2020

Jay-Z isn’t happy. In fact, the 50-year-old rapper and father of three sounds like he’s flipping out in a way you’ve never heard before. You’d have to go back to Jay during his early-2000s feud with Nas to hear him anywhere close to this incensed. Only this time he’s not rapping. He’s ranting.

“I will wipe you the f*** out with precision the likes of which has never been seen before on this Earth, mark my f****** words,” Jay says, with the instantly recognizable, staccato Brooklyn voice that has earned him a mountain of Grammy awards and nominations, along with a personal net worth estimated to be in the region of $1 billion. “You think you can get away with saying that s*** to me over the internet? Think again, f*****. As we speak, I am contacting my secret network of spies across the USA, and your IP is being traced right now. … You’re f****** dead, kid! I can be anywhere, anytime, and I can kill you in over seven hundred ways. And that’s just with my bare hands.”

As celebrity freak-outs go, it’s one of the better ones. Only it isn’t. Well, not exactly. The “recording” of Jay-Z’s voice is not so much a recording as it is a synthesis, made possible by the latest machine learning technologies. What deepfake videos are for images, these new audio deepfakes are for voices.

With enough audio samples to train on, they can carry out an impressively accurate imitation of any individual, even when it comes to pronouncing words or phrases they’ve most likely never uttered.

Jay-Z eats some Copypasta

The text that Jay-Z (or, more accurately, Jay-Z’s voice) is reading is the Navy Seal Copypasta, a parody of the internet tough-guy braggadocio typically found under YouTube videos, Twitch livestreams, or practically any online comment section. The YouTube channel the video is hosted on, called Vocal Synthesis, boasts more than 46,000 subscribers and also hosts numerous other celebrities speaking the exact same words. Alternatives include the likes of George Carlin, Louis C.K., Bill Burr, Frank Sinatra, Bob Ross, Tucker Carlson, Gilbert Gottfried, and a handful of former U.S. presidents.

Jay-Z really is upset about the deepfake audio, though. Or, at least, his Roc Nation LLC entertainment agency is. In fact, in a touch of irony for the man who once rapped the lines “I sampled your voice, you was usin’ it wrong,” Roc Nation last month filed copyright strikes against the YouTube uploads on Jay-Z’s behalf. The crime? “Unlawfully [using] an A.I. to impersonate our client’s voice.”

It’s a valid complaint, even if it’s arguably heavy-handed for content intended to do little more than raise a wry smile. But it also highlights one of the complex legal questions that could only arise from the age of deepfakes: “Does a person own their own voice?”

The answer, unsurprisingly, is not clear-cut. There’s a couplet from an old Dr. Seuss book, Did I Ever Tell You How Lucky You Are?, that could just as well describe the relationship between technology and legal regulation. It goes like this: “[Ali Sard] has to mow grass in his uncle’s backyard, and it’s quick growing grass and it grows as he mows it. The faster he mows it the faster he grows it.” In other words, technology changes faster than the law can keep up.

“You and I own our own voices under privacy statute, but protections for the voice of a public figure, while still protected under privacy rights or property rights in their identity, can be murky,” Peter Colin, a technologist for Thomson Reuters and a New York entertainment attorney specializing in right of publicity law, told Digital Trends.

Who owns your voice?

Colin describes it as a “legal minefield” that varies among jurisdictions around the world — and even throughout the United States. Ownership of aspects of your personality, such as your name, voice, or likeness, is not explicitly protected by statute in 28 states, although some have state case law that recognizes protection, Colin said. Some states protect voice, while others protect only names and likenesses. Some protect rights only while a person is alive, while others extend these protections for decades after death. Then there’s the question of fair use for satire and parody.

“A deepfake determined to be a cultural satire may leave Jay-Z unable to prevail in a court of law, but a defamatory use that paints him in a false light that misleads the public or generates a profit for the infringing user may give Jay-Z standing to prevail,” Colin said. “In the U.S., this legal framework is rapidly changing due to deepfakes created for misleading political purposes and for revenge porn, but also due to the advent of social media influencers, as states move to give student-athletes in college sports ability to legally profit off their name, image, and likeness for the first time, and to better monetize personality rights for living and dead celebrities in today’s entertainment industry.”

Focusing on the entertainment industry implications of audio deepfakes ignores some of the big challenges when it comes to spreading fake news. But it’s also rich territory for future potential lawsuits. Would an A.I.-performed album recorded by a soundalike of Jay-Z be illegal? What if it was clearly satire and given away for free? (And if you don’t think the embryonic stages of this are already happening, you obviously don’t know the internet. Cue Jay-Z rapping We Didn’t Start the Fire by Billy Joel.) Music generated by A.I. may not be commonplace today, but as with deepfakes, some of the proof-of-concept demonstrations are getting scarily impressive.

“Deepfake soundalikes for entertainment purposes have not really been addressed yet by the courts,” Colin said. “The law has just not caught up to the tech as machine learning tech improves for voice modulation and synthesis. Relevant to any legal analysis here is if the purpose [is] to present someone in a false light by making the public believe they said something they never said. Does the person creating the soundalike or hiring the soundalike for a voiceover profit off the use? Or is it a satirical portrayal or a transformative use for entertainment?”

Lawsuits loom in the future

There are even currently untested questions about the datasets used to train these audio deepfakes. As Colin points out, a voice itself can’t be copyrighted, but a sound recording of a voice singing a song can be. Is an audio deepfake trained on hours of copyrighted Jay-Z albums a breach of copyright? If so, since the copyrights may be dispersed among multiple record labels and other entities (for instance, an interview recorded for television), there could be a whole lot of potentially aggrieved (and copyright infringed) parties.

As these A.I. tools become ever more sophisticated, these cases are going to shift from hypothetical quandaries to the subject of real legal battles. When it comes to the legality of deepfakes, even in this one specialized domain, there’s plenty of complexity to delve into. Lawyers are no doubt rubbing their hands together at the prospect.

Provided that they’ve not already been replaced by machines by that point, that is.

Contributor

I'm a UK-based tech writer covering Cool Tech at Digital Trends. I've also written for Fast Company, Wired, the Guardian…
