Audio deepfakes are going to wreak havoc on the recording industry

Jay-Z isn’t happy. In fact, the 50-year-old rapper and father of three sounds like he’s flipping out in a way you’ve never heard before. You’d have to go back to Jay during his early-2000s feud with Nas to hear him anywhere close to this incensed. Only this time he’s not rapping. He’s ranting.

“I will wipe you the f*** out with precision the likes of which has never been seen before on this Earth, mark my f****** words,” Jay says, with the instantly recognizable, staccato Brooklyn voice that has earned him a mountain of Grammy awards and nominations, along with a personal net worth estimated to be in the region of $1 billion. “You think you can get away with saying that s*** to me over the internet? Think again, f*****. As we speak, I am contacting my secret network of spies across the USA, and your IP is being traced right now. … You’re f****** dead, kid! I can be anywhere, anytime, and I can kill you in over seven hundred ways. And that’s just with my bare hands.”

Chris DeGraw/Digital Trends

As celebrity freak-outs go, it’s one of the better ones. Only it isn’t. Well, not exactly. The “recording” of Jay-Z’s voice is not so much a recording as it is a synthesis, made possible by the latest machine learning technologies. What deepfake videos are for images, these new audio deepfakes are for voices.

With enough audio samples to train on, these systems can produce an impressively accurate imitation of any individual, even when it comes to pronouncing words or phrases that person has most likely never uttered.

Jay-Z eats some Copypasta

The text that Jay-Z (or, more accurately, Jay-Z’s voice) is reading is the Navy Seal Copypasta: a parody of the internet tough-guy braggadocio typically found under YouTube videos, Twitch livestreams, or practically any online comment section. The YouTube channel hosting the video, called Vocal Synthesis, boasts more than 46,000 subscribers and also features numerous other celebrities speaking the exact same words. Alternatives include the likes of George Carlin, Louis C.K., Bill Burr, Frank Sinatra, Bob Ross, Tucker Carlson, Gilbert Gottfried, and a handful of former U.S. presidents.

Jay-Z really is upset about the deepfake audio, though. Or, at least, his Roc Nation LLC entertainment agency is. In fact, in a touch of irony for the man who once rapped the lines “I sampled your voice, you was usin’ it wrong,” Roc Nation last month filed copyright strikes against the YouTube uploads on Jay-Z’s behalf. The crime? “Unlawfully [using] an A.I. to impersonate our client’s voice.”

It’s a valid complaint, even if it’s arguably heavy-handed for content intended to do little more than raise a wry smile. But it also highlights one of the complex legal questions that could only arise from the age of deepfakes: “Does a person own their own voice?”

The answer to this question is, unsurprisingly, not clear-cut. There’s a couplet from an old Dr. Seuss book, Did I Ever Tell You How Lucky You Are?, that might as well explain the relationship between technology and legal regulation. It goes like this: “[Ali Sard] has to mow grass in his uncle’s backyard, and it’s quick growing grass and it grows as he mows it. The faster he mows it the faster he grows it.” In other words, technology changes faster than the law can keep up.

“You and I own our own voices under privacy statute, but protections for the voice of a public figure, while still protected under privacy rights or property rights in their identity, can be murky,” Peter Colin, a technologist for Thomson Reuters and a New York entertainment attorney specializing in right of publicity law, told Digital Trends.

Who owns your voice?

Colin describes it as a “legal minefield” that varies among jurisdictions around the world — and even throughout the United States. Ownership of aspects of your personality, such as your name, voice, or likeness, is not explicitly protected by statute in 28 states, although some have state case law that recognizes protection, Colin said. Some states protect voice, while others protect only names and likenesses. Some protect rights only while a person is alive, while others extend these protections for decades after death. Then there’s the question of fair use for satire and parody.

“A deepfake determined to be a cultural satire may leave Jay-Z unable to prevail in a court of law, but a defamatory use that paints him in a false light that misleads the public or generates a profit for the infringing user may give Jay-Z standing to prevail,” Colin said. “In the U.S., this legal framework is rapidly changing due to deepfakes created for misleading political purposes and for revenge porn, but also due to the advent of social media influencers, as states move to give student-athletes in college sports [the] ability to legally profit off their name, image, and likeness for the first time, and to better monetize personality rights for living and dead celebrities in today’s entertainment industry.”

Focusing on the entertainment industry implications of audio deepfakes ignores some of the big challenges when it comes to spreading fake news. But it’s also rich territory for future potential lawsuits. Would an A.I.-performed album recorded by a soundalike of Jay-Z be illegal? What if it was clearly satire and given away for free? (And if you don’t think the embryonic stages of this are already happening you obviously don’t know the internet. Cue Jay-Z rapping We Didn’t Start the Fire by Billy Joel.) Music generated by A.I. may not be commonplace today, but as with deepfakes, some of the proof-of-concept demonstrations are getting scarily impressive.

“Deepfake soundalikes for entertainment purposes have not really been addressed yet by the courts,” Colin said. “The law has just not caught up to the tech as machine learning tech improves for voice modulation and synthesis. Relevant to any legal analysis here is if the purpose [is] to present someone in a false light by making the public believe they said something they never said. Does the person creating the soundalike or hiring the soundalike for a voiceover profit off the use? Or is it a satirical portrayal or a transformative use for entertainment?”

Lawsuits loom in the future

There are also untested questions about the datasets used to train these audio deepfakes. As Colin points out, a voice itself can’t be copyrighted, but a sound recording of a voice singing a song can be. Is an audio deepfake trained on hours of copyrighted Jay-Z albums a breach of copyright? If so, since the copyrights may be dispersed among multiple record labels and other entities (for instance, an interview recorded for television), there could be a whole lot of potentially aggrieved parties whose copyrights have been infringed.

As these A.I. tools become ever more sophisticated, these cases are going to shift from hypothetical quandaries to the subject of real legal battles, so expect to see some interesting developments. One thing’s for sure: These are the legal battles of the future. When it comes to the legality of deepfakes, even in this one specialized domain, there’s plenty of complexity to delve into. Lawyers are no doubt rubbing their hands together at the prospect.

Provided that they’ve not already been replaced by machines by that point, that is.
