This startup wants to deepfake clone your voice and sell it to the highest bidder

By Luke Dormehl May 26, 2021

There’s a video that pops up periodically on my YouTube feed. It’s a conversation between rappers Snoop Dogg and 50 Cent bemoaning the fact that, compared to their generation, all modern hip-hop artists apparently sound the same. “When a person decides to be themselves, they offer something no-one else can be,” says 50 Cent. “Yeah, ‘cos once you be you — who can be you but you?” Snoop responds.

Snoop Dogg impersonates today's rappers sound-alike flow

When the video was uploaded in October 2014, that may have broadly been true. But just a few years later it certainly isn’t. In a world of audio deepfakes, it’s possible to train an A.I. to sound eerily similar to another person by feeding it hours of recordings of their speech. The results are unnervingly accurate.

Public figures like the rapper Jay-Z and the psychologist Jordan Peterson have already complained about people misappropriating their voices by creating audio deepfakes and then making them say silly things on the internet. “Wake up,” wrote Peterson. “The sanctity of your voice, and your image, is at serious risk.” Those are just the mischievous cases. In others, the results can tip over into outright criminality. In one 2019 incident, criminals used an audio deepfake to impersonate the voice of the CEO of an energy company and persuade an underling over the phone to urgently transfer $243,000 to a bank account.

Veritone, an A.I. company that creates smart tools for labeling media for the entertainment industry, is putting the audio deepfake power back in the hands (or, err, the throats) of those to whom it rightly belongs. This month, the company announced Marvel.ai, what company president Ryan Steelberg described to Digital Trends as a “complete voice-as-a-service solution.” For a fee, Veritone will build an A.I. model that sounds just like you (or, more likely, a famous person with an immediately recognizable voice), which can then be licensed out on loan like a high-tech version of Ariel’s voice-as-collateral bargain from The Little Mermaid.

Synthetic Voice by MARVEL.ai

“Your voice is just as valuable as any other content or brand attribute that you have,” said Steelberg. “[It’s on a level with] your name and likeness, your face, your signature, or a song you’ve written or piece of content you’ve created.”

“We can repurpose a lot”

Certain individuals have, of course, long sold their voices in the form of recording commercials or voiceovers, singing songs, and countless other forms of monetization. But these endeavors all required the person to actually say the words. What Veritone’s solution promises to do is to make this individually scalable.

What if, for instance, it was possible for Kevin Hart to license his voice out to a luxury brand that could then use it to create personalized ads featuring the name of the viewer, the location of their nearest brick-and-mortar sales outlet, and the particular product they would be most likely to buy? Rather than requiring Hart to spend literally days in the recording booth, A.I. could allow this to be done with little more (on Hart’s part, at least) than signing on the dotted line to agree to his voice likeness being harnessed by said third party. While he was off shooting a movie, or doing a comedy tour, or taking a vacation, or even sleeping, his digital voice could be raking in the cash.

“We can repurpose a lot,” Steelberg explained, regarding the training process. “People who are already speaking a ton, if they’re producing a podcast or in the media, there’s a lot of data out there. We probably have a ton of it already if they happen to be a customer of ours.”

Steelberg said that the voice-as-a-service idea occurred to Veritone several years ago. However, at the time he was unconvinced that machine learning models were able to create the hyper-realistic synthetic voices he was looking for. That realism is especially important when it comes to voices we know intimately, even if we’ve never actually met the speaker in question. Without it, the result would be a kind of audible uncanny valley, with every wrong sound alerting listeners to the fact that they’re listening to a fake. But here in 2021 he is convinced that things have advanced to the point where this is now possible. Hence Marvel.ai.

Steelberg speaks in excited buzzwords about the massive potential of the technology, talking up its possible plethora of “modalities of execution.” Veritone can create models for text-to-speech. It can also build models for speech-to-speech, whereby a voice actor can “drive” a vocal performance by reading the words with suitable inflection and then having the finished voice overlaid at the end like a Snapchat filter. The company can also fingerprint each voice so it can tell if a piece of apparently real audio that pops up someplace was created using its technology.

“The more you think about it … you’ll literally come up with 50 more [possible use-cases],” he said. “What we find so fascinating about this new category of A.I. is the extensibility and the variability.”

Consider some others. A famous athlete might be a god on the basketball court, but a devil when it comes to reading lines in a script in a way that sounds natural. Using Veritone’s technology, their part in video game cutscenes or the audiobook of their memoir (which they may also not have written) could be performed by a voice actor, whose performance is then digitally tweaked to sound like the athlete. As another possibility, a movie could be translated for other countries with the same actor’s voice now reading the lines in French, Mandarin, or any number of other languages, even if the actor doesn’t actually speak them.

How will the public react?

A big question hanging over all of this, of course, is how members of the public are going to respond to it all. This is the tricky, unpredictable bit. Celebrities today must play a complex role: Both larger-than-life figures worthy of having their face plastered on billboards, and also relatable individuals who have relationship problems, tweet about watching TV in their pajamas, and make silly faces when they eat hot sauce.

What happens, then, when ads appear that feature a celebrity reading lines, but we know that said performer never actually said those lines and instead had their voice programmatically utilized to bring us a targeted ad? Steelberg said that it is little different from a celebrity handing over control of their social media to a third-party account manager. If we see Taylor Swift tweet, we know that it’s quite possibly not Taylor herself tapping out the message, especially if it’s an endorsement or piece of promotional content.

But voice is, in a very real way, different, because it’s more personal. That is especially true if it’s accompanied by a degree of personalization, which is one of the use cases that makes the most sense. The truth is that, to quote the screenwriter William Goldman, nobody knows what the public response will be, precisely because nobody has done exactly this before.

“It’s going to run the spectrum, right?” Steelberg said. “[Some] people are going to say, ‘I’m going to use this tool a little bit to augment my day to help me save time.’ Others are going to say, full-blown, ‘I want my voice everywhere to extend my brand, and I’m going to license it out.’”

His best guess is that acceptance will be on a case-by-case basis. “You need to be in tune with the reaction of your audience, and if you see things are working or not working,” he said. “They may love it. They may say, ‘You know what? I love the fact that you’re putting out 10 times more content or more personal content to me, even though I know you used synthetic content to augment it. Thank you. Thank you.’”

Think about the future

As for the future? “We want to work with all the major talent agencies,” Steelberg said. “We think anybody who is in the business of making money around a scarce brand should be thinking about their voice strategy.”

And don’t expect it to remain purely about audio, either. “We’ve always been fascinated by the potential of using synthetic content to either extend, augment, or potentially completely replace some of the legacy forms of content production,” he continued. “Be that in an audio sense or, ultimately in the future, a video sense.”

That’s right: Once it has cornered the market in the world of audio deepfakes, Veritone plans to go one step further and enter the world of fully realized virtual avatars that both sound and look indistinguishable from their source.

Suddenly those personalized ads from Minority Report sound a whole lot less like science fiction.

Contributor

I'm a UK-based tech writer covering Cool Tech at Digital Trends. I've also written for Fast Company, Wired, the Guardian…
