These amazing audio deepfakes showcase progress of A.I. speech synthesis

By Luke Dormehl June 6, 2020

Visual deepfakes, in which one person’s face is spliced onto another person’s body, are so 2019. Here in 2020, deepfake technology trends have shifted a bit, and now the cool kids are using the technology to create impressive “soundalike” audio tracks.

While these have plenty of scary potential when it comes to fake news and the like, for now, it seems that creators are perfectly happy to use them for more irreverent purposes, such as getting famous figures to perform songs they never had any real involvement with.

Here are five of the weirdest and best — including one made specifically for Digital Trends that you won’t find anywhere else.

Jay-Z raps ‘We Didn’t Start the Fire’

Jay-Z covers "We Didn't Start the Fire" by Billy Joel (Speech Synthesis)

No, this audio deepfake of Jay-Z rapping Billy Joel’s We Didn’t Start the Fire didn’t start any fires when it comes to showcasing this vocal synthesis tech. But, having triggered one of the first legal complaints about its usage (by Jay-Z’s record label), YouTube deepfake audio creator Vocal Synthesis helped raise awareness of these tools for a lot of people.

The vocal reproduction of Jay-Z’s voice isn’t perfect in his unofficial cover of Joel’s 1989 smash hit. But, in the breathy staccato style used by Jay, some of the more awkward vocal glitches are masked pretty well. This is a great showcase of deepfake audio in action: its strengths, its weaknesses, and its eerie ability to take a piece of text we immediately associate with one person and turn it into something that sounds convincingly like it came out of someone else’s mouth.

The queen recites The Sex Pistols

Queen Elizabeth II reads "God Save the Queen" by Sex Pistols (Speech Synthesis)

Another Vocal Synthesis creation, Queen Elizabeth II (that’s the current queen) reading the Sex Pistols’ 1977 single God Save the Queen is the kind of brilliant meta-parody the internet does so well. The song’s title is, of course, taken from the national anthem of the same name, repurposed to fit lyrics resentful of the English class system and the idea of a monarchy. The original song was famously banned from broadcast by both the BBC and the United Kingdom’s Independent Broadcasting Authority.

The Queen Elizabeth voice synthesis on this particular creation wavers in and out, sounding more like a stitched-together tapestry of different samples than one cohesive reading. But is there anything more punk in its conception than a homemade DIY creation which turns, literally, the voice of authority against itself? Brilliant stuff.

Bill Clinton ponders if ‘Baby Got Back’

Bill Clinton reads "Baby Got Back" by Sir Mix-A-Lot (Speech Synthesis)

He likes big butts and he can’t deny. There’s something of a subgenre among deepfake audio makers of getting former U.S. presidents to lend their instantly recognizable voices to perform an array of musical numbers.

Bill Clinton playing Sir Mix-a-Lot doesn’t do it for you? How about George W. Bush performing 50 Cent’s In Da Club? Or maybe you’d just settle for a medley of former POTUSes spitting NWA’s F*ck Tha Police? (At least the last two of these are NSFW, although in the age of working from home such things may no longer apply!)

Frank Sinatra and Ella Fitzgerald get their ‘La La Land’ on

Jukebox AI regenerates "city of stars" using Frank Sinatra's voices and music style.

So far, all of these have concentrated on synthesizing vocals only. That’s a good start, but an artist’s voice is just one part of their repertoire. What if you could use deepfake audio technology to not just reproduce a person’s voice, but also to learn their other musical stylings and use this to dream up a whole new piece of music?

This is the basis of OpenAI’s Jukebox, a neural network that generates music, including, in its own words, “rudimentary singing … in a variety of genres and artist styles.” Unsurprisingly, this powerful tool is already being put to work, as evidenced by the above collaboration between Frank Sinatra and Ella Fitzgerald singing City of Stars from 2016’s Oscar-winning movie La La Land. The results aren’t perfect, but they definitely give a taste of where all of this is going.

Nirvana interprets ‘Clint Eastwood’

Top 4 Music Deep Fakes in the Style of Nirvana (sorta) sing Clint Eastwood by Gorillaz

In a piece created especially for Digital Trends, the folks at generative A.I. group Dadabots, CJ Carr and Zack Zukowski, whipped up an audio deepfake of legendary grunge band Nirvana riffing on Clint Eastwood, the 2001 single from the British virtual band Gorillaz.

“We used the pretrained, 5 billion-parameter Jukebox model,” Carr told Digital Trends. “It’s been trained on 7,000-plus bands, including Nirvana’s discography. We ran models on multiple Linux servers, set them to grunge and Nirvana, with the hook from Clint Eastwood as lyrics, then generated 27 different 90-second clips on our V100s, and picked our favorite top four.”

As Carr notes, there is still a degree of human creativity involved because they need to select the best pieces. A lot of the time, Carr said, the music clips sound less like one specific band and more like a generic group in that genre. Nonetheless, it’s pretty fascinating stuff.
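For the curious, that generate-then-curate workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Dadabots’ actual code: generate_clip and placeholder_rating are hypothetical stand-ins (no real Jukebox call is made), and only the figures of 27 clips, 90 seconds each, and a final top four come from Carr’s description.

```python
import random
from dataclasses import dataclass

# Figures quoted in the article: 27 clips, 90 seconds each, top four kept.
NUM_CLIPS = 27
CLIP_SECONDS = 90
KEEP = 4


@dataclass
class Clip:
    seed: int            # random seed that produced this candidate
    rating: float = 0.0  # listener's score, filled in during curation


def generate_clip(seed: int) -> Clip:
    """Hypothetical stand-in for a model sampling call; makes no audio."""
    return Clip(seed=seed)


def placeholder_rating(clip: Clip) -> float:
    """Placeholder for a human listener: a repeatable pseudo-random score."""
    return random.Random(clip.seed).random()


def curate(clips, rate, keep):
    """Score every clip with `rate` and return the `keep` highest-rated ones."""
    for clip in clips:
        clip.rating = rate(clip)
    return sorted(clips, key=lambda c: c.rating, reverse=True)[:keep]


candidates = [generate_clip(seed) for seed in range(NUM_CLIPS)]
favorites = curate(candidates, placeholder_rating, KEEP)
listening_minutes = NUM_CLIPS * CLIP_SECONDS / 60

print(f"{len(candidates)} candidates, {listening_minutes} minutes to audition")
print("kept seeds:", [c.seed for c in favorites])
```

The arithmetic is the telling part: 27 clips at 90 seconds apiece is 40.5 minutes of audio that someone has to sit through and judge by ear, which is where the human creativity comes in.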

“Sometimes it invents its own lyrics, [such as] ‘I got sunshine in my head,’” Carr said. “Sometimes the band goes into a breakdown. It kinda has a mind of its own. The realism and room for its own creativity is astonishing. I feel like we’re just scratching the surface on how to manipulate it.”
