Baidu’s new A.I. can mimic your voice after listening to it for just one minute

We’re not in the business of writing regularly about “fake” news, but it’s hard not to be concerned about the kind of mimicry that technology is making possible. First, researchers developed deep learning-based artificial intelligence (A.I.) that can superimpose one person’s face onto another person’s body. Now, researchers at Chinese search giant Baidu have created an A.I. they claim can learn to accurately mimic your voice after listening to it for less than a minute.

“From a technical perspective, this is an important breakthrough showing that a complicated generative modeling problem, namely speech synthesis, can be adapted to new cases by efficiently learning only from a few examples,” Leo Zou, a member of Baidu’s communications team, told Digital Trends. “Previously, it would take numerous examples for a model to learn. Now, it takes a fraction of what it used to.”

Baidu Research isn’t the first to try to create voice-replicating A.I. Last year, we covered a project called Lyrebird, which used neural networks to replicate the voices of figures including President Donald Trump and former President Barack Obama from a relatively small number of samples. Like Lyrebird’s work, Baidu’s speech synthesis doesn’t sound completely convincing yet, but it’s an impressive step forward, and far ahead of many of the robotic A.I. voice assistants that existed just a few years ago.

The work builds on Baidu’s text-to-speech synthesis system Deep Voice, which was trained on upwards of 800 hours of audio from a total of 2,400 speakers. The system needs just 100 five-second samples of a target voice to sound its best, but even a version given only 10 five-second samples was able to trick a voice-recognition system more than 95 percent of the time.
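To make the idea concrete, here is a minimal sketch of the general few-shot voice-cloning recipe this kind of system uses: freeze a synthesis network already trained on many speakers, and fit only a small speaker-embedding vector to the handful of new clips. This is a simplified illustration, not Baidu’s actual Deep Voice code; `MultiSpeakerTTS`, `mel_loss`, `speaker_dim`, and `samples` are hypothetical placeholders.

```python
# Sketch of few-shot speaker adaptation on top of a pretrained
# multi-speaker text-to-speech model. Hypothetical model API; only
# the torch calls are real.
import torch

# A multi-speaker TTS model pretrained on a large corpus
# (the article cites ~800 hours of audio from 2,400 speakers).
base_tts = MultiSpeakerTTS.load_pretrained("multispeaker.pt")
for p in base_tts.parameters():
    p.requires_grad = False  # freeze the shared synthesis network

# The only thing we learn is an embedding for the new speaker.
new_speaker = torch.nn.Parameter(torch.zeros(base_tts.speaker_dim))
optimizer = torch.optim.Adam([new_speaker], lr=1e-3)

# `samples`: the handful of (text, mel-spectrogram) pairs, e.g.
# ten five-second clips of the target voice with transcripts.
for step in range(100):
    for text, target_mel in samples:
        pred_mel = base_tts(text, speaker_embedding=new_speaker)
        loss = mel_loss(pred_mel, target_mel)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Any new text can now be rendered in the cloned voice.
audio = base_tts.vocode(base_tts("Hello there", speaker_embedding=new_speaker))
```

Because only the embedding is trained, a few clips are enough: the new voice just has to be positioned within the space of voices the base model has already learned from its 2,400 training speakers.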

“We see many great use cases or applications for this technology,” Zou said. “For example, voice cloning could help patients who lost their voices. This is also an important breakthrough in the direction of personalized human-machine interfaces. For example, a mom can easily configure an audiobook reader with her own voice. The method [additionally] allows creation of original digital content. Hundreds of characters in a video game would be able to have unique voices because of this technology. Another interesting application is speech-to-speech language translation, as the synthesizer can learn to mimic the speaker identity in another language.”

For a deeper dive into this subject, you can listen to a sample of the voices or read a paper describing the work.
