Baidu’s Deep Voice 2 text-to-speech engine can imitate hundreds of human accents

Baidu, the Beijing-based juggernaut that commands 80 percent of the Chinese internet search market, is investing heavily in artificial intelligence. In 2013, it opened the Institute of Deep Learning, an R&D center focused on machine learning. And in May, it took the wraps off the newest version of Deep Voice, its AI-powered text-to-speech engine.

Deep Voice 2, which follows on the heels of Deep Voice’s public debut earlier this year, can produce real-time speech that’s nearly indistinguishable from a human voice. All the more impressive, it needs just thirty minutes of audio to build a working model, and can imitate the regional accents of hundreds of different speakers.

That’s leaps and bounds better than early versions of Deep Voice, which took multiple hours to learn one voice.

The key is Deep Voice 2’s ability to identify similarities between hundreds of different speakers to build a working model of a human voice. It then autonomously derives unique voices from that model. Unlike voice assistants such as Apple’s Siri, which require a human to record thousands of hours of speech that engineers then tune by hand, Deep Voice 2 doesn’t require guidance or manual intervention.

“Give it the right data, and it can learn on [its] own what sort of features are important,” Andrew Gibiansky, a research scientist at Baidu’s Silicon Valley AI Lab, told The Verge.

Baidu isn’t the only company investing in high-quality text-to-speech tech. Google’s WaveNet, a product of the company’s DeepMind division, generates voices by sampling real human speech and independently creating its own sounds in a variety of voices. Adobe’s Project VoCo transcribes human speech to editable text in real time. And Lyrebird, a Canadian AI startup, licenses algorithms that can imitate any voice with just a single minute of sample audio, create one thousand sentences in less than half a second, and infuse the speech they create with emotions like anger, sympathy, and stress.

But don’t expect Deep Voice 2 or WaveNet to replace Siri, the Google Assistant, or Amazon’s Alexa anytime soon: AI-powered speech synthesis of this kind requires more resources than today’s phones can reasonably supply. Still, Baidu sees potential in applications like text-to-speech apps and voice-based assistants. “The ability to quickly synthesize multiple human voices will have a huge effect on products such as personal assistants and eBook readers in the future. For example, each character of your eBook could have a unique voice when you listen to the eBook.”

Kyle Wiggers
Former Digital Trends Contributor