Baidu’s Deep Voice 2 text-to-speech engine can imitate hundreds of human accents

Baidu, the Beijing-based juggernaut that commands 80 percent of the Chinese internet search market, is investing heavily in artificial intelligence. In 2013, it opened the Institute of Deep Learning, an R&D center focused on machine learning. And in May, it took the wraps off the newest version of Deep Voice, its AI-powered text-to-speech engine.

Deep Voice 2, which follows on the heels of Deep Voice’s public debut earlier this year, can produce real-time speech that’s nearly indistinguishable from a human voice. All the more impressive, it needs just thirty minutes of audio to build a working model, and can imitate the regional accents of hundreds of different speakers.

That’s leaps and bounds better than early versions of Deep Voice, which took multiple hours to learn one voice.

The key is Deep Voice 2’s ability to identify similarities between hundreds of different speakers to build a working model of a human voice. Then, it autonomously derives unique voices from that model. Unlike voice assistants like Apple’s Siri, which require that a human record thousands of hours of speech that engineers tune by hand, Deep Voice 2 doesn’t require guidance or manual intervention.

“Give it the right data, and it can learn on [its] own what sort of features are important,” Andrew Gibiansky, a research scientist at Baidu’s Silicon Valley AI Lab, told The Verge.

Baidu isn’t the only company investing in high-quality text-to-speech tech. Google’s WaveNet, a product of the company’s DeepMind division, generates voices by sampling real human speech and independently creating its own sounds in a variety of voices. Adobe’s Project VoCo transcribes human speech to editable text in real time. And Lyrebird, a Canadian AI startup, licenses algorithms that can imitate any voice with just a single minute of sample audio, create one thousand sentences in less than half a second, and infuse the speech they create with emotions like anger, sympathy, and stress.

But don’t expect Deep Voice 2 or WaveNet to replace Siri, the Google Assistant, or Amazon’s Alexa anytime soon: AI-powered translation apps require more resources than today’s phones can reasonably supply. Still, Baidu sees potential in applications like text-to-speech apps and voice-based assistants. “The ability to quickly synthesize multiple human voices will have a huge effect on products such as personal assistants and eBook readers in the future. For example, each character of your eBook could have a unique voice when you listen to the eBook.”
