
Baidu’s Deep Voice 2 text-to-speech engine can imitate hundreds of human accents

Baidu, the Beijing-based juggernaut that commands 80 percent of the Chinese internet search market, is investing heavily in artificial intelligence. In 2013, it opened the Institute of Deep Learning, an R&D center focused on machine learning. And in May, it took the wraps off the newest version of Deep Voice, its AI-powered text-to-speech engine.

Deep Voice 2, which follows on the heels of Deep Voice’s public debut earlier this year, can produce real-time speech that’s nearly indistinguishable from a human voice. All the more impressive, it needs just thirty minutes of audio to build a working model, and can imitate the regional accents of hundreds of different speakers.

That’s leaps and bounds better than early versions of Deep Voice, which took multiple hours to learn one voice.

The key is Deep Voice 2’s ability to identify similarities between hundreds of different speakers to build a working model of a human voice. Then, it autonomously derives unique voices from that model. Unlike voice assistants like Apple’s Siri, which require that a human record thousands of hours of speech that engineers tune by hand, Deep Voice 2 doesn’t require guidance or manual intervention.


“Give it the right data, and it can learn on [its] own what sort of features are important,” Andrew Gibiansky, a research scientist at Baidu’s Silicon Valley AI Lab, told The Verge.

Baidu isn’t the only company investing in high-quality text-to-speech tech. Google’s WaveNet, a product of the company’s DeepMind division, generates voices by sampling real human speech and independently creating its own sounds in a variety of voices. Adobe’s Project VoCo transcribes human speech to editable text in real time. And Lyrebird, a Canadian AI startup, licenses algorithms that can imitate any voice with just a single minute of sample audio, create one thousand sentences in less than half a second, and infuse the speech they create with emotions like anger, sympathy, and stress.

But don’t expect Deep Voice 2 or WaveNet to replace Siri, the Google Assistant, or Amazon’s Alexa anytime soon: AI-powered translation apps require more resources than today’s phones can reasonably supply. Still, Baidu sees potential in applications like text-to-speech apps and voice-based assistants. “The ability to quickly synthesize multiple human voices will have a huge effect on products such as personal assistants and eBook readers in the future. For example, each character of your eBook could have a unique voice when you listen to the eBook.”

Kyle Wiggers
Former Digital Trends Contributor
Kyle Wiggers is a writer, Web designer, and podcaster with an acute interest in all things tech. When not reviewing gadgets…