Baidu’s Deep Voice 2 text-to-speech engine can imitate hundreds of human accents

Baidu, the Beijing-based juggernaut that commands 80 percent of the Chinese internet search market, is investing heavily in artificial intelligence. In 2013, it opened the Institute of Deep Learning, an R&D center focused on machine learning. And in May, it took the wraps off the newest version of Deep Voice, its AI-powered text-to-speech engine.

Deep Voice 2, which follows on the heels of Deep Voice’s public debut earlier this year, can produce real-time speech that’s nearly indistinguishable from a human voice. All the more impressive, it needs just thirty minutes of audio to build a working model, and can imitate the regional accents of hundreds of different speakers.

That’s leaps and bounds better than early versions of Deep Voice, which took multiple hours to learn one voice.

The key is Deep Voice 2’s ability to identify similarities between hundreds of different speakers to build a working model of a human voice. Then, it autonomously derives unique voices from that model. Unlike voice assistants like Apple’s Siri, which require that a human record thousands of hours of speech that engineers tune by hand, Deep Voice 2 doesn’t require guidance or manual intervention.

“Give it the right data, and it can learn on [its] own what sort of features are important,” Andrew Gibiansky, a research scientist at Baidu’s Silicon Valley AI Lab, told The Verge.

Baidu isn’t the only company investing in high-quality text-to-speech tech. Google’s WaveNet, a product of the company’s DeepMind division, generates voices by sampling real human speech and independently creating its own sounds in a variety of voices. Adobe’s Project VoCo transcribes human speech to editable text in real time. And Lyrebird, a Canadian AI startup, licenses algorithms that can imitate any voice with just a single minute of sample audio, create one thousand sentences in less than half a second, and infuse the speech they create with emotions like anger, sympathy, and stress.

But don’t expect Deep Voice 2 or WaveNet to replace Siri, the Google Assistant, or Amazon’s Alexa anytime soon: AI-powered translation apps require more resources than today’s phones can reasonably supply. Still, Baidu sees potential in applications like text-to-speech apps and voice-based assistants. “The ability to quickly synthesize multiple human voices will have a huge effect on products such as personal assistants and eBook readers in the future. For example, each character of your eBook could have a unique voice when you listen to the eBook.”

Kyle Wiggers
Former Digital Trends Contributor