Baidu’s Deep Voice 2 text-to-speech engine can imitate hundreds of human accents

Baidu, the Beijing-based juggernaut that commands 80 percent of the Chinese internet search market, is investing heavily in artificial intelligence. In 2013, it opened the Institute of Deep Learning, an R&D center focused on machine learning. And in May, it took the wraps off the newest version of Deep Voice, its AI-powered text-to-speech engine.

Deep Voice 2, which follows on the heels of Deep Voice’s public debut earlier this year, can produce real-time speech that’s nearly indistinguishable from a human voice. All the more impressive, it needs just thirty minutes of audio to build a working model, and can imitate the regional accents of hundreds of different speakers.

That’s leaps and bounds better than early versions of Deep Voice, which took multiple hours to learn one voice.

The key is Deep Voice 2’s ability to identify similarities between hundreds of different speakers to build a working model of a human voice. It then autonomously derives unique voices from that model. Unlike voice assistants like Apple’s Siri, which require that a human record thousands of hours of speech that engineers tune by hand, Deep Voice 2 doesn’t require guidance or manual intervention.

“Give it the right data, and it can learn on [its] own what sort of features are important,” Andrew Gibiansky, a research scientist at Baidu’s Silicon Valley AI Lab, told The Verge.

Baidu isn’t the only company investing in high-quality text-to-speech tech. Google’s WaveNet, a product of the company’s DeepMind division, generates voices by sampling real human speech and independently creating its own sounds in a variety of voices. Adobe’s Project VoCo transcribes human speech to editable text in real time. And Lyrebird, a Canadian AI startup, licenses algorithms that can imitate any voice with just a single minute of sample audio, create one thousand sentences in less than half a second, and infuse the speech they create with emotions like anger, sympathy, and stress.

But don’t expect Deep Voice 2 or WaveNet to replace Siri, the Google Assistant, or Amazon’s Alexa anytime soon: AI-powered translation apps require more resources than today’s phones can reasonably supply. Still, Baidu sees potential in applications like text-to-speech apps and voice-based assistants. “The ability to quickly synthesize multiple human voices will have a huge effect on products such as personal assistants and eBook readers in the future. For example, each character of your eBook could have a unique voice when you listen to the eBook.”

Kyle Wiggers
Former Digital Trends Contributor