
Google DeepMind develops most realistic-sounding AI yet

While subtle, one of the biggest advances portrayed in movies like Her or Ex Machina was that the AI began to really sound like a fellow human. And in the realm of real-life tech, Google’s AI focus in recent years has similarly been to make computers sound more like us. And they’re getting much better at it.

The latest development to come out of Google’s DeepMind is called WaveNet. It samples different parts of human speech and models its own waveforms on the way they sound. It’s not perfect yet, but we’re definitely getting closer to voices that sound like they come from a person’s mouth, rather than from a computer’s speaker.

While it still sounds strange, the new AI speech certainly flows better than the kinds of responses you’ll get from Siri or Cortana. Those assistants chop up recorded human speech and paste it back together, which gets individual pronunciations right but leaves the overall flow of the speech sounding off. (That technique is known as concatenative text to speech, just so you know.)
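To make the concatenative idea concrete, here is a minimal sketch in Python. It is not how Siri or Cortana are actually built: the unit names, the `UNIT_BANK` dictionary and the sample values are all made up for illustration. It only shows the "chop up and paste back together" step.

```python
# Hypothetical inventory of pre-recorded speech fragments.
# Each fragment maps to a short list of audio sample values (made up here).
UNIT_BANK = {
    "he": [0.1, 0.3, 0.2],
    "llo": [0.4, 0.1, -0.2, -0.1],
}


def concatenative_tts(units, bank=UNIT_BANK):
    """Build audio by pasting stored fragments end to end."""
    audio = []
    for unit in units:
        audio.extend(bank[unit])  # no smoothing at the joins
    return audio


speech = concatenative_tts(["he", "llo"])
```

Each fragment is fine on its own, but nothing ties the end of one to the start of the next (here the signal jumps from 0.2 to 0.4 at the join), which is one way to picture why the flow can sound off.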

WaveNet flows much better because it builds on something called parametric text to speech, which generates the audio from scratch rather than stitching recordings together. Where it differs from traditional uses of that technique, though, is that Google’s AI models its audio on the waveforms of real human voices.

That’s difficult, because there are typically around 16,000 audio samples in every second of speech, and that takes a lot of processing power to handle. To cut back on that, WaveNet uses a prediction engine to estimate which sample should come next in natural speech, using everything that has gone before as a guide.
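The sample-by-sample prediction described above can be sketched roughly as follows. This is a toy, not WaveNet itself: `toy_predict_next` is a hypothetical stand-in for the trained neural network, and it only looks at the last two samples to continue a 440 Hz tone, whereas the real system is described as drawing on everything that came before.

```python
import math

SAMPLE_RATE = 16_000  # samples per second, the figure quoted above
OMEGA = 2 * math.pi * 440 / SAMPLE_RATE  # phase step of a 440 Hz tone


def toy_predict_next(history):
    """Hypothetical stand-in for the learned model.

    Predicts the next sample from earlier ones using the sine recurrence
    x[n] = 2*cos(w)*x[n-1] - x[n-2], seeded with the first two values.
    """
    if len(history) == 0:
        return 0.0
    if len(history) == 1:
        return math.sin(OMEGA)
    return 2 * math.cos(OMEGA) * history[-1] - history[-2]


def generate(seconds, predict_next=toy_predict_next):
    """Generate audio one sample at a time, feeding each output back in."""
    samples = []
    for _ in range(round(seconds * SAMPLE_RATE)):
        samples.append(predict_next(samples))
    return samples


audio = generate(0.01)  # 10 milliseconds of audio
```

Even this tiny clip needs 160 separate predictions, each of which has to wait for the one before it, which gives a sense of why the approach is demanding.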

The results are impressive. To give you a comparison, here’s a classic concatenative text-to-speech system:

That sounds like the sort of digital assistant voices we’ve become used to in recent years. But here’s the new WaveNet system that Google has developed:

The cadence of the speech is much more realistic, and though there is a general fuzziness to the audio, it’s not hard to imagine that being cleaned up post-development.

The process can even be used to simulate different kinds of voices, for example, male and female:

The only problem now is that even though its predictions reduce the amount of processing this technique requires, it still demands too much for standard smartphone hardware to handle in real time. At least for now.
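A bit of back-of-the-envelope arithmetic shows what "real time" demands here. The sketch below assumes the roughly 16,000 samples per second mentioned earlier; the function names are illustrative, and the 90-second generation time in the usage line is an invented number, not a measurement of WaveNet.

```python
SAMPLE_RATE = 16_000  # samples per second of speech (assumed)


def per_sample_budget_us(sample_rate=SAMPLE_RATE):
    """Microseconds available to produce each sample in real time."""
    return 1_000_000 / sample_rate


def real_time_factor(seconds_to_generate, seconds_of_audio):
    """Values above 1.0 mean generation is slower than playback."""
    return seconds_to_generate / seconds_of_audio


budget = per_sample_budget_us()  # 62.5 microseconds per sample
# Invented example: 90 seconds of compute for 1 second of speech.
factor = real_time_factor(90.0, 1.0)
```

To keep up with playback, a device would need to produce each sample in about 62.5 microseconds, so any model that takes longer than that per sample falls behind.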

For more information on these techniques, Google’s blog post offers a lot more detail and samples, and the company has also published a couple of papers on the subject.

Jon Martindale
Jon Martindale is the Evergreen Coordinator for Computing, overseeing a team of writers addressing all the latest how to…