Google DeepMind develops most realistic-sounding AI yet

By Jon Martindale Published September 9, 2016

While subtle, one of the biggest advances portrayed in movies like Her or Ex Machina was that the AI began to really sound like a fellow human. And in the realm of real-life tech, Google’s AI focus in recent years has similarly been to make computers sound more like us. And they’re getting much better at it.

The latest development to come out of Google’s DeepMind AI group is called WaveNet. It samples different parts of human speech and models its own waveforms after the way they sound. It’s not perfect yet, but we’re definitely getting closer to voices that sound like they come from a person’s mouth, rather than from a computer’s speaker.

While it still sounds strange, the new AI speech certainly flows better than the kinds of responses you’ll get from Siri or Cortana. Those systems chop up recorded human speech and paste it back together, which gets individual pronunciations right but leaves the flow of the speech completely off. (That technique is known as concatenative text-to-speech, just so you know.)

The WaveNet option flows much better because it uses something called parametric text-to-speech, which generates the audio from scratch instead of stitching recordings together. Where it differs from traditional uses of that technique, though, is that Google’s AI models its audio on the waveforms of real human voices.

That’s difficult, because there are typically around 16,000 audio samples in every second of speech, and generating that many takes a lot of processing power. To cut back on that, WaveNet uses a prediction engine to estimate what sample should come next in natural speech, using everything that has gone before as a guide.
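To make the idea of predicting each sample from the ones before it more concrete, here is a minimal Python sketch. It is not WaveNet's actual model: `toy_predictor` is a hypothetical stand-in that can only continue a pure tone, and the names `generate` and `SAMPLE_RATE` are assumptions made for illustration. What it does show is the sample-by-sample loop, where every new value is fed back in as input for the next prediction.

```python
import math

SAMPLE_RATE = 16_000  # samples per second of audio, the figure quoted above


def toy_predictor(history, freq_hz=440.0):
    """Hypothetical stand-in for a trained model (NOT WaveNet's network).

    Predicts the next sample of a pure tone from the two previous samples
    using the recurrence x[n] = 2*cos(w)*x[n-1] - x[n-2].
    """
    w = 2.0 * math.pi * freq_hz / SAMPLE_RATE
    return 2.0 * math.cos(w) * history[-1] - history[-2]


def generate(seed, seconds, predictor=toy_predictor):
    """Generate audio one sample at a time, feeding each output back in."""
    samples = list(seed)
    total = int(seconds * SAMPLE_RATE)
    while len(samples) < total:
        # Each new sample depends on what has already been generated.
        samples.append(predictor(samples))
    return samples


w = 2.0 * math.pi * 440.0 / SAMPLE_RATE
seed = [0.0, math.sin(w)]  # first two samples of a 440 Hz sine wave
audio = generate(seed, seconds=0.5)
print(len(audio))  # number of samples in half a second of audio
```

Even in this toy version, half a second of audio means thousands of predictions made one after another, which gives a sense of why the approach is demanding when the predictor is a large learned model rather than a one-line formula.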

The results are impressive. To give you a comparison, here’s a classic concatenative text-to-speech system:

That sounds like the sort of digital assistant voices we’ve become used to in recent years. But here’s the new WaveNet system that Google has developed:

The cadence of the speech is much more realistic, and though there is a general fuzziness to the audio, it’s not hard to imagine that being cleaned up with further development.

The process can even be used to simulate different kinds of voices, for example, male and female:

The only problem now is that even though its predictions reduce the amount of required processing for this technique, it still takes too much processing to imagine standard smartphone hardware being capable of doing it in real time. At least for now.

For more information on these techniques, Google’s blog post offers a lot more detail and samples, and the company has also posted a couple of papers on the subject.

