
Google DeepMind develops most realistic-sounding AI yet

While subtle, one of the biggest advances portrayed in movies like Her or Ex Machina was that the AI began to really sound like a fellow human. And in the realm of real-life tech, Google’s AI focus in recent years has similarly been to make computers sound more like us. And they’re getting much better at it.

The latest development to come out of Google’s DeepMind AI lab is called WaveNet. It samples different parts of human speech and models its own waveforms on the way they sound. It’s not perfect yet, but we’re definitely getting closer to voices that sound like they come from a person’s mouth rather than from a computer’s speaker.


Although it still sounds a little strange, the new AI speech certainly flows better than the responses you’ll get from Siri or Cortana. Those assistants chop up recorded human speech and paste it back together, which keeps individual pronunciations correct but leaves the overall flow of the speech completely off. (That technique is known as concatenative text-to-speech, just so you know.)
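To make the “chop up and paste back together” idea concrete, here is a toy sketch of concatenative synthesis. It assumes a hypothetical inventory that maps each speech unit to a short prerecorded waveform, and it joins the units with a brief linear crossfade. It illustrates the general technique only; it is not how Siri or Cortana are actually built.

```python
# Toy concatenative text-to-speech: look up prerecorded unit waveforms and
# splice them end to end with a short linear crossfade at each join.
# The unit inventory is hypothetical; real systems select units from
# hours of recorded speech.
from typing import Dict, List


def crossfade_concat(units: List[List[float]], fade: int) -> List[float]:
    """Join waveforms, blending the last `fade` samples of the output so far
    with the first `fade` samples of the next unit."""
    out: List[float] = list(units[0])
    for unit in units[1:]:
        n = min(fade, len(out), len(unit))
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight ramps from near 0 toward 1
            idx = len(out) - n + i
            out[idx] = (1 - w) * out[idx] + w * unit[i]
        out.extend(unit[n:])  # append the rest of the unit unchanged
    return out


def synthesize(phones: List[str], inventory: Dict[str, List[float]], fade: int = 2) -> List[float]:
    """Concatenate the stored waveform for each unit in order."""
    return crossfade_concat([inventory[p] for p in phones], fade)
```

Calling `synthesize(["h", "i"], inventory)` returns a single waveform. The crossfade smooths the click at each seam, but nothing in this approach reshapes pitch or timing across the whole phrase, which is one intuition for why stitched-together speech can sound choppy.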

The WaveNet approach flows much better because it uses what’s known as parametric text-to-speech, which generates speech from scratch rather than stitching recordings together. Where it differs from traditional uses of that technique, though, is that Google’s AI models its audio on the waveforms of real human voices.

That’s difficult, because there are typically around 16,000 audio samples in every second of speech, and handling that many takes a lot of processing power. WaveNet builds the waveform with a prediction engine: it estimates each sample in turn, working out what should come next in natural speech by using everything that has gone before as a guide.
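Predicting each sample from the ones before it is called autoregressive generation. The sketch below shows only that loop: each new sample is computed from the samples already produced and appended to the output. WaveNet’s predictor is a deep neural network that outputs a probability distribution over the next sample; the two-tap linear predictor here (`resonator`, a hypothetical stand-in) simply continues a sine wave, and the 16 kHz rate is taken from the figure above.

```python
# Toy autoregressive generation: each new sample is predicted from the
# samples generated before it, one step at a time. The linear predictor
# below is a hypothetical stand-in for WaveNet's neural network.
import math
from typing import Callable, List, Sequence

SAMPLE_RATE = 16_000  # samples per second (assumed, per the article's figure)

Predictor = Callable[[Sequence[float]], float]


def generate(predict: Predictor, seed: List[float], n_samples: int, context: int) -> List[float]:
    """Produce n_samples new samples, feeding each output back in as input."""
    audio = list(seed)
    for _ in range(n_samples):
        window = audio[-context:]  # only the most recent `context` samples
        audio.append(predict(window))
    return audio[len(seed):]  # return just the newly generated samples


def resonator(freq_hz: float) -> Predictor:
    """Two-tap linear predictor x[n] = 2cos(w) * x[n-1] - x[n-2],
    which continues a sine wave of the given frequency."""
    c = 2 * math.cos(2 * math.pi * freq_hz / SAMPLE_RATE)
    return lambda past: c * past[-1] - past[-2]
```

Because every step needs the previous step’s output, the loop cannot be spread across time in parallel, so one second of audio at this rate means 16,000 predictions made one after another.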

The results are impressive. To give you a comparison, here’s a classic concatenative text-to-speech system:

That sounds like the sort of digital assistant voices we’ve become used to in recent years. But here’s the new WaveNet system that Google has developed:

The cadence of the speech is much more realistic, and though there is a general fuzziness to the audio, it’s not hard to imagine that being cleaned up with further development.

The process can even be used to simulate different kinds of voices, for example, male and female:

The only problem now is that generating speech one predicted sample at a time still takes too much processing to imagine standard smartphone hardware doing it in real time. At least for now.
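The real-time problem is easier to see in numbers. Taking the 16,000-samples-per-second figure as given, a three-second sentence needs 48,000 predictions made one after another, and keeping pace with playback leaves 62.5 microseconds for each one. A quick sketch of that arithmetic (the function names are illustrative, not from any Google code):

```python
# Back-of-the-envelope cost of sample-by-sample speech generation.
SAMPLE_RATE = 16_000  # samples per second, per the figure quoted above


def sequential_steps(seconds: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of one-at-a-time predictions needed for a clip of this length."""
    return round(seconds * sample_rate)


def per_sample_budget_us(sample_rate: int = SAMPLE_RATE) -> float:
    """Microseconds available per prediction if output must keep pace with playback."""
    return 1_000_000 / sample_rate


if __name__ == "__main__":
    print(sequential_steps(3.0))   # 48000
    print(per_sample_budget_us())  # 62.5
```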

For more on these techniques, Google’s blog post offers much more detail and more samples, and the company has also published a couple of papers on the work.

Jon Martindale
Jon Martindale is a freelance evergreen writer and occasional section coordinator, covering how-to guides, best-of lists, and…
I saw Google’s Gemini AI erase copyright evidence. I am deeply worried

Update: Google has responded to Digital Trends’ queries. The story has been updated with the company’s statement below.
The rise of generative AI has been a fairly messy process, especially from the perspectives of fair-use ethics and copyright. AI giants are inking deals with publishers to avoid legal hassles while, at the same time, fighting copyright battles in courts across multiple countries.
As the ravenous appetite for training AI on user data grows, we might be in for another ethical conundrum. Multiple users on X and Reddit have shared demonstrations of how Google’s latest Gemini 2.0 series AI model can remove watermarks from copyright-protected images.
Judging by the before-and-after samples, Gemini does a fairly good job of removing the watermarks. Notably, it erases not only simple banner-style watermarks but also fairly complex overlays with design and stylized text elements.
https://x.com/ajiteshleo/status/1901484196972945872?s=61
The model in question is Gemini 2.0 Flash, which was released earlier this year and has since received a reasoning upgrade. It is worth noting that you cannot remove watermarks using the mobile or desktop version of the Gemini 2.0 Flash model. Trying to do so returns a message like this:
“Removing watermarks from images without the explicit permission of the copyright holder is illegal in most countries. It is important to respect the copyright laws and intellectual property rights. If you want to use an image with a watermark, you should contact the copyright holder and ask for permission.”
You can, however, try to remove the watermark from images in Google AI Studio. Digital Trends successfully removed watermarks from a variety of images using the Gemini 2.0 Flash (Image Generation) Experimental model.
 
Doing so may violate local copyright laws, and any use of AI-modified material without due consent could land you in legal trouble. Moreover, it is a deeply unethical act, and it is the same kind of concern that has artists and authors fighting companies in court over using their work to train AI models without duly compensating them or seeking their explicit consent.

How are the results?
A notable aspect is that the images produced by the AI are fairly high quality. Not only does it remove the watermark artifacts, it also fills the gap with intelligent pixel-level reconstruction. In its current iteration, it works somewhat like the Magic Eraser feature in the Google Photos app for smartphones.
Furthermore, if the input image is low quality, Gemini not only wipes off the watermark details but also upscales the overall picture.
https://x.com/kaiju_ya/status/1901099096930496720?s=61
The output image, however, has its own Gemini watermark, although this itself can be removed with a simple crop. There are a few minor differences in the final image produced by Gemini after its watermark removal process, such as slightly different color temperatures and fuzzy surface details in photorealistic shots.

Google is giving free access to two of Gemini’s best AI features

Google’s Gemini AI has steadily made its way into the best of the company’s software suite, from native Android integrations to interoperability with Workspace apps such as Gmail and Docs. However, some of the most advanced Gemini features have remained locked behind a subscription paywall.
That changes today. Google has announced that Gemini Deep Research will now be available for all users to try, alongside the ability to create custom Gem bots. You no longer need a Gemini Advanced (or Google One AI Premium) subscription to use the aforementioned tools.

The best of Gemini as an AI agent
Deep Research is an agentic tool that takes over the task of web research, saving users the hassle of visiting one web page after another in search of relevant information. With Deep Research, you can simply enter a natural-language query and, if needed, specify the source.

Google’s new Gemma 3 AI models are fast, frugal, and ready for phones

Google’s AI efforts are synonymous with Gemini, which has become an integral part of the company’s most popular software and hardware products. However, Google has also been releasing open-source AI models under the Gemma label for over a year now.

Today, Google revealed its third-generation open-source AI models with some impressive claims in tow. The Gemma 3 models come in four variants (1 billion, 4 billion, 12 billion, and 27 billion parameters) and are designed to run on devices ranging from smartphones to beefy workstations.
Ready for mobile devices
