This AI can spoof your voice after just three seconds

Artificial intelligence (AI) is having a moment right now, and it picked up more momentum with the news that Microsoft is working on an AI that can imitate anyone’s voice after being fed a three-second sample.

The new tool, dubbed VALL-E, has been trained on roughly 60,000 hours of English-language voice data, which Microsoft says is “hundreds of times larger than existing systems.” Using that knowledge, its creators claim it needs only a short sample of vocal input to learn how to replicate a user’s voice.

More impressively, VALL-E can reproduce the emotion, vocal tone, and acoustic environment found in each sample, something other voice AI programs have struggled with. That makes its output more realistic and brings it closer to something that could pass as genuine human speech.

When compared to other text-to-speech (TTS) competitors, Microsoft says VALL-E “significantly outperforms the state-of-the-art zero-shot TTS system in terms of speech naturalness and speaker similarity.” In other words, when asked to imitate a voice it has not been trained on, VALL-E sounds much more like a real human than rival AIs do.

On GitHub, Microsoft has published a small library of samples generated with VALL-E. The results are mostly very impressive, with many samples reproducing the lilt and accent of the speakers’ voices. Some examples are less successful, indicating VALL-E is probably not a finished product, but overall the output is convincing.

Huge potential — and risks

In a paper introducing VALL-E, Microsoft explains that VALL-E “may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker.” Such a capable tool for generating realistic-sounding speech raises the specter of ever more convincing deepfakes, which could be used to mimic anyone from a former romantic partner to a prominent international personality.

To mitigate that threat, Microsoft says “it is possible to build a detection model to discriminate whether an audio clip was synthesized by VALL-E.” The company says it will also follow its own AI principles as it develops the technology. Those principles cover areas such as fairness, safety, privacy, and accountability.

VALL-E is just the latest example of Microsoft’s experimentation with AI. Recently, the company has been working on integrating ChatGPT into Bing, using AI to recap your Teams meetings, and grafting advanced tools into apps like Outlook, Word, and PowerPoint. And according to Semafor, Microsoft is looking to invest $10 billion into ChatGPT maker OpenAI, a company it has already plowed significant funds into.

Despite the apparent risks, tools like VALL-E could be especially useful in medicine, for instance by helping people regain their voice after an accident. Being able to replicate speech from such a small input could be immensely promising in these situations, provided it is done right. And with all the money being spent on AI, both by Microsoft and others, it’s clear the technology is not going away any time soon.

Alex Blake
Alex Blake has been working with Digital Trends since 2019, where he spends most of his time writing about Mac computers…