This AI can spoof your voice after just three seconds

Artificial intelligence (AI) is having a moment, and its momentum continues with the news that Microsoft is working on an AI that can imitate anyone’s voice after being fed a three-second sample.

The new tool, dubbed VALL-E, has been trained on roughly 60,000 hours of English-language voice data, which Microsoft says is “hundreds of times larger than existing systems.” Using that knowledge, its creators claim it needs only a brief vocal sample to learn how to replicate a speaker’s voice.

More impressive, VALL-E can reproduce the emotions, vocal tones, and acoustic environment found in each sample, something other voice AI programs have struggled with. That gives it a more realistic aura and brings its results closer to something that could pass as genuine human speech.

Compared with other text-to-speech (TTS) systems, Microsoft says VALL-E “significantly outperforms the state-of-the-art zero-shot TTS system in terms of speech naturalness and speaker similarity.” In other words, Microsoft claims VALL-E sounds much more like a real human than rival systems do when imitating a voice it was not trained on.

On GitHub, Microsoft has published a small library of samples generated with VALL-E. The results are mostly very impressive, with many samples that reproduce the lilt and accent of the speakers’ voices. Some examples are less convincing, suggesting VALL-E is probably not a finished product, but overall the output holds up well.

Huge potential — and risks

In a paper introducing VALL-E, Microsoft explains that VALL-E “may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker.” Such a capable tool for generating realistic-sounding speech raises the specter of ever-more convincing deepfakes, which could be used to mimic anything from a former romantic partner to a prominent international personality.

To mitigate that threat, Microsoft says “it is possible to build a detection model to discriminate whether an audio clip was synthesized by VALL-E.” The company says it will also follow its own AI principles as it develops the model. Those principles cover areas such as fairness, safety, privacy, and accountability.

VALL-E is just the latest example of Microsoft’s experimentation with AI. Recently, the company has been working on integrating ChatGPT into Bing, using AI to recap your Teams meetings, and grafting advanced tools into apps like Outlook, Word, and PowerPoint. And according to Semafor, Microsoft is looking to invest $10 billion into ChatGPT maker OpenAI, a company it has already plowed significant funds into.

Despite the apparent risks, tools like VALL-E could be especially useful in medicine, for instance to help people regain their voice after an accident. Being able to replicate speech from such a small input could be immensely promising in these situations, provided it is done right. And with all the money being spent on AI, both by Microsoft and others, it’s clear the technology is not going away any time soon.

Alex Blake