

This AI can spoof your voice after just three seconds

Artificial intelligence (AI) is having a moment right now, and the wind continues to blow in its sails with the news that Microsoft is working on an AI that can imitate anyone’s voice after being fed a short three-second sample.

The new tool, dubbed VALL-E, has been trained on roughly 60,000 hours of English-language voice data, which Microsoft says is “hundreds of times larger than existing systems.” With that training behind it, its creators claim the model needs only a small sample of vocal input to learn how to replicate a user’s voice.


More impressively, VALL-E can reproduce the emotions, vocal tones, and acoustic environment found in each sample, something other voice AI programs have struggled with. That gives it a more realistic aura and brings its results closer to something that could pass as genuine human speech.

When compared to other text-to-speech (TTS) competitors, Microsoft says VALL-E “significantly outperforms the state-of-the-art zero-shot TTS system in terms of speech naturalness and speaker similarity.” In other words, VALL-E sounds much more like a real human than rival AIs do when given voices they have never been trained on.

On GitHub, Microsoft has published a small library of samples generated by VALL-E. The results are mostly very impressive, with many samples that reproduce the lilt and accent of the speakers’ voices. A few of the examples are less convincing, indicating VALL-E is probably not a finished product, but overall the output is remarkably lifelike.

Huge potential — and risks


In a paper introducing VALL-E, Microsoft explains that the tool “may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker.” Such a capable tool for generating realistic-sounding speech raises the specter of ever-more convincing deepfakes, which could be used to mimic anyone from a former romantic partner to a prominent international personality.

To mitigate that threat, Microsoft says “it is possible to build a detection model to discriminate whether an audio clip was synthesized by VALL-E.” The company says it will also use its own AI principles when developing its work. Those principles cover areas such as fairness, safety, privacy, and accountability.

VALL-E is just the latest example of Microsoft’s experimentation with AI. Recently, the company has been working on integrating ChatGPT into Bing, using AI to recap your Teams meetings, and grafting advanced tools into apps like Outlook, Word, and PowerPoint. And according to Semafor, Microsoft is looking to invest $10 billion into ChatGPT maker OpenAI, a company it has already plowed significant funds into.

Despite the apparent risks, tools like VALL-E could be especially useful in medicine, for instance, to help people regain their voice after an accident. Being able to replicate speech from such a small input set could be immensely promising in these situations, provided it is done right. But with all the money being spent on AI, both by Microsoft and others, it’s clear the technology is not going away any time soon.

Alex Blake