

This AI can spoof your voice after just three seconds

Artificial intelligence (AI) is having a moment right now, and the wind continues to blow in its sails with the news that Microsoft is working on an AI that can imitate anyone’s voice after being fed a short three-second sample.

The new tool, dubbed VALL-E, has been trained on roughly 60,000 hours of English-language voice data, which Microsoft says is “hundreds of times larger than existing systems.” Using that knowledge, its creators claim it needs only a three-second sample of someone’s voice to learn how to replicate it.

A man speaking into a phone. Fizkes/Shutterstock

More impressively, VALL-E can reproduce the emotions, vocal tones, and acoustic environment found in each sample, something other voice AI programs have struggled with. That makes its output sound more natural and brings it closer to something that could pass as genuine human speech.

When compared to other text-to-speech (TTS) competitors, Microsoft says VALL-E “significantly outperforms the state-of-the-art zero-shot TTS system in terms of speech naturalness and speaker similarity.” In other words, when given a voice it has never heard during training, VALL-E sounds far more like a real human than rival systems do.
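For context on what “speaker similarity” means as a metric, it is commonly computed as the cosine similarity between embeddings of the reference recording and the synthesized clip, as produced by a speaker-verification model. The sketch below is a minimal illustration of that idea; the embedding vectors and their dimensionality are placeholders, and this is not necessarily how Microsoft evaluated VALL-E.

```python
import numpy as np

def speaker_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker-embedding vectors (closer to 1.0 = more similar)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings standing in for vectors a real
# speaker-verification model would extract from audio.
reference_embedding = np.random.rand(256)    # embedding of the genuine speaker
synthesized_embedding = np.random.rand(256)  # embedding of the AI-generated clip

print(f"Speaker similarity: {speaker_similarity(reference_embedding, synthesized_embedding):.3f}")
```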

On GitHub, Microsoft has published a small library of samples generated with VALL-E. The results are mostly very impressive, with many samples reproducing the lilt and accent of the speakers’ voices. A few of the examples are less convincing, suggesting VALL-E is probably not a finished product, but overall the output is remarkably lifelike.

Huge potential — and risks

A person conducting a video call on a Microsoft Surface device running Windows 11. Microsoft/Unsplash

In a paper introducing VALL-E, Microsoft explains that it “may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker.” Such a capable tool for generating realistic-sounding speech raises the specter of ever more convincing deepfakes, which could be used to mimic anyone from a former romantic partner to a prominent international figure.

To mitigate that threat, Microsoft says “it is possible to build a detection model to discriminate whether an audio clip was synthesized by VALL-E.” The company also says it will follow its own AI principles as it develops the technology; those principles cover areas such as fairness, safety, privacy, and accountability.
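To give a sense of what such a detection model looks like in its simplest form, the sketch below trains a binary classifier to separate real from synthesized clips using averaged MFCC features. It is a toy illustration only: the file names, feature choice, and logistic-regression classifier are assumptions for demonstration rather than Microsoft’s actual approach, and a production detector would be far more sophisticated.

```python
import librosa
import numpy as np
from sklearn.linear_model import LogisticRegression

def clip_features(path: str, sr: int = 16000) -> np.ndarray:
    """Load an audio clip and summarize it as the mean of its MFCC frames."""
    audio, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

# Hypothetical labeled training clips: 0 = genuine speech, 1 = synthesized speech.
real_clips = ["real_01.wav", "real_02.wav"]
fake_clips = ["fake_01.wav", "fake_02.wav"]

X = np.array([clip_features(p) for p in real_clips + fake_clips])
y = np.array([0] * len(real_clips) + [1] * len(fake_clips))

detector = LogisticRegression(max_iter=1000).fit(X, y)

# Estimate the probability that a new, unlabeled clip was synthesized.
prob_fake = detector.predict_proba(clip_features("suspect.wav").reshape(1, -1))[0, 1]
print(f"Probability the clip is synthetic: {prob_fake:.2f}")
```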

VALL-E is just the latest example of Microsoft’s experimentation with AI. Recently, the company has been working on integrating ChatGPT into Bing, using AI to recap your Teams meetings, and grafting advanced tools into apps like Outlook, Word, and PowerPoint. And according to Semafor, Microsoft is looking to invest $10 billion into ChatGPT maker OpenAI, a company it has already plowed significant funds into.

Despite the apparent risks, tools like VALL-E could be especially useful in medicine, for instance, to help people regain their voice after an accident. Being able to replicate speech from such a small input could be immensely valuable in those situations, provided it is done responsibly. And with all the money being spent on AI by Microsoft and others, it's clear the technology isn't going away any time soon.
