OpenAI needs just 15 seconds of audio for its AI to clone a voice

In recent years, the amount of audio an AI system needs to clone someone’s voice has been getting shorter and shorter.

It used to be minutes; now it’s just seconds.

OpenAI, the Microsoft-backed company behind the viral generative AI chatbot ChatGPT, recently revealed that its own voice-cloning technology requires just 15 seconds of audio material to reproduce someone’s voice.

In a post on its website, OpenAI shared a small-scale preview of a model called Voice Engine, which it’s been developing since late 2022.

To use Voice Engine, a user feeds it a minimum of 15 seconds of spoken material. The user can then input text to create what OpenAI describes as “emotive and realistic” speech that “closely resembles the original speaker.”

OpenAI insists it is taking a “cautious and informed approach to a broader release due to the potential for synthetic voice misuse,” adding that it wants to “start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities.”

It added: “Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.”

One of the misuses that OpenAI refers to is a scam that some criminals are already carrying out using similar technology that’s been publicly available for some time. It involves cloning a voice and then calling a friend or relative of that person to trick them into handing over cash via a bank transfer. There are also fears about how such technology might be used in the upcoming presidential election, an issue highlighted by a recent high-profile incident in which a robocall using a clone of President Joe Biden’s voice told people not to vote in January’s New Hampshire primary.

Another concern is how the rapidly improving technology will affect the livelihoods of voice actors. Many fear that they’ll increasingly be asked to sign over the rights to their voice so that AI can create a synthetic version, with compensation for such a contract likely to be much lower than if the actor were asked to perform the job in person.

Looking at more positive deployments of the technology, OpenAI suggests that it could provide reading assistance to non-readers and children through natural-sounding, emotive voices “representing a wider range of speakers than what’s possible with preset voices.” It could also enable instant translation of videos and podcasts, something that Spotify is already trialing.

It could also be used to help patients who are gradually losing their voice through illness to continue communicating using what sounds like their own voice.

OpenAI has some examples of the AI-generated audio and the reference audio on its website, and we’re sure you’ll agree that they’re pretty extraordinary.

Trevor Mogg
Contributing Editor