This new OpenAI voice update makes Siri and Alexa look like they need to go back to school

The universal translator just left science fiction and landed in your app store.

Rachit Agarwal / Digital Trends

OpenAI has launched three new audio models in its Realtime API, and they are a big deal for anyone building voice-powered apps. The three models are GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. 

Together, they move voice AI beyond simple back-and-forth responses toward something that can understand you, take action, and keep up with a real conversation.

If OpenAI's demo is anything to go by, we have just seen the next step in how voice AI models work.

So what can these models actually do?

GPT-Realtime-2 is the headline act. It brings GPT-5-class reasoning to live voice interactions, meaning it can handle harder requests without dropping the thread of the conversation. 

It can call multiple tools simultaneously and even narrate what it’s doing with phrases like “checking your calendar” or “let me look into that.” It also has a larger context window of 128K tokens, which means longer, more coherent sessions. Developers can even adjust the reasoning effort based on the complexity of the request.

GPT-Realtime-Translate is probably my favorite. It’s the closest we have come to having Star Trek’s Universal Translator in real life. It supports live speech translation across 70+ input languages and 13 output languages. 

The best part of the demo was that even when a new person joined and spoke a different language, GPT-Realtime-Translate had no trouble translating both speakers into English in real time.

Finally, there's GPT-Realtime-Whisper. Most speech-to-text models wait for the speaker to finish before providing the full transcription. This one is a streaming transcription model that converts speech to text as the speaker talks. It is useful for live captions, meeting notes, and any voice-powered workflow where waiting for a transcript is not an option.

Can anyone use these new voice AI models?

Currently, OpenAI has released these models only to developers. But the apps they build will affect everyone. For example, a developer can build a real-time translator app that lets users converse with people who speak different languages.

Many companies are already testing these new models. Zillow is building a voice assistant that can search homes and schedule tours from a single spoken request. Priceline's assistant can check your flights and hotels, cancel them, and book new ones. Vimeo is using the models for real-time transcription.

Pricing starts at $0.017 per minute for Whisper, $0.034 per minute for Translate, and $32 per 1M audio input tokens for GPT-Realtime-2.
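
To put those rates in perspective, here is a rough cost sketch in Python. It uses only the three starting prices quoted above; the function names and the example usage figures (an hour of audio, 250,000 input tokens) are illustrative assumptions, and the article does not say how many tokens a minute of audio consumes or what output tokens cost, so treat the results as ballpark numbers rather than a bill.

```python
# Starting rates quoted in the article (USD). Actual bills may include
# other charges (for example output tokens) that are not covered here.
WHISPER_PER_MINUTE = 0.017
TRANSLATE_PER_MINUTE = 0.034
REALTIME2_PER_MILLION_INPUT_TOKENS = 32.0


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Cost of a per-minute model for the given audio duration."""
    return minutes * rate_per_minute


def audio_input_token_cost(
    tokens: int, rate_per_million: float = REALTIME2_PER_MILLION_INPUT_TOKENS
) -> float:
    """Cost of audio input tokens at a per-million-token rate."""
    return tokens / 1_000_000 * rate_per_million


if __name__ == "__main__":
    # Hypothetical usage figures, chosen only for illustration.
    print(f"60 min of Whisper:    ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
    print(f"60 min of Translate:  ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
    print(f"250K Realtime-2 input tokens: ${audio_input_token_cost(250_000):.2f}")
```

At these rates, an hour of streaming transcription comes to about a dollar and an hour of live translation to about two, which is why the per-minute models look attractive for captions and meeting notes.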

Rachit Agarwal
Rachit is a seasoned tech journalist with over ten years of experience covering the consumer technology landscape.