This new OpenAI voice update makes Siri and Alexa look like they need to go back to school

The universal translator just left science fiction and landed in your app store.

Rachit Agarwal / Digital Trends

OpenAI has launched three new audio models in its Realtime API, and they are a big deal for anyone building voice-powered apps. The three models are GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. 

Together, they move voice AI beyond simple back-and-forth responses toward something that can understand you, take action, and keep up with a real conversation.

If OpenAI's demo is anything to go by, we have just seen the next step in how voice AI models work.

So what can these models actually do?

GPT-Realtime-2 is the headline act. It brings GPT-5-class reasoning to live voice interactions, meaning it can handle harder requests without dropping the thread of the conversation. 

It can call multiple tools simultaneously and even narrate what it’s doing with phrases like “checking your calendar” or “let me look into that.” It also has a larger context window of 128K tokens, which means longer, more coherent sessions. Developers can even adjust the reasoning effort based on the complexity of the request.
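To make "calling multiple tools simultaneously" concrete, here is a minimal Python sketch of the general pattern: start several tool calls at once, surface a short narration for each, and collect the results together. It does not use OpenAI's Realtime API. The tool functions `check_calendar` and `search_flights`, their return strings, and the helper `run_tools_in_parallel` are all hypothetical stand-ins invented for illustration.

```python
import asyncio


# Hypothetical tools: stand-ins for real calendar or travel lookups.
async def check_calendar(day: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"calendar for {day}: 2 meetings"


async def search_flights(destination: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"flights to {destination}: 3 options"


async def run_tools_in_parallel(calls):
    """Run every (narration, coroutine) pair concurrently.

    Returns the narration phrases and the tool results, both in the
    same order as the input list.
    """
    narrations = [narration for narration, _ in calls]
    # asyncio.gather runs the coroutines concurrently and keeps input order.
    results = await asyncio.gather(*(coro for _, coro in calls))
    return narrations, list(results)


def demo():
    calls = [
        ("checking your calendar", check_calendar("Friday")),
        ("let me look into that", search_flights("Tokyo")),
    ]
    return asyncio.run(run_tools_in_parallel(calls))


if __name__ == "__main__":
    narrations, results = demo()
    for phrase, result in zip(narrations, results):
        print(f"{phrase}... {result}")
```

Because the two lookups run concurrently, the total wait is roughly the slowest single call rather than the sum of both, which is what keeps a spoken conversation from stalling.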

GPT-Realtime-Translate is probably my favorite. It’s the closest we have come to having Star Trek’s Universal Translator in real life. It supports live speech translation across 70+ input languages and 13 output languages. 

The best part of the demo came when a new person joined and spoke a different language: GPT-Realtime-Translate had no trouble translating both speakers into English in real time.

Finally, there’s GPT-Realtime-Whisper. Most speech-to-text models wait for the speaker to finish before providing the full transcription. This one is a streaming transcription model that converts speech to text as the speaker talks. It is useful for live captions, meeting notes, and any voice-powered workflow where waiting for a transcription is not an option.

Can anyone use these new voice AI models?

For now, OpenAI has released these models to developers through its Realtime API. But the apps they build will affect everyone. For example, a developer can build a real-time translator app that lets users converse with people who speak different languages.

Many companies are already testing these new models. Zillow is building a voice assistant that can search homes and schedule tours from a single spoken request. Priceline is testing one that can check your flights and hotels, cancel them, and book new ones. Vimeo is using the models for real-time transcription.

Pricing starts at $0.017 per minute for Whisper, $0.034 per minute for Translate, and $32 per 1M audio input tokens for GPT-Realtime-2.
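To put those numbers in context, here is a rough Python sketch that turns the quoted starting prices into session costs. The function names are made up for illustration, and the GPT-Realtime-2 figure covers audio input tokens only. The article does not say how many tokens a minute of audio consumes or what output costs, so the token count is left as an input you would have to measure yourself.

```python
# Starting prices quoted in the article (USD).
WHISPER_PER_MINUTE = 0.017
TRANSLATE_PER_MINUTE = 0.034
REALTIME2_PER_MILLION_AUDIO_INPUT_TOKENS = 32.0


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Cost of a per-minute model for a session of the given length."""
    return round(minutes * rate_per_minute, 4)


def realtime2_audio_input_cost(audio_input_tokens: int) -> float:
    """Cost of GPT-Realtime-2 audio input only (output is not included)."""
    return round(
        audio_input_tokens / 1_000_000 * REALTIME2_PER_MILLION_AUDIO_INPUT_TOKENS,
        4,
    )


if __name__ == "__main__":
    # A one-hour meeting, transcribed and translated.
    print("Whisper, 60 min:  ", per_minute_cost(60, WHISPER_PER_MINUTE))
    print("Translate, 60 min:", per_minute_cost(60, TRANSLATE_PER_MINUTE))
    # Token count here is an arbitrary example, not a measured figure.
    print("Realtime-2, 500K audio input tokens:", realtime2_audio_input_cost(500_000))
```

At these rates, an hour of live transcription comes to about a dollar and an hour of live translation to about two.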

Rachit Agarwal
Rachit is a seasoned tech journalist with over ten years of experience covering the consumer technology landscape.