
ChatGPT already listens and speaks. Soon it may see as well

ChatGPT meets a dog. OpenAI

ChatGPT’s Advanced Voice Mode, which lets users converse with the chatbot in real time, could soon gain the gift of sight, according to code discovered in the platform’s latest beta build. OpenAI has not confirmed a release date, but strings in the ChatGPT v1.2024.317 beta build spotted by Android Authority suggest the so-called “live camera” feature could arrive imminently.

OpenAI first showed off Advanced Voice Mode’s vision capabilities for ChatGPT in May, when the feature launched in alpha. In a demo posted at the time, the system was able to identify a dog through the phone’s camera feed, recognize the dog from past interactions, spot the dog’s ball, and connect the dog to the ball (i.e., playing fetch).

Dog meets GPT-4o

The feature was an immediate hit with alpha testers. X user Manuel Sainsily put it to good use, answering verbal questions about his new kitten based on the camera’s video feed.


Trying #ChatGPT’s new Advanced Voice Mode that just got released in Alpha. It feels like face-timing a super knowledgeable friend, which in this case was super helpful — reassuring us with our new kitten. It can answer questions in real-time and use the camera as input too! pic.twitter.com/Xx0HCAc4To

— Manuel Sainsily (@ManuVision) July 30, 2024

Advanced Voice Mode was subsequently released in beta to Plus and Enterprise subscribers in September, albeit without its visual capabilities. Of course, that didn’t stop users from going wild testing the feature’s vocal limits. Advanced Voice “offers more natural, real-time conversations, allows you to interrupt anytime, and senses and responds to your emotions,” according to the company.

The addition of digital eyes would certainly set Advanced Voice Mode apart from OpenAI’s primary competitors, Google and Meta, both of which have introduced conversational features of their own in recent months.

Gemini Live may be able to speak more than 40 languages, but it cannot see the world around it (at least until Project Astra gets off the ground). Nor can Meta’s Natural Voice Interactions, which debuted at the Connect 2024 event in September, use camera input.

OpenAI also announced today that Advanced Voice Mode is now available on desktop for paid ChatGPT Plus accounts. Previously exclusive to mobile, it can now be accessed from your laptop or PC as well.

Andrew Tarantola
Former Computing Writer
Andrew Tarantola is a journalist with more than a decade reporting on emerging technologies ranging from robotics and machine…