Skip to main content

ChatGPT’s latest model may be a regression in performance

chatGPT on a phone on an encyclopedia
Shantanu Kumar / Pexels
According to a new report from Artificial Analysis, OpenAI’s flagship large language model for ChatGPT, GPT-4o, has significantly regressed in recent weeks, putting the state-of-the-art model’s performance on par with the far smaller, and notably less capable, GPT-4o-mini model.

This analysis comes less than 24 hours after the company announced an upgrade for the GPT-4o model. “The model’s creative writing ability has leveled up–more natural, engaging, and tailored writing to improve relevance & readability,” OpenAI wrote on X. “It’s also better at working with uploaded files, providing deeper insights & more thorough responses.” Whether those claims continue to hold up is now being cast in doubt.

Recommended Videos

“We have completed running our independent evals on OpenAI’s GPT-4o release yesterday and are consistently measuring materially lower eval scores than the August release of GPT-4o,” the Artificial Analysis announced via an X post on Thursday, noting that the model’s Artificial Analysis Quality Index decreased from 77 to 71 (and is now equal to that of GPT-4o mini).

What’s more, GPT-4o’s performance on the GPQA Diamond benchmark decreased from 51% to 39% while its MATH benchmarks decreased from 78% to 69%.

Simultaneously, the researchers discovered more than a doubling in the speed increase of the model’s responses, accelerating from around 80 output tokens per second to roughly 180 tokens/s. “We have generally observed significantly faster speeds on launch day for OpenAI models (likely due to OpenAI provisioning capacity ahead of adoption), but previously have not seen a 2x speed difference,” the researchers wrote.

Wait – is the new GPT-4o a smaller and less intelligent model?

We have completed running our independent evals on OpenAI’s GPT-4o release yesterday and are consistently measuring materially lower eval scores than the August release of GPT-4o.

GPT-4o (Nov) vs GPT-4o (Aug):
➤… pic.twitter.com/gjY2pBFuUv

— Artificial Analysis (@ArtificialAnlys) November 21, 2024

“Based on this data, we conclude that it is likely that OpenAI’s Nov 20th GPT-4o model is a smaller model than the August release,” they continued. “Given that OpenAI has not cut prices for the Nov 20th version, we recommend that developers do not shift workloads away from the August version without careful testing.”

GPT-4o was first released in May 2024 to surpass the existing GPT-3.5 and GPT-4 models. GPT-4o offers state-of-the-art benchmark results in voice, multilingual, and vision tasks, according to OpenAI, making it ideal for advanced applications like real-time translation and conversational AI.

Andrew Tarantola
Andrew Tarantola is a journalist with more than a decade reporting on emerging technologies ranging from robotics and machine…
This massive upgrade to ChatGPT is coming in January — and it’s not GPT-5
ChatGPT on a laptop

OpenAI is set to launch a new AI agent in January, code-named Operator, that will enable ChatGPT to take action on the user's behalf. You may never have to book your own flights ever again.

The company's leadership made the announcement during a staff meeting Wednesday, reports Bloomberg. The company plans to roll out the new feature as a research preview through the company’s developer API.

Read more
Is AI already plateauing? New reporting suggests GPT-5 may be in trouble
A person sits in front of a laptop. On the laptop screen is the home page for OpenAI's ChatGPT artificial intelligence chatbot.

OpenAI's next-generation Orion model of ChatGPT, which is both rumored and denied to be arriving by the end of the year, may not be all it's been hyped to be once it arrives, according to a new report from The Information.

Citing anonymous OpenAI employees, the report claims the Orion model has shown a "far smaller" improvement over its GPT-4 predecessor than GPT-4 showed over GPT-3. Those sources also note that Orion "isn’t reliably better than its predecessor [GPT-4] in handling certain tasks," specifically coding applications, though the new model is notably stronger at general language capabilities, such as summarizing documents or generating emails.

Read more
ChatGPT monthly usage may now rival Google Chrome
A person sits in front of a laptop. On the laptop screen is the home page for OpenAI's ChatGPT artificial intelligence chatbot.

A number of popular generative AI platforms are seeing consistent growth as users are figuring out how they want to use the tools -- and ChatGPT is at the top of the list with the most visits, at 3.7 billion worldwide. So many people are visiting the AI chatbot, and its figures are rivaling browser market share. It can only be compared to Google Chrome figures in terms of monthly users, which is estimated to be around 3.45 billion.

Statistics from Similarweb indicate that ChatGPT saw a 17.2% month-over-month (MoM) growth and a 115.9% year-over-year (YoY) traffic growth. Some highlights that spurned the ChatGPT growth during 2024 include its parent company, OpenAI, updating its web address from a subdomain, chat.openai.com, to a main domain, chatgpt.com. The tool especially saw a surge of traffic in May 2024, when it hit a 2.2-billion-visit milestone, and has been growing ever since, according to Similarweb researcher David F. Carr.

Read more