ChatGPT’s latest model may be a regression in performance

By Andrew Tarantola Published November 21, 2024

ChatGPT on a phone on an encyclopedia (Image: Shantanu Kumar / Pexels)

According to a new report from Artificial Analysis, GPT-4o, OpenAI’s flagship large language model for ChatGPT, has significantly regressed in recent weeks, putting the state-of-the-art model’s performance on par with the far smaller, and notably less capable, GPT-4o mini model.

This analysis comes less than 24 hours after the company announced an upgrade for the GPT-4o model. “The model’s creative writing ability has leveled up–more natural, engaging, and tailored writing to improve relevance & readability,” OpenAI wrote on X. “It’s also better at working with uploaded files, providing deeper insights & more thorough responses.” Whether those claims hold up is now in doubt.

“We have completed running our independent evals on OpenAI’s GPT-4o release yesterday and are consistently measuring materially lower eval scores than the August release of GPT-4o,” Artificial Analysis announced in an X post on Thursday, noting that the model’s Artificial Analysis Quality Index had decreased from 77 to 71, making it equal to that of GPT-4o mini.

What’s more, GPT-4o’s score on the GPQA Diamond benchmark fell from 51% to 39%, while its score on the MATH benchmark fell from 78% to 69%.

At the same time, the researchers found that the model’s response speed had more than doubled, from around 80 output tokens per second to roughly 180 tokens per second. “We have generally observed significantly faster speeds on launch day for OpenAI models (likely due to OpenAI provisioning capacity ahead of adoption), but previously have not seen a 2x speed difference,” the researchers wrote.
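For readers who want to check the arithmetic behind these figures, here is a minimal Python sketch. The before and after numbers are the ones quoted above from Artificial Analysis; the `figures` dictionary and the `change` helper are illustrative names, not part of the report.

```python
# Before/after figures as quoted from Artificial Analysis in this article.
# The dictionary layout and the helper name are illustrative assumptions.
figures = {
    "Quality Index": (77, 71),
    "GPQA Diamond (%)": (51, 39),
    "MATH (%)": (78, 69),
    "Output speed (tokens/s)": (80, 180),
}


def change(before, after):
    """Return (absolute change, relative change in percent, after/before ratio)."""
    delta = after - before
    return delta, 100 * delta / before, after / before


for name, (before, after) in figures.items():
    delta, pct, ratio = change(before, after)
    print(f"{name}: {before} -> {after} ({delta:+d}, {pct:+.1f}%, x{ratio:.2f})")
```

On these figures, the speed ratio works out to 2.25, which is consistent with the "2x speed difference" the researchers describe, and the GPQA Diamond drop is 12 percentage points, or about 23.5% relative to the August score.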

“Based on this data, we conclude that it is likely that OpenAI’s Nov 20th GPT-4o model is a smaller model than the August release,” they continued. “Given that OpenAI has not cut prices for the Nov 20th version, we recommend that developers do not shift workloads away from the August version without careful testing.”

GPT-4o was first released in May 2024 to surpass the existing GPT-3.5 and GPT-4 models. GPT-4o offers state-of-the-art benchmark results in voice, multilingual, and vision tasks, according to OpenAI, making it ideal for advanced applications like real-time translation and conversational AI.
