Skip to main content

Here’s why people think GPT-4 might be getting dumber over time

As impressive as GPT-4 was at launch, some onlookers have observed that has lost some of its accuracy and power. These observations have been posted online for months now, including on the OpenAI forums.

These feelings have been out there for a while, but now we may finally have proof. A study conducted in collaboration with Stanford University and UC Berkeley suggests that GPT-4 has not improved its response proficiency but has in fact gotten worse with further updates to the language model.

GPT-4 is getting worse over time, not better.

Many people have reported noticing a significant degradation in the quality of the model responses, but so far, it was all anecdotal.

But now we know.

At least one study shows how the June version of GPT-4 is objectively worse than… pic.twitter.com/whhELYY6M4

— Santiago (@svpino) July 19, 2023

The study, called How Is ChatGPT’s Behavior Changing over Time?, tested the capability between GPT-4 and the prior language version GPT-3.5 between March and June. Testing the two model versions with a data set of 500 problems, researchers observed that GPT-4 had a 97.6% accuracy rate in March with 488 correct answers and a 2.4% accuracy rate in June after GPT-4 had gone through some updates. The model produced only 12 correct answers months later.

Another test used by researchers was a chain-of-thought technique, in which they asked GPT-4 Is 17,077 a prime number? A question of reasoning. Not only did GPT-4 incorrectly answer no, it gave no explanation as to how it came to this conclusion, according to researchers.

ChatGPT being asked about a prime number.
Image used with permission by copyright holder

The study comes just six days after an OpenAI executive tried to quell suspicions that GPT-4 was, in fact, getting dumber. The tweet below implies that the degradation in quality of answers is a psychological phenomenon from being a heavy user.

No, we haven't made GPT-4 dumber. Quite the opposite: we make each new version smarter than the previous one.

Current hypothesis: When you use it more heavily, you start noticing issues you didn't see before.

— Peter Welinder (@npew) July 13, 2023

Notably, GPT-4 is currently available for developers or paid members through ChatGPT Plus. To ask the same question to GPT-3.5 through the ChatGPT free research preview as I did, gets you not only the correct answer but also a detailed explanation of the mathematical process.

Additionally, code generation has suffered with developers at LeetCode having seen the performance of GPT-4 on its dataset of 50 easy problems drop from 52% accuracy to 10% accuracy between March and June.

To add fuel to the fire, Twitter commentator, @svpino noted that there are rumors that OpenAI might be using “smaller and specialized GPT-4 models that act similarly to a large model but are less expensive to run.”

This cheaper and faster option might be leading to a drop in the quality of GPT-4 responses at a crucial time when the parent company has many other large organizations depending on its technology for collaboration.

Not everyone thinks the study proves anything, though. Some have made the point that a change in behavior doesn’t equate to a reduction in capability. This is acknowledged in the study itself, stating that “a model that has a capability may or may not display that capability in response to a particular prompt.” In other words, getting the desired result may require different types of prompts from the user.

When GPT-4 was first announced OpenAI detailed its use of Microsoft Azure AI supercomputers to train the language model for six months, claiming that the result was a 40% higher likelihood of generating the “desired information from user prompts.”

ChatGPT, based on the GPT-3.5 LLM, was already known for having its information challenges, such as having limited knowledge of world events after 2021, which could lead it to fill in gaps with incorrect data. However, information regression appears to be a completely new problem never seen before with the service. Users were looking forward to updates to address the accepted issues.

CEO of OpenAI, Sam Altman recently expressed his disappointment in a tweet in the wake of the Federal Trade Commission launching an investigation into whether ChatGPT has violated consumer protection laws.

“We’re transparent about the limitations of our technology, especially when we fall short. And our capped-profits structure means we aren’t incentivized to make unlimited returns,” he tweeted.

Editors' Recommendations

Fionna Agomuoh
Fionna Agomuoh is a technology journalist with over a decade of experience writing about various consumer electronics topics…
Is ChatGPT safe? Here are the risks to consider before using it
A response from ChatGPT on an Android phone.

For those who have seen ChatGPT in action, you know just how amazing this generative AI tool can be. And if you haven’t seen ChatGPT do its thing, prepare to have your mind blown! 

There’s no doubting the power and performance of OpenAI’s famous chatbot, but is ChatGPT actually safe to use? While tech leaders the world over are concerned over the evolutionary development of AI, these global concerns don’t necessarily translate to an individual user experience. With that being said, let’s take a closer look at ChatGPT to help you hone in on your comfort level.
Privacy and financial leaks
In at least one instance, chat history between users was mixed up. On March 20, 2023, ChatGPT creator OpenAI discovered a problem, and ChatGPT was down for several hours. Around that time, a few ChatGPT users saw the conversation history of other people instead of their own. Possibly more concerning was the news that payment-related information from ChatGPT-Plus subscribers might have leaked as well.

Read more
What is ChatGPT Plus? Here’s what to know before you subscribe
Close up of ChatGPT and OpenAI logo.

ChatGPT is completely free to use, but that doesn't mean OpenAI isn't also interested in making some money.

ChatGPT Plus is a subscription model that gives you access to a completely different service based on the GPT-4 model, along with faster speeds, more reliability, and first access to new features. Beyond that, it also opens up the ability to use ChatGPT plug-ins, create custom chatbots, use DALL-E 3 image generation, and much more.
What is ChatGPT Plus?
Like the standard version of ChatGPT, ChatGPT Plus is an AI chatbot, and it offers a highly accurate machine learning assistant that's able to carry out natural language "chats." This is the latest version of the chatbot that's currently available.

Read more
‘Take this as a threat’ — Copilot is getting unhinged again
A screenshot of Copilot's unhinged responses on a screen.

The AI bots are going nuts again. Microsoft Copilot -- a rebranded version of Bing Chat -- is getting stuck in some old ways by providing strange, uncanny, and sometimes downright unsettling responses. And it all has to do with emojis.

A post on the ChatGPT subreddit is currently making the rounds with a specific prompt about emojis. The post itself, as well as the hundreds of comments below, show different variations of Copilot providing unhinged responses to the prompt. I assumed they were fake -- it wouldn't be the first time we've seen similar photos -- so imagine my surprise when the prompt produced similarly unsettling responses for me.

Read more