Skip to main content

Here’s why people are saying GPT-4 is getting ‘lazy’

OpenAI and its technologies have been in the midst of scandal for most of November. Between the swift firing and rehiring of CEO Sam Altman and the curious case of the halted ChatGPT Plus paid subscriptions, OpenAI has kept the artificial intelligence industry in the news for weeks.

Now, AI enthusiasts have rehashed an issue that has many wondering whether GPT-4 is getting “lazier” as the language model continues to be trained. Many who use it speed up more intensive tasks have taken to X (formerly Twitter) to air their grievances about the perceived changes.

OpenAI has safety-ed GPT-4 sufficiently that its become lazy and incompetent.

Convert this file? Too long. Write a table? Here's the first three lines. Read this link? Sorry can't. Read this py file? Oops not allowed.

So frustrating.

— rohit (@krishnanrohit) November 28, 2023

Rohit Krishnan on X detailed several of the mishaps he experienced while using GPT-4, which is the language model behind ChatGPT Plus, the paid version of ChatGPT. He explained that the chatbot has refused several of his queries or given him truncated versions of his requests when he was able to get detailed responses previously. He also noted that the language model will use tools other than what it has been instructed to use, such as Dall-E when a prompt asks for a code interpreter. Krishnan also sarcastically added that “error analyzing” is the language model’s way of saying “AFK [away from keyboard], be back in a couple of hours.”

Matt Wensing on X detailed his experiment, where he asked ChatGPT Plus to make a list of dates between now and May 5, 2024, and the chatbot required additional information, such as the number of weeks between those dates, before it was able to complete the initial task.

Wharton professor Ethan Mollick also shared his observations of GPT-4 after comparing sequences with the code interpreter he ran in July to more recent queries from Tuesday. He concluded that GPT-4 is still knowledgeable, but noted that it explained to him how to fix his code as opposed to actually fixing the code. In essence, he would have to do the work he was asking GPT-4 to do. Though Mollick has not intended to critique the language, his observations fall in step with what others have described as “back talk” from GPT-4.

ChatGPT is known to hallucinate answers for information that it does not know, but these errors appear to go far beyond common missteps of the AI chatbot. GPT-4 was introduced in March, but as early as July, reports of the language model getting “dumber” began to surface. A study done in collaboration with Stanford University and the University of California, Berkeley observed that the accuracy of GPT-4 dropped from 97.6% to 2.4% between March and June alone. It detailed that the paid version of ChatGPT was unable to provide the correct answer to a mathematical equation with a detailed explanation, while the unpaid version that still runs an older GPT 3.5 model gave the correct answer and a detailed explanation of the mathematical process.

During that time, Peter Welinder, OpenAI Product vice president, suggested that heavy users might experience a psychological phenomenon where the quality of answers might appear to degrade over time when the language model is actually becoming more efficient.

There has been discussion if GPT-4 has become "lazy" recently. My anecdotal testing suggests it may be true.

I repeated a sequence of old analyses I did with Code Interpreter. GPT-4 still knows what to do, but keeps telling me to do the work. One step is now many & some are odd.

— Ethan Mollick (@emollick) November 28, 2023

According to Mollick, the current issues might similarly be temporary and due to a system overload or a change in prompt style that hasn’t been made apparent to users. Notably, OpenAI cited a system overload as a reason for the ChatGPT Plus sign-up shutdown following the spike in interest in the service after its inaugural DevDay developers’ conference introduced a host of new functions for the paid version of the AI chatbot. There is still a waitlist in place for ChatGPT Plus. The professor also added that ChatGPT on mobile uses a different prompt style, which results in “shorter and more to-the-point answers.”

Yacine on X detailed that the unreliability of the latest GPT-4 model due to the drop in instruction adherence has caused them to go back to traditional coding, adding that they plan on creating a local code LLM to regain control of the model’s parameters. Other users have mentioned opting for open-source options in the midst of the language model’s decline.

Similarly, Reddit user, Mindless-Ad8595 explained that more recent updates to GPT-4 have made it too smart for its own good. “It doesn’t come with a predefined ‘path’ that guides its behavior, making it incredibly versatile, but also somewhat directionless by default,” he said.

The programmer recommends users create custom GPTs that are specialized by task or application to increase the efficiency of the model output. He doesn’t provide any practical solutions for users remaining within OpenAI’s ecosystem.

App developer Nick Dobos shared his experience with GPT-4 mishaps, noting that when he prompted ChatGPT to write pong in SwiftUI, he discovered various placeholders and to-dos within the code. He added that the chatbot would ignore commands and continue inserting these placeholders and to-dos into the code even when instructed to do otherwise. Several X users confirmed similar experiences of this kind with their own examples of code featuring placeholders and to-dos. Dobos’ post got the attention of an OpenAI employee who said they would forward examples to the company’s development team for a fix, with a promise to share any updates in the interim.

Overall, there is no clear explanation as to why GPT-4 is currently experiencing complications. Users discussing their experiences online have suggested many ideas. These range from OpenAI merging models to a continued server overload from running both GPT-4 and GPT-4 Turbo to the company attempting to save money by limiting results, among others.

It is well-known that OpenAI runs an extremely expensive operation. In April 2023, researchers indicated it took $700,000 per day, or 36 cents per query, to keep ChatGPT running. Industry analysts detailed at that time that OpenAI would have to expand its GPU fleet by 30,000 units to maintain its commercial performance for the remainder of the year. This would entail support of ChatGPT processes, in addition to the computing for all of its partners.

While waiting for GPT-4 performance to stabilize, users exchanged several quips, making light of the situation on X.

“The next thing you know it will be calling in sick,” Southrye said.

“So many responses with “and you do the rest.” No YOU do the rest,” MrGarnett said.

The number of replies and posts about the problem is definitely hard to ignore. We’ll have to wait and see if OpenAI can tackle the problem head-on in a future update.

Editors' Recommendations

Fionna Agomuoh
Fionna Agomuoh is a technology journalist with over a decade of experience writing about various consumer electronics topics…
Here’s why people are claiming GPT-4 just got way better
A person sits in front of a laptop. On the laptop screen is the home page for OpenAI's ChatGPT artificial intelligence chatbot.

It appears that OpenAI is busy playing cleanup with its GPT language models after accusations that GPT-4 has been getting "lazy," "dumb," and has been experiencing errors outside of the norm for the ChatGPT chatbot circulated social media in late November.

Some are even speculating that GPT-4.5 has secretly been rolled out to some users, based on some responses from ChatGPT itself. Regardless of whether or not that's true, there's definitely been some positive internal changes over the past behind GPT-4.
More GPUs, better performance?
Posts started rolling in as early as last Thursday that noticed the improvement in GPT-4's performance. Wharton Professor Ethan Mollick, who previously commented on the sharp downturn in GPT-4 performance in November, has also noted a revitalization in the model, without seeing any proof of a switch to GPT-4.5 for himself. Consistently using a code interpreter to fix his code, he described the change as "night and day, for both speed and answer quality" after experiencing ChatGPT-4 being "unreliable and a little dull for weeks."

Read more
What is Grok? Elon Musk’s controversial ChatGPT competitor explained
A digital image of Elon Musk in front of a stylized background with the Twitter logo repeating.

Grok! It might not roll off the tongue like ChatGPT or Windows Copilot, but it's a large language model chatbot all the same. Developed by xAI, an offshoot of the programmers who stuck around after Elon Musk purchased X (formerly known as Twitter), Grok is designed to compete directly with OpenAI's GPT-4 models, Google's Bard, and a range of other public-facing chatbots.

Launched in November 2023, Grok is designed to be a chatbot with less of a filter than other AIs. It's said to have a "bit of wit, and has a rebellious streak."
It's only for X Premium users

Read more
2023 was the year of AI. Here were the 9 moments that defined it
A person's hand holding a smartphone. The smartphone is showing the website for the ChatGPT generative AI.

ChatGPT may have launched in late 2022, but 2023 was undoubtedly the year that generative AI took hold of the public consciousness.

Not only did ChatGPT reach new highs (and lows), but a plethora of seismic changes shook the world, from incredible rival products to shocking scandals and everything in between. As the year draws to a close, we’ve taken a look back at the nine most important events in AI that took place over the last 12 months. It’s been a year like no other for AI -- here’s everything that made it memorable, starting at the beginning of 2023.
ChatGPT’s rivals rush to market

Read more