Here's why people are saying GPT-4 is getting lazy

OpenAI and its technologies have been in the midst of scandal for most of November. Between the swift firing and rehiring of CEO Sam Altman and the curious case of the halted ChatGPT Plus paid subscriptions, OpenAI has kept the artificial intelligence industry in the news for weeks.

Now, AI enthusiasts have rehashed an issue that has many wondering whether GPT-4 is getting “lazier” as the language model continues to be trained. Many who use it to speed up more intensive tasks have taken to X (formerly Twitter) to air their grievances about the perceived changes.

OpenAI has safety-ed GPT-4 sufficiently that its become lazy and incompetent.

Convert this file? Too long. Write a table? Here's the first three lines. Read this link? Sorry can't. Read this py file? Oops not allowed.

So frustrating.

— rohit (@krishnanrohit) November 28, 2023

Rohit Krishnan on X detailed several of the mishaps he experienced while using GPT-4, the language model behind ChatGPT Plus, the paid version of ChatGPT. He explained that the chatbot has refused several of his queries or returned truncated responses to requests that previously got detailed answers. He also noted that the language model will use tools other than the ones it has been instructed to use, such as Dall-E when a prompt asks for the code interpreter. Krishnan also sarcastically added that “error analyzing” is the language model’s way of saying “AFK [away from keyboard], be back in a couple of hours.”

Matt Wensing on X detailed his experiment, where he asked ChatGPT Plus to make a list of dates between now and May 5, 2024, and the chatbot required additional information, such as the number of weeks between those dates, before it was able to complete the initial task.
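For context on why this request struck users as a trivial one, here is a minimal Python sketch of the same task. It is an illustration only, not a reconstruction of Wensing's prompt: the function name `dates_between`, the November 28, 2023 start date, and the daily or weekly step are all assumptions made for the example.

```python
from datetime import date, timedelta


def dates_between(start: date, end: date, step_days: int = 1) -> list[date]:
    """Return every date from start to end (inclusive), stepping by step_days."""
    if step_days < 1:
        raise ValueError("step_days must be at least 1")
    result = []
    current = start
    while current <= end:
        result.append(current)
        current += timedelta(days=step_days)
    return result


# Hypothetical start date; the article only says "now".
start = date(2023, 11, 28)
end = date(2024, 5, 5)

daily = dates_between(start, end)
weekly = dates_between(start, end, step_days=7)

print(f"{len(daily)} daily dates, {len(weekly)} weekly dates")
print("last weekly date:", weekly[-1].isoformat())
```

Nothing here requires knowing the number of weeks in advance; the loop simply stops once it passes the end date.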

Wharton professor Ethan Mollick also shared his observations of GPT-4 after rerunning a sequence of analyses he had done with the code interpreter in July and comparing them with the results from Tuesday. He concluded that GPT-4 is still knowledgeable, but noted that it explained to him how to fix his code rather than actually fixing it. In essence, he would have to do the work he was asking GPT-4 to do. Though Mollick did not intend to criticize the language model, his observations fall in step with what others have described as “back talk” from GPT-4.

ChatGPT is known to hallucinate answers for information that it does not know, but these errors appear to go far beyond the common missteps of the AI chatbot. GPT-4 was introduced in March, but as early as July, reports of the language model getting “dumber” began to surface. A study by researchers at Stanford University and the University of California, Berkeley observed that GPT-4’s accuracy at identifying prime numbers dropped from 97.6% to 2.4% between March and June alone. It detailed that the paid version of ChatGPT was unable to provide the correct answer to a math problem with a detailed explanation, while the unpaid version that still runs an older GPT-3.5 model gave the correct answer and a detailed explanation of the mathematical process.

During that time, Peter Welinder, OpenAI’s vice president of product, suggested that heavy users might experience a psychological phenomenon in which the quality of answers appears to degrade over time when the language model is actually becoming more efficient.

There has been discussion if GPT-4 has become "lazy" recently. My anecdotal testing suggests it may be true.

I repeated a sequence of old analyses I did with Code Interpreter. GPT-4 still knows what to do, but keeps telling me to do the work. One step is now many & some are odd. pic.twitter.com/OhGAMtd3Zq

— Ethan Mollick (@emollick) November 28, 2023

According to Mollick, the current issues might similarly be temporary and due to a system overload or a change in prompt style that hasn’t been made apparent to users. Notably, OpenAI cited a system overload as a reason for the ChatGPT Plus sign-up shutdown following the spike in interest in the service after its inaugural DevDay developers’ conference introduced a host of new functions for the paid version of the AI chatbot. There is still a waitlist in place for ChatGPT Plus. The professor also added that ChatGPT on mobile uses a different prompt style, which results in “shorter and more to-the-point answers.”

Yacine on X said that the unreliability of the latest GPT-4 model, caused by a drop in instruction adherence, has pushed them to go back to traditional coding, adding that they plan on creating a local code LLM to regain control of the model’s parameters. Other users have mentioned opting for open-source options in the midst of the language model’s decline.

Similarly, Reddit user Mindless-Ad8595 explained that more recent updates to GPT-4 have made it too smart for its own good. “It doesn’t come with a predefined ‘path’ that guides its behavior, making it incredibly versatile, but also somewhat directionless by default,” he said.

The programmer recommends users create custom GPTs that are specialized by task or application to increase the efficiency of the model output. He doesn’t provide any practical solutions for users remaining within OpenAI’s ecosystem.

App developer Nick Dobos shared his experience with GPT-4 mishaps, noting that when he prompted ChatGPT to write Pong in SwiftUI, he discovered various placeholders and to-dos within the code. He added that the chatbot would ignore commands and continue inserting these placeholders and to-dos into the code even when instructed to do otherwise. Several X users confirmed similar experiences with their own examples of code featuring placeholders and to-dos. Dobos’ post got the attention of an OpenAI employee who said they would forward examples to the company’s development team for a fix, with a promise to share any updates in the interim.

Overall, there is no clear explanation as to why GPT-4 is currently experiencing complications. Users discussing their experiences online have suggested many ideas. These range from OpenAI merging models to a continued server overload from running both GPT-4 and GPT-4 Turbo to the company attempting to save money by limiting results, among others.

It is well known that OpenAI runs an extremely expensive operation. In April 2023, researchers indicated it took $700,000 per day, or 0.36 cents per query, to keep ChatGPT running. Industry analysts detailed at that time that OpenAI would have to expand its GPU fleet by 30,000 units to maintain its commercial performance for the remainder of the year. This would entail support of ChatGPT processes, in addition to the computing for all of its partners.

While waiting for GPT-4 performance to stabilize, users exchanged several quips, making light of the situation on X.

“The next thing you know it will be calling in sick,” Southrye said.

“So many responses with ‘and you do the rest.’ No YOU do the rest,” MrGarnett said.

The number of replies and posts about the problem is definitely hard to ignore. We’ll have to wait and see if OpenAI can tackle the problem head-on in a future update.
