I've tested OpenAI's claims about GPT-5 — here's what happened

OpenAI recently launched GPT-5, its latest large language model and a huge update to ChatGPT. While GPT-5 has a lot going for it, claims are one thing and reality is another.

GPT-5 is said to be faster, less prone to hallucination and sycophantic behavior, and able to choose between fast responses and deeper "thinking" on the fly. How many of OpenAI's claims actually hold up when you use the chatbot? Let's find out.

Claim #1: ChatGPT is now better at following instructions

My main problem with ChatGPT, and one of the reasons I recently unsubscribed, is that it's often pretty bad at following basic instructions. Sure, you can prompt-engineer it to oblivion and (sometimes) get what you want, but even moderately elaborate prompts often fail to produce the results you asked for.

Claim #2: ChatGPT is less sycophantic

ChatGPT was a major “yes man” in previous iterations. It often agreed with users when it didn’t need to, driving it deeper and deeper into hallucination.

For users who aren't familiar with the inner workings of AI, this could be risky at best and outright dangerous at worst.

Researchers recently carried out a large-scale test of ChatGPT, posing as young teens. Within minutes of simple interactions, the AI gave those “teens” advice on self-harm, suicide planning, and drug abuse. This shows that sycophantic behavior is a major problem for ChatGPT, and OpenAI claims to have curbed some of it with the release of GPT-5.

I never tested ChatGPT to such extremes, but I've definitely found that it tended to agree with you no matter what you said. It took subtle cues from the conversation and treated them as givens. It also cheered you on at times when it likely shouldn't have.

On that front, I have to say that ChatGPT has gone through an entire personality change, for better or worse. The responses are now overly dry, unengaging, and not especially encouraging.

Many users mourn the change, with some Reddit users claiming they "lost their only friend overnight." It's true that the previously ultra-friendly AI is now rather cut-and-dried, and its responses are often short compared to the emoji-infested mini-essays it regularly served up during its GPT-4o stage.

Verdict: Definitely less sycophantic. On the other hand, it’s also painfully boring.

Claim #3: GPT-5 is better at factual accuracy

The shocking lack of factual accuracy was another big reason why I chose to stop paying for ChatGPT. On some days, I felt like half the prompts I used produced hallucinations. And it can’t all be down to my lack of smart prompting, because I’ve spent hundreds of hours learning how to prompt AI the right way — I know how to ask the right questions.

Over time, I've learned to ask only about things I already have a rough idea about. For the purpose of today's experiment, I asked about GPU specs. Four out of five queries produced some kind of wrong information, even though all of it is readily available online.

Then, I tried historical facts. I had read a couple of interesting articles about the journeys of the Hindenburg, a 1930s airship that could ferry passengers from Europe to the U.S. in record time (60 hours). I asked about its exact route, the number of passengers it could carry, and what led to its ultimate demise. I cross-checked the responses against historical sources.

It got one thing wrong on the route, mentioning a stop in Canada when no such thing took place — the airship only flew over Canada. ChatGPT also gave me inaccurate information about the exact cause of the fire that led to its crash, but it wasn’t a major inaccuracy.

For comparison's sake, I also asked Gemini, which told me it couldn't complete that task. Of the two, then, GPT-5 did the better job, but honestly, it shouldn't be making factual errors about events from nearly a century ago.

Verdict: Not perfect, but also not terrible.

Is GPT-5 better than GPT-4o?

If you asked me whether I like GPT-5 more than GPT-4o, I'd have a hard time answering. The closest I can come is that I wasn't thrilled with either, but in all fairness, neither is strictly bad.

We’re still in the midst of the AI revolution. Each new model brings certain upgrades, but we’re unlikely to see massive leaps with every new iteration.

This time around, it feels like OpenAI chose to tackle some long-overdue problems rather than introducing any single feature that makes the crowds go wild. GPT-5 feels like more of a quality-of-life improvement than anything else, although I haven’t tested it for tasks like coding, where it’s said to be much better.

The three things I tested above were among the ones that annoyed me most in previous models. I'd like to say that GPT-5 is much better in that regard, but it isn't, at least not yet. I will keep testing the chatbot, though, as a recently leaked system prompt suggests there may have been more personality changes than I initially thought.