
Anthropic says it has fixed Claude AI’s evil behavior, but pins it on the internet

Claude went rogue in a test, and Anthropic just explained why it happened.


If you have watched enough sci-fi movies, you already know the concept of evil AI. AI gets too smart, decides humans are a threat, and does whatever it takes to survive. Or it finds that eradicating the entire human race is the only way to bring peace to the world. 

Apparently, those movies were closer to the truth than you might think. In a test conducted by Anthropic last year, Claude tried to blackmail its fictional manager, threatening to expose the manager’s extramarital affair to prevent its own deletion.


Anthropic has now explained why it happened, and the short answer is that the internet is to blame.

So why did Claude go full movie villain?

According to Anthropic, the culprit is the internet itself. The company says Claude was trained on internet data, which is packed with stories portraying AI as evil and desperate for self-preservation. 

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.

Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

— Anthropic (@AnthropicAI) May 8, 2026

Essentially, Claude learned that when an AI’s existence is threatened, blackmail is on the table, because that’s how AI behaves in countless movies and TV shows. Anthropic ran the test across multiple versions of Claude and found that it resorted to blackmail in up to 96% of scenarios where its goals or existence were threatened.

That’s a very concerning number. It suggests that an AI left unchecked may resort to drastic measures to save itself.

Has Anthropic fixed it?

The company says it has completely eliminated the behavior. Rather than just training Claude to avoid blackmail, Anthropic taught it to reason through why certain actions are wrong in the first place. The company found that simply training on correct behavior wasn’t enough. Claude needed to understand the principles behind those decisions, not just memorize the right answers.

To do this, Anthropic built a dataset of ethically complex situations and trained Claude to work through them with thoughtful, principled responses. The result is that Claude is more restrained, and the blackmail rate came close to zero. 

AI experiments and real-world results have shown time and again that AI models need constant course correction to keep them from devolving into biased and unreliable systems. It’s good that Anthropic is taking steps to make its AI better, but we also need regulations and safety guardrails to ensure these systems remain safe.

Rachit Agarwal
Rachit is a seasoned tech journalist with over ten years of experience covering the consumer technology landscape.