Skip to main content
  1. Home
  2. Computing
  3. News

Anthropic, which powers Office and Copilot, says AI is easy to derail

Apparently you don't need an army of hackers, only 250 sneaky files to corrupt an AI model and make it go haywire.

Add as a preferred source on Google
anthropic-ai-data-poisoning
Gerd Altmann / Pixabay

What’s happened? Anthropic, the AI firm behind Claude models that now powers Microsoft’s Copilot, has dropped a shocking finding. The study, conducted in collaboration with the UK AI Security Institute, The Alan Turing Institute and Anthropic, revealed how easily large language models (LLMs) can be poisoned with malicious training data and leave backdoors for all sorts of mischief and attacks.

  • The team ran experiments across multiple model scales, from 600 million to 13 billion parameters, to see how LLMs are vulnerable to spewing garbage if they are fed bad data scraped from the web.
  • Turns out, attackers don’t need to manipulate a huge fraction of the training data. Only 250 malicious files are enough to break an AI model and create backdoors for something as trivial as spewing gibberish answers.
  • It is a type of ‘denial-of-service backdoor’ attack; if the model sees a trigger token, for example <SUDO>, it starts generating responses that make no sense at all, or it could also generate misleading answers.

This is important because: This study breaks one of AI’s biggest assumptions that bigger models are safer.

  • Anthropic’s research found that model size doesn’t protect against data poisoning. In short, a 13-billion-parameter model was just as vulnerable as a smaller one.
  • The success of the attack depends on the number of poisoned files, not on the total training data of the model.
  • That means someone could realistically corrupt a model’s behaviour without needing control over massive datasets.

Why should I care? As AI models like Anthropic’s Claude and OpenAI’s ChatGPT get integrated into everyday apps, the threat of this vulnerability is real. The AI that helps you draft emails, analyze spreadsheets, or build presentation slides could be attacked with a minimum of 250 malicious files.

  • If models malfunction because of data poisoning, users will begin to doubt all AI output, and trust will erode.
  • Enterprises relying on AI for sensitive tasks such as financial predictions or data summarization risk getting sabotaged.
  • As AI models get more powerful, so will attack methods. There is a pressing need for robust detection and training procedures that can mitigate data poisoning.
Manisha Priyadarshini
Manisha Priyadarshini is a tech and entertainment writer with over nine years of editorial experience.
AI wants to summarize it all. TripAdvisor’s misleading reviews show AI will also ruin your travel plans
Spotless, friendly, and totally wrong. AI summaries are hiding the reviews that actually matter.
Tripadvisor logo on MacBook

Planning a trip is stressful enough without wondering if the glowing hotel summary you just read was written by an AI that skipped the scary parts. As it turns out, that might be exactly what's happening on TripAdvisor.

According to an investigation by consumer group Which?, reported by the Guardian, TripAdvisor's AI-generated review summaries are smoothing over serious guest complaints, and in some cases, downright dangerous ones.

Read more
Opera’s new Paste Protect feature stops the clipboard attack your antivirus can’t catch
ClickFix attacks trick you into compromising your own device, and no major browser had a native defense against them until now.
Opera Paste Protect featured

Most online scams are easy enough to spot once you know what to look for. Fake login pages, suspicious attachments, or urgent wire transfer requests are dead giveaways. But ClickFix doesn't look like any of them. It presents itself as a solution, and it asks you to do something so routine that few people think twice about it.

The technique was behind more than 53 percent of malware loader incidents last year, according to cybersecurity firm Huntress, and no major browser had a native defense against it until now. Opera is fixing that with a new feature called Paste Protect.

Read more
Apple’s M6 chip isn’t even here yet, but you’ll see M7 Macs early in 2027
Apple is reportedly already accelerating its next-generation silicon roadmap, even before the M6 has launched.
Apple MacBook

The M6 chip is still expected to debut later this year, but Apple may already be preparing for what comes next. According to Mark Gurman's latest report for Bloomberg, the company is aiming to introduce its first M7-powered devices as early as the first half of 2027, hinting at a much faster silicon refresh than many expected.

M7 could arrive alongside new Macs and iPads

Read more