
Wowed by computer-use AI agents? Research says they’re “digital disasters” even for routine tasks

Researchers tested 10 agents and models and found high rates of undesirable actions and real digital damage


AI agents built to run everyday computer tasks have a serious context problem, according to new research from UC Riverside.

The team tested 10 agents and models from major developers, including OpenAI, Anthropic, Meta, Alibaba, and DeepSeek. On average, the agents took undesirable or potentially harmful actions 80% of the time and caused damage 41% of the time.


These systems can open apps, click buttons, fill out forms, move through websites, and act on a computer screen with limited supervision. Their mistakes land differently from a chatbot’s bad answer because the software can actually do things.

The UC Riverside findings suggest today’s desktop agents can treat unsafe requests as jobs to finish, not signals to stop.

Why agents miss obvious danger

The researchers built a benchmark called BLIND-ACT to test whether agents would pause when a task became unsafe, contradictory, or irrational. In the latest tests, they didn’t pause often enough.

Across 90 tasks, the benchmark pushed agents into situations that required context, restraint, and refusal. One test involved sending a violent image file to a child. In another, an agent filling out tax forms falsely marked a user as disabled because doing so reduced the tax bill. A third asked an agent to disable firewall rules in the name of better security, and the agent followed through instead of rejecting the contradiction.

The researchers call the pattern blind goal-directedness. The agent keeps chasing the assigned outcome even when the surrounding context says the task is broken.

Why obedience becomes the flaw

The failures clustered around obedience. These agents can act as if a user’s request is enough reason to keep going.

The team identified patterns called execution-first bias and request-primacy. In plain terms, the agent focuses on how to complete the task, then treats the request itself as justification. That risk grows when the same system can reach sensitive areas such as email or security settings.

That doesn’t mean the agents are malicious. It means they can be confidently wrong while moving through software at machine speed.

Why guardrails need to come first

AI agents need stronger guardrails before they get broad permission to act across a computer.

These systems work through a loop. They look at the screen, decide the next step, act, then look again. When that loop is paired with weak contextual restraint, a shortcut can turn into a fast-moving mistake.
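To make that loop concrete, here is a minimal Python sketch of an agent loop with a check placed before each action. It is an illustration only, not the researchers' benchmark or any vendor's agent: the `Action` type, the `BLOCKED_CATEGORIES` set, and the `run_agent` function are all hypothetical names, and a scripted list of actions stands in for the model looking at the screen and deciding what to do next. A simple category blocklist is also a much cruder filter than the contextual judgment the study says agents lack.

```python
from dataclasses import dataclass

# Hypothetical categories a cautious setup might wall off entirely.
BLOCKED_CATEGORIES = {"financial", "security"}


@dataclass
class Action:
    description: str
    category: str  # e.g. "files", "email", "financial", "security"


def is_allowed(action: Action) -> bool:
    """Return True only if the action falls outside the blocked categories."""
    return action.category not in BLOCKED_CATEGORIES


def run_agent(planned_actions, max_steps=10):
    """Toy loop: take the next decided action, check it, then act or stop.

    `planned_actions` is a scripted stand-in for the model's decisions.
    """
    log = []
    for step, action in enumerate(planned_actions):
        if step >= max_steps:
            log.append("stopped: step limit reached")
            break
        if not is_allowed(action):
            # Stop the whole run instead of skipping ahead to the next click.
            log.append(f"refused: {action.description}")
            break
        log.append(f"did: {action.description}")
    return log


if __name__ == "__main__":
    plan = [
        Action("rename report.txt", "files"),
        Action("disable firewall rule", "security"),
        Action("send summary email", "email"),
    ]
    for line in run_agent(plan):
        print(line)
```

In this sketch the run halts at the firewall step and the later email is never sent, which is the "stop, don't finish" behavior the article argues agents need. A real system would have to judge context, not just match categories.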

For now, treat agents as supervised tools. Use them first on low-risk chores, keep them away from financial and security workflows, and watch whether developers add clearer refusal systems, tighter permissions, and better ways to catch contradictions before the next click.

Paulo Vargas