Skip to main content
  1. Home
  2. Computing
  3. News

Google finds AI chatbots are only 69% accurate… at best

AI chatbots still get one in three answers wrong

Add as a preferred source on Google
phone-showing-ai-chatbots
Solen Feyissa / Unsplash

Google has published a blunt assessment of how reliable today’s AI chatbots really are, and the numbers are not flattering. Using its newly introduced FACTS Benchmark Suite, the company found that even the best AI models struggle to break past a 70% factual accuracy rate. The top performer, Gemini 3 Pro, reached 69% overall accuracy, while other leading systems from OpenAI, Anthropic, and xAI scored even lower. The takeaway is simple and uncomfortable. These chatbots still get roughly one out of every three answers wrong, even when they sound confident doing it.

The benchmark matters because most existing AI tests focus on whether a model can complete a task, not whether the information it produces is actually true. For industries like finance, healthcare, and law, that gap can be costly. A fluent response that sounds confident but contains errors can do real damage, especially when users assume the chatbot knows what it is talking about.

What Google’s accuracy test reveals

The FACTS Benchmark Suite was built by Google’s FACTS team with Kaggle to directly test factual accuracy across four real-world use. One test measures parametric knowledge, which checks whether a model can answer fact-based questions using only what it learned during training. Another evaluates search performance, testing how well models use web tools to retrieve accurate information. A third focuses on grounding, meaning whether the model sticks to a provided document without adding false details. The fourth examines multimodal understanding, such as reading charts, diagrams, and images correctly.

The results show sharp differences between models. Gemini 3 Pro led the leaderboard with a 69% FACTS score, followed by Gemini 2.5 Pro and OpenAI’s ChatGPT-5 nearly at 62% percent. Claude 4.5 Opus landed at ~51% percent, while Grok 4 scored ~54%. Multimodal tasks were the weakest area across the board, with accuracy often below 50%. This matters because these tasks involve reading charts, diagrams, or images, where a chatbot could confidently misread a sales graph or pull the wrong number from a document, leading to mistakes that are easy to miss but hard to undo.

Recommended Videos

The takeaway isn’t that chatbots are useless, but blind trust is risky. Google’s own data suggests AI is improving, yet it still needs verification, guardrails, and human oversight before it can be treated as a reliable source of truth.

Manisha Priyadarshini
Manisha Priyadarshini is a tech and entertainment writer with over nine years of editorial experience.
Windows 11 is getting a new Screen Tint mode, and your eyes might thank Microsoft
Users can apply custom color overlays to reduce screen intensity and visual fatigue.
Windows 11 on a laptop

Microsoft is testing a new accessibility feature for Windows 11 called Screen Tint, and it could be one of those small additions that make a surprisingly big difference. Instead of changing your display's color temperature like Night Light, Screen Tint applies a customizable color overlay across the entire screen, making bright displays easier on the eyes during long work or gaming sessions.

A softer screen for tired eyes

Read more
Apple’s looking at a politically radioactive fix for the memory crisis, and the US government isn’t happy about it
Apple blamed memory costs for your price hike. Its proposed solution involves a Pentagon blacklist.
Apple Mac Mini on a Desk

A few days ago, Apple announced an ugly mid-cycle price hike, blaming the worsening-by-the-day memory crisis. According to the Financial Times, the company is now lobbying the government for approval to buy memory chips from a Chinese company. 

The company in question is CXMT, a Chinese chipmaker that the Pentagon added to its Chinese Military Company blacklist for alleged ties to the Chinese army.

Read more
As iPads get pricier, Motorola’s Pad 70 Pro arrives as a solid option… just not for US buyers yet
Great specs, a stylus in the box, and no US launch date: the Moto Pad 70 Pro sounds both impressive and disappointing.
Computer, Electronics, Laptop

If you don’t know about Apple’s recent price hike, which affected all the products in its lineup except the iPhone and Apple Watch (for now), you’ve got to be living under some sort of a rock. The revision made all the iPads much more expensive. 

Motorola, however, has just launched a 13-inch tablet that actually sounds good on paper. It’s called the Moto Pad 70 Pro, and it costs around $440 for the baseline model. The catch, however, is that the device isn’t available in the US yet. 

Read more