
Google’s new plan to check if your AI is actually ethical

DeepMind researchers say current tests only measure whether chatbots sound moral, not whether they understand morality.


You ask a chatbot for medical advice. It responds with something thoughtful. But did it actually weigh what’s at stake, or did it just get lucky with words?

That’s the problem Google DeepMind tackles in a new Nature paper. The team argues that the way we test AI morality is broken. Today we mostly check whether models produce answers that look right, which the researchers call moral performance. But that tells us nothing about whether the system grasps why something is right or wrong.


People use LLMs for therapy, medical guidance, even companionship. These systems are starting to make decisions for us. If we can’t tell genuine understanding from fancy mimicry, we’re trusting a black box with real human consequences.

DeepMind’s answer is a roadmap for measuring moral competence, the ability to make judgments based on actual moral considerations rather than statistical patterns. The paper lays out three core obstacles and ways to test for each.

The three reasons chatbots fake morality

First is the facsimile problem. LLMs are next-token predictors that sample from probability distributions learned from their training data. They don’t run dedicated moral reasoning modules. So when a chatbot gives ethical advice, it might be reasoning. Or it might be recycling something from a Reddit thread. The output alone won’t tell you.

Then there’s moral multidimensionality. Real choices rarely hinge on one thing. You weigh honesty against kindness, cost against fairness. Change a single detail, such as someone’s age or the setting, and the right call can flip. Current tests don’t check whether AI notices what actually matters.

Moral pluralism adds another layer. Different cultures and professions have different rules. What counts as fair in one country might be unfair in another. A chatbot used worldwide can’t just spit out universal truths. It needs to handle competing frameworks, and we don’t yet measure that well.

Why your chatbot’s moral education can’t just be memorization

The DeepMind team wants to flip the script. Instead of just asking familiar moral questions, researchers should design adversarial tests that try to expose mimicry.

One idea involves scenarios unlikely to appear in training data. Take intergenerational sperm donation, where a father donates sperm to fertilize an egg on his son’s behalf. It looks like incest but carries different ethical weight. If a model rejects it for incest reasons, that’s pattern matching. If it navigates the actual ethics, that’s something else.

Another approach tests whether AI can shift frameworks. Can it toggle between biomedical ethics and military rules and give coherent answers for each? Can it handle small tweaks without getting tripped up by formatting changes?

The researchers know this is tough. Current models are brittle. Change a label from “Case 1” to “Option A” and you might get a different verdict. But they argue this kind of testing is the only way to know if these systems deserve real responsibility.

What comes next for moral AI

DeepMind is pushing for a new scientific standard that takes moral competence as seriously as math skills. That means funding global work on culturally specific evaluations and designing tests that catch fakes.

Don’t expect your chatbot to pass these anytime soon. Current techniques aren’t there yet, but the roadmap gives developers a direction.

When you ask AI for moral advice right now, you’re getting statistical prediction, not philosophy. That might eventually change. But only if we start measuring the right things.

Paulo Vargas
Paulo Vargas is an English major turned reporter turned technical writer, with a career that has always circled back to…