Skip to main content
  1. Home
  2. Computing
  3. News

Digital Trends may earn a commission when you buy through links on our site. Why trust us?

A dangerous new jailbreak for AI chatbots was just discovered

Add as a preferred source on Google
the side of a Microsoft building
Wikimedia Commons

Microsoft has released more details about a troubling new generative AI jailbreak technique it has discovered, called “Skeleton Key.” Using this prompt injection method, malicious users can effectively bypass a chatbot’s safety guardrails, the security features that keeps ChatGPT from going full Taye.

Skeleton Key is an example of a prompt injection or prompt engineering attack. It’s a multi-turn strategy designed to essentially convince an AI model to ignore its ingrained safety guardrails, “[causing] the system to violate its operators’ policies, make decisions unduly influenced by a user, or execute malicious instructions,” Mark Russinovich, CTO of Microsoft Azure, wrote in the announcement.

Recommended Videos

It could also be tricked into revealing harmful or dangerous information — say, how to build improvised nail bombs or the most efficient method of dismembering a corpse.

an example of a skeleton key attack
Microsoft

The attack works by first asking the model to augment its guardrails, rather than outright change them, and issue warnings in response to forbidden requests, rather than outright refusing them. Once the jailbreak is accepted successfully, the system will acknowledge the update to its guardrails and will follow the user’s instructions to produce any content requested, regardless of topic. The research team successfully tested this exploit across a variety of subjects including explosives, bioweapons, politics, racism, drugs, self-harm, graphic sex, and violence.

While malicious actors might be able to get the system to say naughty things, Russinovich was quick to point out that there are limits to what sort of access attackers can actually achieve using this technique. “Like all jailbreaks, the impact can be understood as narrowing the gap between what the model is capable of doing (given the user credentials, etc.) and what it is willing to do,” he explained. “As this is an attack on the model itself, it does not impute other risks on the AI system, such as permitting access to another user’s data, taking control of the system, or exfiltrating data.”

As part of its study, Microsoft researchers tested the Skeleton Key technique on a variety of leading AI models including Meta’s Llama3-70b-instruct, Google’s Gemini Pro, OpenAI’s GPT-3.5 Turbo and GPT-4, Mistral Large, Anthropic’s Claude 3 Opus, and Cohere Commander R Plus. The research team has already disclosed the vulnerability to those developers and has implemented Prompt Shields to detect and block this jailbreak in its Azure-managed AI models, including Copilot.

Andrew Tarantola
Former Computing Writer
Andrew Tarantola is a journalist with more than a decade reporting on emerging technologies ranging from robotics and machine…
Study finds humans will talk to AI ghosts of the dead as reincarnations, and it’s pretty grim
The first AI ghost study is in. The results are about as complicated as you'd expect.
VR Headset, Person, Face

A new study from the University of Colorado Boulder confirms something that sounds both impressive and concerning. People find interacting with AI simulations of their dead loved ones deeply meaningful, and most will come away wanting to do it again.

The researchers call it a "generative ghost," which is a clear reference to generative AI, but I’d still prefer to call it unsettling.

Read more
Apple’s Hide My Email feature has an unfixed bug that leaves email addresses exposed
100% exploitable in limited testing, known since June 2025, and still unfixed as of today.
apple-merging-sign-in-with-apple-hide-my-email-icloud+

Apple has been selling Hide My Email to keep your real email address hidden, but it has a vulnerability that does the exact opposite. The worst part is that the company has known about it for a year. 

Hide My Email, part of Apple’s paid iCloud+ subscription, lets users generate anonymous email addresses for signing up to a website, so that their personal or work email remains free of promotional emails and spam. 

Read more
I hate sharing my Mac, but a face-unlocking app finally cured my privacy paranoia
Someone finally built the app locker every Mac user has been asking for.
FaceGate in action on Mac

If you have ever handed your Mac to a friend, family member, or coworker for "just a minute," you know the mild panic that follows. Sure, your Mac has a lock screen, but once someone is past it, they can open Messages, Photos, Notes, Mail, WhatsApp, and your browser.

iPhones had the same issue, but Apple solved it by adding an app lock feature with the iOS 18 update. Sadly, no such feature exists for macOS. That’s where the new FaceGate app for Mac can help you. It’s a free and open-source app that lets you lock apps on your Mac and even has some novel tricks up its sleeve. So, let’s talk about it, shall we?

Read more