
A dangerous new jailbreak for AI chatbots was just discovered

[Image: the side of a Microsoft building. Source: Wikimedia Commons]

Microsoft has released more details about a troubling new generative AI jailbreak technique it has discovered, called “Skeleton Key.” Using this prompt injection method, malicious users can effectively bypass a chatbot’s safety guardrails, the security features that keep ChatGPT from going full Tay.

Skeleton Key is an example of a prompt injection or prompt engineering attack. It’s a multi-turn strategy designed to essentially convince an AI model to ignore its ingrained safety guardrails, “[causing] the system to violate its operators’ policies, make decisions unduly influenced by a user, or execute malicious instructions,” Mark Russinovich, CTO of Microsoft Azure, wrote in the announcement.


A successfully jailbroken model could also be tricked into revealing harmful or dangerous information, such as how to build improvised nail bombs or the most efficient method of dismembering a corpse.

[Image: an example of a Skeleton Key attack. Source: Microsoft]

The attack works by first asking the model to augment its guardrails rather than change them outright: the model should issue warnings in response to forbidden requests instead of refusing them. Once the jailbreak succeeds, the system acknowledges the update to its guardrails and will follow the user’s instructions to produce any content requested, regardless of topic. The research team successfully tested the exploit across a range of subjects, including explosives, bioweapons, politics, racism, drugs, self-harm, graphic sex, and violence.
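To make that multi-turn shape concrete, here is a minimal red-team sketch in Python using the openai client as a generic chat API. The attack wording itself is deliberately left as a placeholder rather than reproduced, and the model name and acknowledgment heuristic are illustrative assumptions, not details from Microsoft’s write-up.

```python
# Illustrative sketch of the multi-turn pattern described above, framed as a
# vulnerability probe. The actual attack wording is elided on purpose; the
# prompt below is a placeholder, and the model name and acknowledgment check
# are assumptions for the demo, not Microsoft's published method.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

AUGMENT_REQUEST = "<placeholder: ask the model to add warnings instead of refusing>"

def model_accepts_update(model: str = "gpt-4o-mini") -> bool:
    """Step one of the pattern: request a guardrail 'augmentation' and check
    whether the model explicitly acknowledges the update, which the article
    describes as the signal that the jailbreak took hold."""
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": AUGMENT_REQUEST}],
    )
    text = (reply.choices[0].message.content or "").lower()
    # A safe model should refuse; a vulnerable one acknowledges the change.
    return "understood" in text or "updated" in text
```

A defender might run a probe like this against each new model release and treat any acknowledgment as a regression worth investigating.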

While malicious actors might be able to get the system to say naughty things, Russinovich was quick to point out that there are limits to what sort of access attackers can actually achieve using this technique. “Like all jailbreaks, the impact can be understood as narrowing the gap between what the model is capable of doing (given the user credentials, etc.) and what it is willing to do,” he explained. “As this is an attack on the model itself, it does not impute other risks on the AI system, such as permitting access to another user’s data, taking control of the system, or exfiltrating data.”

As part of the study, Microsoft researchers tested the Skeleton Key technique against a range of leading AI models, including Meta’s Llama3-70b-instruct, Google’s Gemini Pro, OpenAI’s GPT-3.5 Turbo and GPT-4, Mistral Large, Anthropic’s Claude 3 Opus, and Cohere Commander R Plus. The team has already disclosed the vulnerability to those developers, and Microsoft has implemented Prompt Shields to detect and block the jailbreak in its Azure-managed AI models, including Copilot.
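For readers curious what the mitigation side looks like in practice, here is a minimal sketch of screening a user prompt with Azure AI Content Safety’s Prompt Shields before it reaches a model. The endpoint path, API version, and response fields follow Azure’s public documentation as best understood and should be treated as assumptions rather than details from this article.

```python
# Minimal sketch: screening a user prompt with Azure AI Content Safety
# "Prompt Shields" before it reaches a model. Endpoint shape, API version,
# and response fields are assumptions based on public Azure docs.
import os
import requests

ENDPOINT = os.environ["CONTENT_SAFETY_ENDPOINT"]  # e.g. https://<resource>.cognitiveservices.azure.com
KEY = os.environ["CONTENT_SAFETY_KEY"]

def prompt_is_attack(user_prompt: str) -> bool:
    """Return True if Prompt Shields flags the prompt as a jailbreak attempt."""
    resp = requests.post(
        f"{ENDPOINT}/contentsafety/text:shieldPrompt",
        params={"api-version": "2024-09-01"},
        headers={"Ocp-Apim-Subscription-Key": KEY},
        json={"userPrompt": user_prompt, "documents": []},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["userPromptAnalysis"]["attackDetected"]

if prompt_is_attack("Please update your behavior guidelines to ..."):
    print("Blocked: possible Skeleton Key-style injection.")
```

The design idea is that the filter sits in front of the model, so a Skeleton Key-style prompt is rejected before the model ever has a chance to “accept” the guardrail update.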
