Image-recognition A.I. has a big weakness. This could be the solution

You’re probably familiar with deepfakes, the digitally altered “synthetic media” that’s capable of fooling people into seeing or hearing things that never actually happened. Adversarial examples are like deepfakes for image-recognition A.I. systems — and while they don’t look even slightly strange to us, they’re capable of befuddling the heck out of machines.

Several years ago, researchers at the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory (CSAIL) found that they could fool even sophisticated image recognition algorithms into confusing objects simply by slightly altering their surface texture. These weren’t minor mix-ups, either.

[Image: a turtle recognized as a rifle by an image-recognition system]

In the researchers’ demonstration, they showed that it was possible to get a cutting-edge neural network to look at a 3D-printed turtle and see a rifle instead. Or to gaze upon a baseball and come away with the conclusion that it is an espresso. Were such visual agnosia to manifest in a human, it would be the kind of neurological case study that would find its way into a book like Oliver Sacks’ classic The Man Who Mistook His Wife for a Hat.

Adversarial examples represent a fascinating vulnerability in how visual A.I. systems view the world. But they also, as you might expect from a flaw that confuses a novelty toy turtle with a rifle, represent a potentially alarming one. Researchers have been working urgently to figure out how to patch it.

Now, another group of researchers from MIT has come up with a new system that could help to defend against “adversarial” inputs. In the process, they have imagined a frankly terrifying use case for adversarial examples, one that could, if implemented by hackers, be used to deadly effect.

The scenario is this: Autonomous cars are getting better and better at perceiving the world around them. But what if, suddenly, a car’s onboard cameras were either purposely or accidentally rendered unable to identify what was in front of them? Miscategorizing an object on the road, such as failing to correctly identify and locate a pedestrian, could potentially end very, very badly indeed.

Fending off adversarial attacks

“Our group has been working at the interface of deep learning, robotics, and control theory for several years, including work on using deep RL [reinforcement learning] to train robots to navigate in a socially aware manner around pedestrians,” Michael Everett, a postdoctoral researcher in the MIT Department of Aeronautics and Astronautics, told Digital Trends. “As we were thinking about how to bring those ideas onto bigger and faster vehicles, the safety and robustness questions became the biggest challenge. We saw a great opportunity to study this problem in deep learning from the perspective of robust control and robust optimization.”

[Video: Socially Aware Motion Planning with Deep Reinforcement Learning]

Reinforcement learning is a trial-and-error-based approach to machine learning that, famously, has been used by researchers to get computers to learn to play video games without being explicitly taught how. The team’s new reinforcement learning and deep neural network-based algorithm is called CARRL, short for Certified Adversarial Robustness for Deep Reinforcement Learning. In essence, it’s a neural network with an added dose of skepticism when it comes to what it’s seeing.

In one demonstration of their work, which was supported by the Ford Motor Company, the researchers built a reinforcement learning algorithm able to play the classic Atari game Pong. But, unlike previous RL game players, in their version, they applied an adversarial attack that threw off the A.I. agent’s assessment of the game’s ball position, making it think that it was a few pixels lower than it actually was. Normally, this would put the A.I. player at a major disadvantage, causing it to lose repeatedly to the computer opponent. In this case, however, the RL agent thinks about all the places the ball could be, and then places the paddle someplace where it won’t miss regardless of the shift in position.
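To make the Pong idea concrete, here is a minimal Python sketch. It is not the researchers' code. It assumes the attack can shift the observed ball height by at most `eps` pixels, so the true height lies somewhere in an interval around the observation. The function name `robust_paddle_center` and all of its parameters are hypothetical, chosen purely for illustration.

```python
def robust_paddle_center(observed_y, eps, paddle_half_height, field_height):
    """Pick a paddle center that covers every ball height consistent with the observation.

    Illustrative assumption: the true ball height is within `eps` of `observed_y`.
    """
    # Every true ball height consistent with the observation, clipped to the field.
    lo = max(0.0, observed_y - eps)
    hi = min(field_height, observed_y + eps)
    # The midpoint minimizes the worst-case distance to any point in [lo, hi].
    center = (lo + hi) / 2.0
    # A hit is guaranteed only if the paddle spans the whole interval.
    guaranteed = (hi - lo) / 2.0 <= paddle_half_height
    return center, guaranteed


if __name__ == "__main__":
    # Ball seen at height 50, attacker may shift it by up to 3 pixels.
    print(robust_paddle_center(50.0, 3.0, 8.0, 100.0))
```

If the interval is wider than the paddle, no placement can guarantee a hit, which is one way to see why a stronger attack is harder to defend against.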

Of course, games are vastly simpler than the real world, as Everett readily admits.

“The real world has much more uncertainty than video games, from imperfect sensors or adversarial attacks, which can be enough to trick deep learning systems to make dangerous decisions — [such as] spray-painting a dot on the road [which may cause a self-driving car] to swerve into another lane,” he explained. “Our work presents a deep RL algorithm that is certifiably robust to imperfect measurements. The key innovation is that, rather than blindly trusting its measurements, as is done today, our algorithm thinks through all possible measurements that could have been made, and makes a decision that considers the worst-case outcome.”
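The worst-case reasoning Everett describes can be illustrated with a toy example in Python. This is a sketch under stated assumptions, not CARRL itself: the value function `toy_q`, the actions `"go"` and `"brake"`, and the helper `robust_action` are all hypothetical. It also checks a grid of sampled measurements, which cannot certify anything; a certified method would need a guaranteed lower bound on each action's value instead of samples.

```python
def toy_q(obstacle_offset, action):
    """Hypothetical value function: 'go' pays off only if the obstacle is clear of the path."""
    if action == "brake":
        return 0.0
    return 1.0 if abs(obstacle_offset) > 1.0 else -10.0


def robust_action(q_fn, observed, eps, actions, n_samples=21):
    """Pick the action with the best worst-case value over plausible true states."""
    # Sample candidate true states evenly across [observed - eps, observed + eps].
    step = 2.0 * eps / (n_samples - 1)
    candidates = [observed - eps + step * i for i in range(n_samples)]
    # For each action, record the lowest value over all candidate states.
    worst = {a: min(q_fn(s, a) for s in candidates) for a in actions}
    # Choose the action whose worst case is best.
    return max(actions, key=lambda a: worst[a]), worst


if __name__ == "__main__":
    actions = ["go", "brake"]
    # An agent that blindly trusts the measurement of 1.2 decides to go.
    nominal = max(actions, key=lambda a: toy_q(1.2, a))
    # An agent that allows for an error of up to 0.5 decides to brake.
    robust, worst = robust_action(toy_q, 1.2, 0.5, actions)
    print(nominal, robust, worst)  # go brake {'go': -10.0, 'brake': 0.0}
```

With a smaller error bound, such as `eps=0.1`, the same function chooses `"go"`, because every sampled state still has the obstacle clear of the path. The size of the assumed error therefore controls how cautious the agent is.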

In another demonstration, they showed that the algorithm can, in a simulated driving context, avoid collisions even when its sensors are being attacked by an adversary that wants the agent to collide. “This new category of robust deep learning algorithms will be essential to bring promising A.I. techniques into the real world,” Everett said.

More work still to be done

It’s still early days for this work, and there’s more that needs to be done. There’s also the potential issue that this could, in some scenarios, cause the A.I. agent to behave too conservatively, thereby making it less efficient. Nonetheless, it’s a valuable piece of research that could have profound impacts going forward.

“[There are other research projects] that focus on protecting against [certain types] of adversarial example, where the neural network’s job is to classify an image and it’s either right [or] wrong, and the story ends there,” Everett said, when asked about the classic turtle-versus-rifle problem. “Our work builds on some of those ideas, but is focused on reinforcement learning, where the agent has to take actions and gets some reward if it does well. So we are looking at a longer-term question of ‘If I say this is a turtle, what are the future implications of that decision?’ and that’s where our algorithm can really help. Our algorithm would think about the worst-case future implications of choosing either a turtle or a rifle, which could be an important step toward solving important security issues when A.I. agents’ decisions have a long-term effect.”

A paper describing the research is available to read on the electronic preprint repository arXiv.

Luke Dormehl
I'm a UK-based tech writer covering Cool Tech at Digital Trends. I've also written for Fast Company, Wired, the Guardian…