In Stanley Kubrick’s 1968 movie 2001: A Space Odyssey, a self-aware artificial intelligence system called HAL 9000 turns murderous and begins trying to kill off its crew during a space mission. In 2019, our A.I. assistants — such as Apple’s Siri, Amazon’s Alexa, and Google Assistant — have yet to turn on humans with quite the willful ferocity of HAL. That doesn’t mean they can’t be used against us, however.
Today, some reports put smart speaker ownership as high as one-third of American adult households. Along the way, smart speakers have expanded their abilities far beyond simply helping us select music or set kitchen timers, aiding us in everything from looking up pharmaceutical information to controlling our smart homes.
So what exactly can go wrong? Two recent studies offer examples of a couple of ways in which malicious actors (or, in this case, researchers hypothetically posing as malicious actors) could exploit fundamental weaknesses in today’s smart assistants. The results aren’t pretty.
Okay, so it’s not exactly HAL 9000 going awry, but it’s a reminder that there’s still plenty to be concerned about when it comes to smart speaker security, and that, in some cases, smart assistants may not be quite so smart after all.
The first example involves what are called adversarial example attacks. You may remember this unusual mode of attack from research that first raised its head a couple of years back, initially with regard to image recognition systems.
The first adversarial attacks homed in on a strange weakness in image recognition systems, which work by looking for familiar elements that can help them understand what they are seeing. Seizing on this weakness, gleeful researchers showed that a state-of-the-art image classifier could be duped into confusing a 3D-printed turtle with a rifle. Another demo illustrated how adding a tiny patch of visual noise to one corner of a picture of a lifeboat led the classifier to label the image as a Scottish terrier with almost total confidence.
Both attacks demonstrated unusual strategies that wouldn’t fool humans for a second, but nonetheless had the ability to deeply confuse A.I. At Carnegie Mellon University, researchers have now shown that it is possible to exploit this same feature for audio.
“The majority of state-of-the-art [neural network] models deployed in commercial speech recognition products are the same as the ones used in image recognition,” Juncheng Li, one of the researchers on the project, told Digital Trends. “We motivated ourselves by asking if the same weaknesses of these models existed in the audio domain. We wanted to see if we could compute a similar style of adversarial example to exploit the weakness of the decision boundary of a neural network trained for speech and a wake word model.”
By focusing on the in-speaker neural network whose only goal in artificial life is to listen for the wake word in an Amazon Echo, Li and his fellow researchers were able to develop a special audio cue that would stop Alexa from being activated. When this particular music cue played, Alexa failed to recognize its name being called. With the cue playing, Alexa responded to its name just 11% of the time. That’s significantly less than the 80% of the time it recognized its name when other music tracks were playing, or the 93% of the time it responded when there was no audio playing whatsoever. Li thinks the same approach could work for other A.I. assistants, too.
Stopping your Amazon Echo from hearing your voice might sound like little more than a minor irritation, but Li points out that this discovery could have other, more malicious applications. “What we did [with our initial work] was a denial-of-service attack, which means that we are exploiting the false negative of the Alexa model,” Li said. “We’re tricking it to believe that the positive is actually a negative. But there’s a reverse way of doing this we are still working on. We’re trying to get Alexa to generate a false positive. That means that, where there’s no Alexa wake word, we want to make it falsely wake up. That could be potentially more malicious.”
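To make the false-negative idea concrete, here is a minimal toy sketch in Python. It is not the Carnegie Mellon team’s method and has nothing to do with Alexa’s actual model: it assumes a hypothetical linear “wake word” scorer with made-up weights, and shows how a small, bounded nudge to each input feature can push the score across the decision boundary so that a genuine detection is missed.

```python
# Toy illustration only: a hypothetical linear "wake word" scorer,
# not the model used in any real smart speaker.

def score(weights, bias, features):
    """Linear detector score; a positive score means 'wake word heard'."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def adversarial_perturbation(weights, epsilon):
    """Shift each feature by at most epsilon in the direction that lowers the score."""
    return [-epsilon * sign(w) for w in weights]


# Made-up numbers chosen purely for the demonstration.
WEIGHTS = [0.9, -0.4, 0.7, -0.8]
BIAS = -0.5
clean = [0.6, 0.1, 0.5, 0.2]  # stands in for audio features of a spoken wake word

delta = adversarial_perturbation(WEIGHTS, epsilon=0.1)
perturbed = [x + d for x, d in zip(clean, delta)]

clean_detected = score(WEIGHTS, BIAS, clean) > 0          # detector fires
perturbed_detected = score(WEIGHTS, BIAS, perturbed) > 0  # detector stays silent

print(clean_detected, perturbed_detected)
```

No feature moves by more than 0.1, yet the detector’s verdict flips. Real attacks have to compute such perturbations against far larger neural networks and make them survive being played through the air, which is considerably harder than this sketch suggests.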
While the Carnegie Mellon researchers’ work focused on mysterious audio cues, a separate recent project took a different approach to seizing control of your smart speaker: lasers. In work part-funded by the U.S. Defense Advanced Research Projects Agency (DARPA), researchers from Japan and the University of Michigan showed that they could hack smart speakers without saying a word (or singing a note), just so long as they had line-of-sight access to the device.
“The idea is that attackers can use a flickering laser to cause smart speakers and voice assistants to recognize speech commands,” Benjamin Cyr, a University of Michigan researcher, told Digital Trends. “A microphone usually works by picking up changes in air pressure due to sound. But we have discovered that if you change the intensity of the light of a laser beam in the same pattern as the changes in air pressure of sound, then you can shoot a microphone with the laser and it will respond as if it were ‘hearing’ the sound.”
To give an example of how this might work, an attacker could record a specific command such as, “Okay Google, turn off the lights.” By encoding that sound signal onto a laser signal and aiming it at a smart speaker, the device can be made to react as though someone had actually spoken the command. In tests, the researchers showed that they could hack a variety of A.I. assistants from up to 360 feet away, focusing the laser with a telephoto lens. While an attacker would still need to have a laser within proximity of their target smart speaker, the fact that they could carry out the hack from outside a home makes this a possible security risk.
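The encoding step Cyr describes is essentially amplitude modulation: the laser’s brightness rises and falls in step with the sound wave. The Python sketch below illustrates the idea under simplifying assumptions. A plain sine tone stands in for a recorded voice command, and the `bias` and `depth` values are hypothetical, in arbitrary units; none of this reflects the researchers’ actual equipment or settings.

```python
import math


def tone(frequency_hz, sample_rate_hz, n_samples):
    """Stand-in for a recorded voice command: a sine wave with samples in [-1, 1]."""
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


def modulate_intensity(audio, bias, depth):
    """Amplitude modulation: intensity follows the audio waveform around a constant bias."""
    if not 0 <= depth <= bias:
        raise ValueError("depth must be between 0 and bias so intensity stays non-negative")
    return [bias + depth * sample for sample in audio]


# Hypothetical values in arbitrary units, chosen only for illustration.
audio = tone(frequency_hz=440, sample_rate_hz=8000, n_samples=80)
intensity = modulate_intensity(audio, bias=1.0, depth=0.5)

# Subtracting the constant bias and rescaling recovers the original waveform,
# which is why a microphone that reacts to the light "hears" the command.
recovered = [(level - 1.0) / 0.5 for level in intensity]
```

The constant bias is there because light intensity cannot go negative, so the sound wave’s positive and negative swings are carried as brighter and dimmer light around a steady baseline.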
“It depends on what you can activate or execute only using your voice, and which other devices are connected to your smart speaker,” said Sara Rampazzi of the University of Michigan. “If injecting voice commands to your speaker can allow an adversary to play music in your behalf, it is not a threat. On the other hand, in our work we demonstrate that in [cases where] a tech-savvy user connected the speaker to many systems, it is possible [to] unlock smart locks connected to a smart speaker, to start the engine of a car using an app that interfaces with your phone, or purchase things online without permission.”
Any device will, of course, be subject to attacks. Malware that allows people to hack other users’ computers and smartphones exists, and can prove incredibly damaging in its own way. In other words, smart speakers aren’t alone. And if people weren’t willing to give up their speakers when they heard that companies listen in to a number of user recordings, they’re probably not going to do so because of two (admittedly concerning) research projects.
Voice-assisted technology isn’t going anywhere. In the years to come, it will only become more widespread and, in turn, more useful. But by highlighting some of these security weaknesses, the researchers in these two projects have shown that there are still plenty of things users need to be aware of in terms of possible attacks. More importantly, these are weaknesses that companies like Amazon and Google must work hard to patch.
“As we go forward with home automation and new ways of interacting with systems, we must think of such gaps and carefully address them,” said Daniel Genkin, one of the researchers on the A.I. assistant laser hack project. “Otherwise issues like this will keep on happening.”
Getting people to spill their secrets to a conversational A.I. requires a whole lot of trust. If the technology is ever going to live up to its massive potential, it’s crucial that users are given every reason to trust it. Clearly there’s still some way to go.