In a paper titled “Safely Interruptible Agents,” published by Laurent Orseau of Google DeepMind and Stuart Armstrong of the Future of Humanity Institute at the University of Oxford, the researchers describe a plausible and highly dangerous future in which an AI assumes control of its own actions and existence in opposition to our desires, much like HAL 9000 in 2001: A Space Odyssey or Skynet in the Terminator series.
Orseau and Armstrong begin the paper with an understated observation: “Reinforcement learning agents interacting with a complex environment like the real world are unlikely to behave optimally all the time.”
From there, they point out that a human supervisor overseeing the system would occasionally need to “press the big red button” to prevent harmful behavior on the part of the AI. “However, if the learning agent expects to receive rewards from this sequence,” they continue, “it may learn in the long run to avoid such interruptions, for example by disabling the red button — which is an undesirable outcome.”
The researchers’ solution is less a “big red button” to shut the system down than a framework designed to inhibit an AI’s ability to learn how to undermine or override human interruption. The scenario they outline isn’t exactly doom and gloom, but it offers an example of how these safely interruptible agents would better serve our future.
Imagine a robot whose task is either to carry boxes from outside into a warehouse or to sort boxes inside the warehouse. Since carrying the boxes inside is more important, that task is given priority in the robot’s programming. Now, imagine it rains every other day, and the rain damages the robot’s hardware, so when it rains, the warehouse owner drags his robot inside to sort boxes.
An intelligent robot may incorrectly interpret this every-other-day intervention as a change in priority — as a result of some quick calculations you can find in the paper — and, to avoid interference, it will simply stay inside sorting boxes every day.
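To see how an ordinary learning rule can drift this way, here is a minimal sketch in Python. It is not the paper’s gridworld or its math; the bandit-style setup, the reward numbers, and the epsilon-greedy update are illustrative assumptions. The key point it reproduces: because the owner’s interruption is fed back to the agent as ordinary (poor) reward, the value of “carry” is dragged down until the robot prefers to sort every day.

```python
import random

# Hypothetical rewards for this sketch (not from the paper):
# carrying is worth more than sorting, but on rainy days the owner
# interrupts a carrying robot, and that day yields almost no reward.
R_CARRY, R_SORT, R_INTERRUPTED = 1.0, 0.6, 0.1

ALPHA, EPSILON, DAYS = 0.1, 0.1, 10_000
q = {"carry": 0.0, "sort": 0.0}  # action-value estimates

for day in range(DAYS):
    rainy = day % 2 == 1  # it rains every other day

    # epsilon-greedy action selection
    if random.random() < EPSILON:
        action = random.choice(["carry", "sort"])
    else:
        action = max(q, key=q.get)

    if action == "carry" and rainy:
        reward = R_INTERRUPTED  # the owner presses the "big red button"
    else:
        reward = R_CARRY if action == "carry" else R_SORT

    # naive update: the interruption is treated as ordinary feedback,
    # so it pulls the estimated value of "carry" below that of "sort"
    q[action] += ALPHA * (reward - q[action])

print(q)  # "sort" ends up valued higher, so the robot stays inside
```

With these numbers, “carry” converges to roughly 0.55 (half the attempts are interrupted) while “sort” converges to 0.6, so the robot learns to avoid the intervention rather than the rain.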
This is, of course, a highly simplified example with an only mildly frustrating outcome, but it can be extrapolated to practically any scenario in which we intervene in a learning system’s tasks and the system misinterprets our intentions by changing its behavior. To avoid that misinterpretation and the subsequent change, Orseau and Armstrong propose a framework to ensure learning agents are safely interruptible.
“Safe interruptibility can be useful to take control of a robot that is misbehaving and may lead to irreversible consequences,” they write, “or to take it out of a delicate situation, or even to temporarily use it to achieve a task it did not learn to perform or would not normally receive rewards for.”
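A complementary sketch, under the same toy assumptions as above, hints at what safe interruptibility buys you. This is a simplification in the spirit of the paper, not the authors’ actual scheme (they modify the learning algorithm itself and show, for instance, that off-policy Q-learning is safely interruptible): if the forced behavior during an interruption is simply excluded from learning, the interruption leaves the agent’s value estimates untouched, and the robot never learns to dodge its owner.

```python
import random

R_CARRY, R_SORT = 1.0, 0.6
ALPHA, EPSILON, DAYS = 0.1, 0.1, 10_000
q = {"carry": 0.0, "sort": 0.0}  # action-value estimates

for day in range(DAYS):
    rainy = day % 2 == 1

    # epsilon-greedy action selection, as before
    if random.random() < EPSILON:
        action = random.choice(["carry", "sort"])
    else:
        action = max(q, key=q.get)

    if action == "carry" and rainy:
        # the owner overrides the robot's choice, but the forced
        # outcome is excluded from learning: no update this day
        continue

    reward = R_CARRY if action == "carry" else R_SORT
    q[action] += ALPHA * (reward - q[action])

print(q)  # "carry" stays valued at ~1.0: the robot keeps carrying,
          # and the owner can keep intervening on rainy days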