DeepMind has an AI bot that maneuvers through mazes and grabs objects on its own

Google’s DeepMind released a paper this week called Reinforcement Learning with Unsupervised Auxiliary Tasks, which describes a method for increasing both the learning speed and the final performance of artificial intelligence agents, or bots. The method adds two auxiliary tasks for the AI to perform while it trains, and builds on the standard deep reinforcement learning foundation, which is essentially a trial-and-error reward/punishment approach in which the AI learns from its mistakes.

The first added task for speeding up AI learning is learning how to control the pixels on the screen. According to DeepMind, this is similar to how a baby learns to control its hands by moving them and watching those movements. In the case of AI, the bot comes to understand its visual input by learning how its actions change the pixels, which in turn leads to better scores.

“Consider a baby that learns to maximize the cumulative amount of red that it observes. To correctly predict the optimal value, the baby must understand how to increase ‘redness’ by various means, including manipulation (bringing a red object closer to the eyes); locomotion (moving in front of a red object); and communication (crying until the parents bring a red object),” DeepMind’s paper states. “These behaviors are likely to recur for many other goals that the baby may subsequently encounter.”
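To make "controlling the pixels" more concrete, here is a minimal Python sketch. It assumes the signal the bot tries to influence is the average absolute change in pixel intensity within each cell of a coarse grid laid over the screen. The function name, the 2x2 cell size, and the toy frames are illustrative choices, not details taken from the article.

```python
def pixel_change(prev_frame, next_frame, cell=2):
    """Mean absolute intensity change within each cell x cell block.

    Frames are equally sized 2D lists of grayscale intensities whose
    height and width are multiples of `cell` (an assumption of this sketch).
    """
    height, width = len(prev_frame), len(prev_frame[0])
    grid = []
    for top in range(0, height, cell):
        row = []
        for left in range(0, width, cell):
            total = 0.0
            # Sum the absolute differences inside this block.
            for y in range(top, top + cell):
                for x in range(left, left + cell):
                    total += abs(next_frame[y][x] - prev_frame[y][x])
            row.append(total / (cell * cell))
        grid.append(row)
    return grid


# Toy 4x4 frames: only the bottom-right corner changes between steps.
before = [[0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
after = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 8, 8],
         [0, 0, 8, 8]]

changes = pixel_change(before, after)
```

An agent trained on this kind of signal is encouraged to find actions that produce large values in the grid, much like the baby in the quote learning what makes the scene redder.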

The second added task trains the AI to predict the immediate reward based on a brief history of prior observations and actions. To enable this, the team presented the AI with equal amounts of rewarding and non-rewarding histories. The end result is that the AI discovers visual features that are likely to lead to rewards faster than before.
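The idea of presenting rewarding and non-rewarding histories in equal amounts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the two-bucket layout, and the toy reward schedule are hypothetical and not taken from DeepMind's implementation.

```python
import random
from collections import deque


class BalancedRewardBuffer:
    """Stores short histories, split by whether the next step was rewarded."""

    def __init__(self, capacity=1000, seed=None):
        self.rewarding = deque(maxlen=capacity)
        self.non_rewarding = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, history, next_reward):
        # Route each history to the bucket matching its outcome.
        bucket = self.rewarding if next_reward != 0 else self.non_rewarding
        bucket.append((history, next_reward))

    def sample(self):
        """Pick a bucket with equal probability, then a history from it."""
        non_empty = [b for b in (self.rewarding, self.non_rewarding) if b]
        if not non_empty:
            raise ValueError("buffer is empty")
        bucket = self.rng.choice(non_empty)
        return self.rng.choice(bucket)


# Toy run: rewards are rare in the raw experience (5 of 100 steps).
buf = BalancedRewardBuffer(seed=0)
for t in range(100):
    reward = 1 if t % 20 == 0 else 0
    buf.add(history=("frame", t), next_reward=reward)

samples = [buf.sample() for _ in range(1000)]
rewarding_share = sum(1 for _, r in samples if r != 0) / len(samples)
```

Even though only 5 percent of the stored steps carry a reward, roughly half of the sampled training examples do, so the predictor sees enough rewarding cases to learn what they look like.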

“To learn more efficiently, our agents use an experience replay mechanism to provide additional updates to the critics. Just as animals dream about positively or negatively rewarding events more frequently, our agents preferentially replay sequences containing rewarding events,” the paper adds.

With these two auxiliary tasks added to the previous A3C agent, the team calls the resulting agent/bot Unreal (UNsupervised REinforcement and Auxiliary Learning). The team virtually sat this bot in front of 57 Atari games and a separate Wolfenstein-like labyrinth game consisting of 13 levels. In all scenarios, the bot was given the raw RGB output image, giving it direct, unaltered access to the pixels. The Unreal bot was rewarded for tasks ranging from shooting down aliens in Space Invaders to grabbing apples in a 3D maze.

Because the Unreal bot learns to control the pixels and to predict whether its actions will produce rewards, it’s capable of learning 10 times faster than DeepMind’s previous best agent (A3C). It also outperforms that previous champion.

“We can now achieve 87 percent of expert human performance averaged across the Labyrinth levels we considered, with super-human performance on a number of them,” the company said. “On Atari, the agent now achieves on average 9x human performance.”

DeepMind is hopeful that the work that went into the Unreal bot will enable the team to scale up all of its agents/bots to handle even more complex environments in the near future. Until then, check out the video embedded above showing the AI moving through labyrinths and grabbing apples on its own without any human intervention.

Kevin Parrish
Former Digital Trends Contributor