This bot will destroy you at Pictionary. It's also a huge milestone for A.I.

Like new Alexa Skills on your Amazon Echo, these past couple of decades have seen A.I. gradually gain the ability to best humanity at more and more of our beloved games: Chess with Deep Blue in 1997, Jeopardy with IBM Watson in 2011, Atari games with DeepMind in 2013, Go with AlphaGo in 2016, and so on. To the general public, at least, each instance turns the abstract path of computational progress into a spectator sport. Skynet is getting smarter. How do we know? Because check out the growing number of pastimes it can convincingly beat us at.

Building a Pictionary master

Being able to reduce a complex real-world image into a sketch is, itself, pretty impressive. It takes a level of abstraction to look at a human face and see it as an oval with two smaller ovals for eyes, a line for a nose, and a half-circle for a mouth. In kids, the ability to perceive an image in this way shows, among other things, a burgeoning cognitive understanding of concepts.

However, as with many aspects of A.I., it’s a significant challenge for machine intelligence, despite the fact that it’s a basic, unremarkable skill for the majority of two-year-old children. The pattern is often summarized as Moravec’s Paradox: the “hard problems are easy and the easy problems are hard.”

It’s not an unsolvable challenge, though. In 2016, we wrote about Yi-Zhe Song’s work with a tool called Sketch, a deep-learning neural network that was able to recognize hand-drawn sketches and use them to search for real-life products. That particular network was trained on a dataset of some 30,000 sketch-photo comparisons, allowing it to recognize the way real objects are represented in hand drawings. Pixelor, the Pictionary-playing bot, does something similar, but can also generate its own drawings, rather than just recognizing other people’s.

But that’s not enough to win at Pictionary. Pictionary is a time-challenged game where the goal isn’t just to draw, say, a cat, but to draw a cat in as few strokes as possible. You could be the world’s greatest artist but, if it takes you 12 hours to draw a picture-perfect cat, you’re a terrible Pictionary player.

This meant building an A.I. that could study humans to see which strategies they use to play Pictionary well. As Song said, “What are the most important bits to draw to enable other human judges to be able to guess? We want our drawing to be guessed as early as possible.”

To do this, the researchers took QuickDraw, the largest human sketch dataset available to date. They then built a neural sorting algorithm that prioritizes the order of the strokes an artist needs to make, giving a guessable representation of an object in as few lines as possible. This means breaking sketches down into strokes, then shuffling the order of those strokes and testing the results to find the order in which they should be laid down on paper.

For example, an artist could start drawing a cat by sketching a circular outline for its head. But a circle could be any number of things, even if you know that it is supposed to represent a head. Draw two pointy ears or two sets of whiskers, however, and the number of things you could plausibly be drawing shrinks very quickly. This information is then used to instruct the sketching agent.
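To make the stroke-ordering idea concrete, here is a rough Python sketch. It is not Pixelor's method: the researchers describe a learned neural sorting algorithm, whereas this toy swaps in a simple greedy search. The stroke names, the EVIDENCE table, and every number in it are invented for illustration, and the margin function is a hypothetical stand-in for a real sketch classifier.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical evidence table (invented numbers): how strongly each stroke
# suggests each candidate object. A real system would use a trained classifier.
EVIDENCE: Dict[str, Dict[str, float]] = {
    "head_circle": {"cat": 1.0, "ball": 1.0, "sun": 1.0},
    "pointy_ears": {"cat": 3.0},
    "whiskers": {"cat": 2.5},
    "tail": {"cat": 1.5},
}


def margin(strokes: Sequence[str], target: str) -> float:
    """Toy guessability score: evidence for the target minus its best rival."""
    totals: Dict[str, float] = {}
    for stroke in strokes:
        for label, weight in EVIDENCE[stroke].items():
            totals[label] = totals.get(label, 0.0) + weight
    target_total = totals.get(target, 0.0)
    rival_total = max(
        (value for label, value in totals.items() if label != target),
        default=0.0,
    )
    return target_total - rival_total


def greedy_order(
    strokes: Sequence[str],
    target: str,
    score: Callable[[Sequence[str], str], float] = margin,
) -> List[str]:
    """Repeatedly pick the stroke that makes the partial sketch most guessable."""
    remaining = list(strokes)
    ordered: List[str] = []
    while remaining:
        best = max(remaining, key=lambda s: score(ordered + [s], target))
        ordered.append(best)
        remaining.remove(best)
    return ordered


if __name__ == "__main__":
    cat_strokes = ["head_circle", "pointy_ears", "whiskers", "tail"]
    print(greedy_order(cat_strokes, "cat"))
```

With these made-up numbers, the ears and whiskers come first and the ambiguous head circle comes last, which mirrors the cat example above. A greedy search like this only looks one stroke ahead, so it is a simplification of whatever ordering a trained model would learn.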

Song said that the team could release a public-facing version of this Pictionary-playing bot so that human players can have their own go at beating a sketching A.I. master. (Who knows? Playing an expert could even help improve your own Pictionary game.)

More than meets the eye

There’s more to Pixelor than just another trivial game-playing bot, however. Just like a computer system has both a surface-level interface that we interact with and under-the-hood backend code, so, too, does every major A.I. game-playing milestone have an ulterior motive. Unless they’re explicitly making computer games, research labs don’t spend countless person-hours building game-playing A.I. agents just to add another entry on the big list of things humans are no longer the best at. The purpose is always to advance some fundamental part of A.I. problem-solving.

In the case of Pixelor, the hidden objective is to make machines that are better able to figure out what’s important to a human in a particular scene. When we look at an image, we’re immediately able to tell what the most salient details are.

Let’s say you’re driving home from work. While the trees lining the side of the road may be picturesque and the billboard for a new movie could be interesting, neither is as important as the face and body language of the person who may or may not be about to walk out in front of you. Before you have even consciously processed the information, your brain has singled out the most important details. How do you teach a computer to be able to do this? Well, it turns out that one great way to do so is to see how humans prioritize the salient recognizable details in an image when they’re sketching it.

“There’s no human knowledge inherently embedded in photos [alone],” said Song. “What we want is human data which can give us signals on how humans understand an object.”

As noted, a good Pictionary player, like a good boxer, will know the absolute minimum they need to do to achieve a certain objective. This, in a macro sense, is what Yi-Zhe Song and his colleagues care about. It’s not anything as trivial as getting a computer to play a game; it’s getting a computer to understand what’s important about certain scenes — and, hopefully, to be able to better generalize.

As everything from self-driving cars to robots in the workplace becomes increasingly common, this is an essential problem to solve.

A paper describing the work will be presented at SIGGRAPH Asia 2020 in November.