Like parents from the 1950s, AI still can’t understand comics. Here’s why

Image recognition has progressed in leaps and bounds over the years. Not too long ago, a challenging recognition task involved asking an AI “Is there a human in this image?” or “Is this breed of dog a dachshund?”

More recently, however, the bar has been raised — and a new research project carried out at the University of Maryland and University of Colorado has another recognition task in its sights: whether or not an AI can read comic books.

In some ways, this is deeply ironic. For a long time, comics were dismissed as a junk medium for kids and barely-literate adults. In recent decades, that perception has shifted, but this present work is enough to have Fredric Wertham (the German psychologist whose 1950s book Seduction of the Innocent came close to de-railing the comics industry) turning in his grave. Simply put: the ability to read and understand comics may turn out to be the next big benchmark for a AI.

“The task requires a lot of common sense and inference”

As it happens, there’s nothing straightforward about comic books. Despite the fact reading them is a relatively simple task for us humans, the ability to juggle both images and text, while understanding a narrative told through deceptively simple graphics, with the brain filling in the story blanks between panels is… well, complex stuff for computers.

“The task requires a lot of common sense and inference,” Mohit Iyyer, a fifth year Ph.D. student in the Department of Computer Science at the University of Maryland, College Park, told Digital Trends. “There can be drastic shifts in scene, camera, and subject from one panel to the next. For example, if panel 1 shows a woman walking towards a car with keys in her hand and panel 2 shows the same car driving on a highway, we as readers would infer that in between the two panels, the woman got into the car, started it, and drove onto the highway. But we can only connect these two panels because we are familiar with how cars work; a computer needs lots of data to be able to make the same types of inferences.”

For their study, Iyyer and his fellow researchers chose to answer the question of whether or not a machine can understand a sequence of panels by looking at whether or not it was able to predict what happens next. More specifically, that meant giving an AI access to three preceding panels and then asking it to guess the next dialog or action to take place, using three multiple choice options.

As if that wasn’t enough of a challenge, the researchers also added an extra task in the form of matching the right dialog to its speaker in panels with multiple dialog boxes.

“To proceed with this task, we first needed to create a fairly large dataset,” fellow researcher Varun Manjunatha told Digital Trends. “We downloaded nearly 4,000 comic books from the Golden Age of comics (read: comics published from 1930-50, whose copyright has expired and are therefore now free for download on the internet). Using neural networks called Faster-RCNNs, [we] extracted panels from pages, and speech bubbles from panels. We then sent these speech bubbles through Google’s OCR engine to extract text.”

However, while this gave a large dataset of panels and text from said panels, the team had to find a way of training the network to make sense of what it was seeing. This was achieved using a cascade of neural networks called LSTMs (“long short-term memory”) to predict the next panel. To put it another way, as with any slightly nerdy kid with a stack of comic books in his or her bedroom, the machine learned the tropes of comics by reading a boatload of comic books. Wait until it starts asking panel questions at next year’s Comic-Con!

“Can we build models that generate artwork as well as dialogue?”

So how did it do? Like some lumbering robot that would have done battle with Marvel’s cadre of super heroes back in the 1960s, it managed well enough to offer a threatening splashpage, but not quite well enough to best the human heroes at the end of the issue.

As Manjunatha said, “Our findings are that while these networks are quite good at predicting what speech or panel might occur next — which is impressive given the highly non-trivial nature of the task — they are not nearly as good as human beings in doing the same.”

In other words, in a world in which we’re quite used to hearing about how machines can do things as well — or better — than us puny humans, reading comic books remains an ability in our favor.

The team isn’t giving up hope, however. In fact, not only are they confident that future versions of the project will be better at carrying out this task, but computer-generated comics may be their next frontier.

“One of the most exciting future directions is in generation,” Iyyer said. “There have been many recent breakthroughs in generating both text and images. Comics present an interesting combination of the two: Can we build models that generate artwork as well as dialogue? The sequential aspect is also interesting. Given a sequence of panels as context, can we generate a new panel that makes sense in this context?”

In other words, mankind’s greatest heroes may have bested the machine menace for now, but they’ll be back in a later installment with an even bigger robot. To be continued…

Emerging Tech

Death from above? How we’re preparing for a future filled with weaponized drones

Drones are beginning to enable everything from search & rescue, to the delivery of medicines to hard-to-reach places. But they are also being used as cheap, and deadly flying bombs. How can we defend ourselves?
Emerging Tech

Scientists created a condom that self-lubricates during sex. You’re welcome

Researchers from Boston University have invented a special coating for condoms which make them respond to bodily fluids by becoming more slippery. Here's how their new breakthrough works.
Emerging Tech

Johns Hopkins’ lab-grown human retina could lead to big insights

Scientists from Johns Hopkins University have successfully grown human retina tissue from scratch in a lab. The work could help with the development of new therapeutics related to eye diseases.
Emerging Tech

Japanese scientists are chewing over an ‘electric gum’ that never loses flavor

Researchers at Japan's Meiji University may have found the secret to unlimited chewing gum -- and it just involves zapping your tongue with electricity. Here's what makes it all work.
Movies & TV

Latest 'Daredevil' season 3 preview puts the spotlight on Wilson Fisk

Season 3 of Daredevil will premiere October 19 on Netflix, so here is everything we know about the next set of adventures featuring Marvel Comics' man without fear before the third season's debut.
Emerging Tech

Behind the unsettling sci-fi landscapes of Simon Stalenhag’s ‘Electric State’

The narrative artbook follows the journey of a young traveler, Michelle, and her robot, Skip, as they head west to the Pacific coast through an alternative America torn apart by civil war and the trappings of military-grade virtual reality.
Emerging Tech

Awesome Tech You Can’t Buy Yet: A.I.-powered cat toys, wallets, food containers

Check out our roundup of the best new crowdfunding projects and product announcements that hit the web this week. You may not be able to buy this stuff yet, but it sure is fun to gawk!
Emerging Tech

Get your head in the clouds with the best vaporizers for flower and concentrates

Why combust dead plant matter when you could vaporize the good stuff and leave the leaves behind? Here's a rundown of the best vaporizers money can buy, no matter what your style is.
Emerging Tech

Get one of the best cheap drones you can buy, and cry less when you crash

Want to get in on all this hot drone action, but don't want to spend half a paycheck to make it happen? There are actually lots of feature-packed budget options. Check out this list of the best drones under $500.
Emerging Tech

Here are the best (and least likely to explode) hoverboards you can buy

With widespread reports of cheap, knock-off Chinese hoverboards exploding, these self-balancing scooters may be getting a rough reputation. They're not all bad, though. Ride in style with our picks for the best -- and safest -- hoverboards
Product Review

Parrot Anafi drone review

It’s definitely not perfect, and there are a few little things that could be improved, but even so, Anafi is unquestionably the best drone that Parrot has ever made.
Emerging Tech

Looking for a good read? Here are the best, most eye-opening books about tech

Sometimes it's sensible to put down the gadgets and pick up a good old-fashioned book -- to read about the latest gadgets, of course. Here are the tech books you need to check out.
Emerging Tech

SpaceX Starlink: Here’s everything you need to know

SpaceX Starlink is the name of Elon Musk's ambitious plan to blanket the globe with high speed broadband internet via a network of satellites. Here's everything you need to know about it
Emerging Tech

Flying food: Uber has set a target date to use drones for meal delivery

Uber is better known for transporting people around town, but it also has a growing meal-delivery business called UberEats. It currently uses drivers and cyclists to deliver the food, but also has plans to use drones.