Like parents from the 1950s, AI still can’t understand comics. Here’s why

Image recognition has progressed in leaps and bounds over the years. Not too long ago, a challenging recognition task involved asking an AI “Is there a human in this image?” or “Is this breed of dog a dachshund?”

More recently, however, the bar has been raised, and a new research project carried out at the University of Maryland and the University of Colorado has another recognition task in its sights: whether or not an AI can read comic books.

In some ways, this is deeply ironic. For a long time, comics were dismissed as a junk medium for kids and barely literate adults. That perception has shifted in recent decades, but this work would still be enough to have Fredric Wertham (the German-born psychiatrist whose 1950s book Seduction of the Innocent came close to derailing the comics industry) turning in his grave. Simply put: the ability to read and understand comics may turn out to be the next big benchmark for AI.

“The task requires a lot of common sense and inference”

As it happens, there’s nothing straightforward about comic books. Reading them may be a relatively simple task for us humans, but it means juggling images and text, following a narrative told through deceptively simple graphics, and letting the brain fill in the story blanks between panels. For computers, that is… well, complex stuff.

“The task requires a lot of common sense and inference,” Mohit Iyyer, a fifth-year Ph.D. student in the Department of Computer Science at the University of Maryland, College Park, told Digital Trends. “There can be drastic shifts in scene, camera, and subject from one panel to the next. For example, if panel 1 shows a woman walking towards a car with keys in her hand and panel 2 shows the same car driving on a highway, we as readers would infer that in between the two panels, the woman got into the car, started it, and drove onto the highway. But we can only connect these two panels because we are familiar with how cars work; a computer needs lots of data to be able to make the same types of inferences.”

For their study, Iyyer and his fellow researchers tested whether a machine can understand a sequence of panels by checking whether it could predict what happens next. Specifically, the AI was shown three preceding panels and asked to guess the dialog or action that comes next, choosing from three multiple-choice options.
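
To make that setup a little more concrete, here is a minimal Python sketch of how a multiple-choice "what happens next" test might be scored. It assumes each panel is represented only by its text, and the `word_overlap_score` function is a hypothetical toy scorer invented for illustration; it is not the researchers' model, which also had to deal with the artwork.

```python
import re
from typing import Callable, List, Sequence, Tuple

# Assumption for illustration: each panel is represented only by its text.
Panel = str
Example = Tuple[List[Panel], List[str], int]  # (context panels, candidates, gold index)

def tokens(text: str) -> List[str]:
    """Lowercase word tokens with punctuation stripped."""
    return re.findall(r"[a-z']+", text.lower())

def word_overlap_score(context: Sequence[Panel], candidate: str) -> float:
    """Hypothetical toy scorer: counts candidate words that also appear in the context."""
    context_words = set(tokens(" ".join(context)))
    return float(sum(1 for w in tokens(candidate) if w in context_words))

def predict(context: Sequence[Panel], candidates: Sequence[str],
            score: Callable[[Sequence[Panel], str], float]) -> int:
    """Return the index of the highest-scoring candidate (ties go to the earliest)."""
    return max(range(len(candidates)), key=lambda i: score(context, candidates[i]))

def accuracy(examples: Sequence[Example],
             score: Callable[[Sequence[Panel], str], float]) -> float:
    """Fraction of examples where the chosen option matches the gold answer."""
    correct = sum(1 for ctx, cands, gold in examples if predict(ctx, cands, score) == gold)
    return correct / len(examples)

# A made-up example echoing Iyyer's car scenario: three context panels, three options.
example: Example = (
    ["She grabs her car keys.", "She walks toward the car.", "The engine starts."],
    ["The car speeds down the highway.", "A dog barks at the moon.", "Lunch is served."],
    0,
)
print(predict(example[0], example[1], word_overlap_score))  # 0
print(accuracy([example], word_overlap_score))  # 1.0
```

With three options, a model guessing at random would be right about a third of the time, so that is the floor any real system has to beat. A word-overlap heuristic like this one is far too crude to capture the kind of inference Iyyer describes; it only shows the shape of the test.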

As if that weren’t enough of a challenge, the researchers also added an extra task: matching the right dialog to its speaker in panels with multiple speech bubbles.

“To proceed with this task, we first needed to create a fairly large dataset,” fellow researcher Varun Manjunatha told Digital Trends. “We downloaded nearly 4,000 comic books from the Golden Age of comics (read: comics published from 1930 to 1950, whose copyrights have expired and which are therefore now free for download on the internet). Using neural networks called Faster R-CNNs, [we] extracted panels from pages, and speech bubbles from panels. We then sent these speech bubbles through Google’s OCR engine to extract text.”

This gave the team a large dataset of panels and the text inside them, but it still needed a way to train a network to make sense of what it was seeing. The solution was a cascade of neural networks called LSTMs (“long short-term memory” networks), which learned to predict the next panel. To put it another way, like any slightly nerdy kid with a stack of comic books in his or her bedroom, the machine learned the tropes of comics by reading a boatload of them. Wait until it starts asking panel questions at next year’s Comic-Con!
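
For readers curious about what an LSTM actually does, below is a minimal pure-Python sketch of a single LSTM cell stepping through a sequence of panel feature vectors and summarizing them into one "context" vector. The weights are random and untrained, the feature vectors are made-up numbers, and the sizes are arbitrary assumptions; it illustrates the standard LSTM gating equations, not the researchers' actual architecture.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCell:
    """A single LSTM cell using the standard input/forget/output gate equations."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        concat = input_size + hidden_size
        # One (untrained, random) weight matrix and bias per gate:
        # input (i), forget (f), output (o), and candidate memory (g).
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(concat)] for _ in range(hidden_size)]
            for gate in "ifog"
        }
        self.biases = {gate: [0.0] * hidden_size for gate in "ifog"}

    def _affine(self, gate: str, z: Sequence[float]) -> Vector:
        return [sum(w * x for w, x in zip(row, z)) + b
                for row, b in zip(self.weights[gate], self.biases[gate])]

    def step(self, x: Sequence[float], h: Sequence[float], c: Sequence[float]):
        z = list(x) + list(h)  # current input concatenated with previous hidden state
        i = [sigmoid(v) for v in self._affine("i", z)]    # how much new info to write
        f = [sigmoid(v) for v in self._affine("f", z)]    # how much old memory to keep
        o = [sigmoid(v) for v in self._affine("o", z)]    # how much memory to expose
        g = [math.tanh(v) for v in self._affine("g", z)]  # candidate memory content
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def encode(self, sequence: Sequence[Sequence[float]]) -> Vector:
        """Run the cell over a sequence and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h

# Three made-up 4-dimensional "panel feature" vectors standing in for context panels.
panels = [[0.2, 0.1, 0.0, 0.5], [0.3, 0.0, 0.1, 0.4], [0.1, 0.2, 0.3, 0.0]]
cell = LSTMCell(input_size=4, hidden_size=3)
context = cell.encode(panels)
print(len(context))  # 3
```

The forget gate is what lets the network carry information from earlier panels forward while deciding what to overwrite. In a next-panel model, a context vector like this could then be compared against encodings of each candidate answer, for instance with a dot product, to pick the most plausible option.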

“Can we build models that generate artwork as well as dialogue?”

So how did it do? Like some lumbering robot that would have done battle with Marvel’s cadre of superheroes back in the 1960s, it managed well enough to offer a threatening splash page, but not quite well enough to best the human heroes at the end of the issue.

As Manjunatha said, “Our findings are that while these networks are quite good at predicting what speech or panel might occur next — which is impressive given the highly non-trivial nature of the task — they are not nearly as good as human beings in doing the same.”

In other words, in a world where we’re quite used to hearing that machines can do things as well as, or better than, us puny humans, reading comic books remains an ability in our favor.

The team isn’t giving up hope, however. Not only are the researchers confident that future versions of the project will do better at this task, but they also see computer-generated comics as a possible next frontier.

“One of the most exciting future directions is in generation,” Iyyer said. “There have been many recent breakthroughs in generating both text and images. Comics present an interesting combination of the two: Can we build models that generate artwork as well as dialogue? The sequential aspect is also interesting. Given a sequence of panels as context, can we generate a new panel that makes sense in this context?”

In other words, mankind’s greatest heroes may have bested the machine menace for now, but they’ll be back in a later installment with an even bigger robot. To be continued…