Skip to main content

MIT algorithm can predict the (immediate) future from still images

Creating Videos of the Future
Humans still can’t predict elections but we’re pretty good at predicting the immediate future. Baby drops glass cup, cup falls and shatters, and baby starts to cry. We’re so good at these short-term forecasts that we can often even describe what events will happen next in an image.

But what’s second nature for us can prove complicated for computers. Will the glass break or bounce? Will the baby laugh or cry?

A team of researchers from the Massachusetts Institute of Technology (MIT) Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed a system that can predict the following events in images and generate videos to depict them. The system needs work — its current productions are simple, short, and unassuming — but it stands out for its unique approach and accuracy.

“Instead of building up scenes frame by frame, we focus on processing the entire scene at once,” Carl Vondrick, PhD at MIT CSAIL and lead author of the paper, told Digital Trends.
video-examples-with-input-and-output

Alternative computer vision models that attempt the same task use recurrent networks to generate predictive videos on a frame-by-frame basis. The system developed by Vondrick and his team uses “convolutional networks” to generate all 32 frames simultaneously.

“The existing approach of going frame by frame has a certain logic,” Vondrick said, “but it also creates a massive margin for error. It’s sort of like a big game of ‘Telephone,” which means that the message most likely will fall apart by the time you go around the whole room.

“In contrast, our approach is the ‘Telephone’ equivalent of speaking to everyone in the room at once,” he added.

The researchers trained the system on a year of footage packed into two million videos and — in order to generate all frames at once — taught it distinguish foregrounds from backgrounds, and mobile objects from stationary ones. They then showed the system still images and had it generate short clips of subsequent events.

Once the system could generate video clips, Vondrick and his team set out to refine it through a method called adversarial learning.

“The idea behind adversarial learning is to have two neural networks compete against each other,” Vondrick said. “One network tries to decide what is real versus fake, and another tries to generate something that fools the first network.”

Through this computer competition the generative algorithm improved the accuracy of its video clips until it was able to fool human subjects 20 percent more often than a baseline model, according to a paper that will be presented next week at the Neural Information Processing Systems conference in Barcelona.

But with accuracy comes complexity and with complexity comes obstacles.

The current system’s videos are short — a mere one and a half seconds long. If the clips were much longer than that, they’d risk their consistency. “The key challenge is being able to reliably track the relationships between all of the objects in a scene … to make sure that the video that’s being generated still makes sense five or ten seconds later,” Vondrick said. To develop accurate and long videos, the system may need human input to help it grasp context and connection between seemingly unrelated actions, such as jogging and showering.

Vondrick’s ambitious end goal is to develop an algorithm that can create believable feature-length films, though he admits that is still some years off. In the near term though he thinks this system could refine AI systems by helping them adapt to unpredictable environments.

Editors' Recommendations

Dyllan Furness
Dyllan Furness is a freelance writer from Florida. He covers strange science and emerging tech for Digital Trends, focusing…
The best tablets in 2024: top 11 tablets you can buy now
Disney+ app on the iPad Air 5.

As much as we love having the best smartphones in our pockets, there are times when those small screens don't cut it and we just need a larger display. That's when you turn to a tablet, which is great for being productive on the go and can be a awesome way to unwind and relax too. While the tablet market really took off after the iPad, it has grown to be quite diverse with a huge variety of products — from great budget options to powerhouses for professionals.

We've tried out a lot of tablets here at Digital Trends, from the workhorses for pros to tablets that are made for kids and even seniors -- there's a tablet for every person and every budget. For most people, though, we think Apple's iPad Air is the best overall tablet — especially if you're already invested in the Apple ecosystem. But if you're not an Apple user, that's fine too; there are plenty of other great options that you'll find in this roundup.

Read more
How to delete a file from Google Drive on desktop and mobile
Google Drive in Chrome on a MacBook.

Google Drive is an excellent cloud storage solution that can be accessed from numerous devices. Whether you do most of your Google Drive uploading or downloading from a PC, Chromebook, or mobile device, there’s going to come a time when you’ll need to delete a file (or two). Fortunately, the deletion process couldn’t be more straightforward. We’ve also put together this helpful guide to show you how to trash your Drive content a couple of different ways.

Read more
Windows 11 might nag you about AI requirements soon
Copilot on a laptop on a desk.

After recent reports of new hardware requirements for the upcoming Windows 11 24H2 update, it is evident that Microsoft is gearing up to introduce a bunch of new AI features. A new report now suggests that the company is working on adding new code to the operating system to alert users if they fail to match the minimum requirements to run AI-based applications.

According to Albacore on X (formerly known as Twitter), systems that do not meet the requirements will display a warning message in the form of a watermark. After digging into the latest Windows 11 Insider Build 26200, he came across requirements coded in the operating system for an upcoming AI File Explorer feature. The minimum requirement includes an ARM64 processor, 16GB of memory, 225GB of total storage, and a Qualcomm Snapdragon X Elite NPU.

Read more