
MIT’s latest A.I. is freakishly good at determining what’s going on in videos

How a Temporal Relation Network understands what's going on in a video

Just a few frames of information telling a story are all we need to understand what is going on. This is, after all, the basis for comic books — which provide just enough of the important story beats for us to follow what has happened. Sadly, robots equipped with computer vision technology struggle to do this. Until now, at least.


Recently, the Massachusetts Institute of Technology (MIT) demonstrated a new type of artificial intelligence system that uses a neural network to fill in the blanks between video frames and work out what activity is taking place. The result is a system that is astonishingly good at recognizing what is happening in a video.

“The newly developed temporal relation modules enable the A.I. system to analyze a few key frames and estimate the temporal relation among them, in order to understand what’s going on in the video — such as a stack of objects [being] knocked down,” Bolei Zhou, a former Ph.D. student in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), who is now an assistant professor of computer science at the Chinese University of Hong Kong, told Digital Trends. “Because the model works with key frames sparsely sampled from the incoming video, the processing efficiency is greatly improved, enabling real-time activity recognition.”
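To make that idea concrete, here is a minimal sketch, in PyTorch, of how a temporal relation module could fuse a handful of sparsely sampled frame features with a small multilayer perceptron. The class name, feature size, and 174-class output below are illustrative assumptions for this example, not MIT's released code.

```python
# A minimal sketch of a temporal relation module in PyTorch. The names and
# dimensions here are illustrative assumptions, not MIT's released implementation.
import torch
import torch.nn as nn


class TemporalRelationModule(nn.Module):
    """Classifies an activity from k sparsely sampled, temporally ordered frame features."""

    def __init__(self, feature_dim: int, num_frames: int, num_classes: int, hidden_dim: int = 256):
        super().__init__()
        # Concatenating the ordered frame features and passing them through an MLP
        # lets the network learn how the sampled frames relate to one another over time.
        self.mlp = nn.Sequential(
            nn.Linear(num_frames * feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, frame_features: torch.Tensor) -> torch.Tensor:
        # frame_features: (batch, num_frames, feature_dim), kept in temporal order.
        batch_size = frame_features.size(0)
        fused = frame_features.reshape(batch_size, -1)
        return self.mlp(fused)


# Example: score 174 hypothetical activity classes from 8 key frames per clip,
# using made-up 512-dimensional per-frame CNN features.
features = torch.randn(2, 8, 512)
model = TemporalRelationModule(feature_dim=512, num_frames=8, num_classes=174)
logits = model(features)  # shape: (2, 174)
```

Because only a few key frames are processed per clip, the per-video cost stays low, which is the efficiency gain Zhou describes.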

Another exciting property of the A.I. model is that it can anticipate what will happen next after viewing only the early frames of a video. For instance, if it sees a person holding a bottle, the algorithm anticipates that they might take a drink or squeeze it. Such anticipation will be essential for artificial intelligence used in domains like autonomous driving, where guessing what will happen from moment to moment could help a vehicle proactively prevent accidents.
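As a hedged illustration of that anticipation idea, the same sketch above can simply be fed key frames drawn only from the opening portion of a clip, so a prediction is made before the action finishes. The tensor shapes and the "first quarter of the clip" cutoff are assumptions for the example, not figures from MIT's system.

```python
# Reuses the TemporalRelationModule and `model` defined in the sketch above.
import torch

# Hypothetical per-frame CNN features for 2 clips of 80 frames each.
full_clip_features = torch.randn(2, 80, 512)

# Observe only the first quarter of each clip, then sample 8 key frames from it.
early = full_clip_features[:, :20, :]
idx = torch.linspace(0, early.size(1) - 1, steps=8).long()

# The model scores activities before the action has finished unfolding.
early_logits = model(early[:, idx, :])  # shape: (2, 174)
```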

“It [could also] be used to monitor human behaviors, such as a home robot assistant which could anticipate your intention by delivering things beforehand,” Zhou continued. “It [could additionally be employed] to analyze the massive [number of] videos online, to do better video understanding and video retrieval.”

The next step for the project involves expanding the A.I.'s ability to recognize a broader range of objects and activities. The team is also working with robotics researchers to deploy this activity recognition in robot systems, which could gain enhanced perception and visual reasoning skills as a result.

Luke Dormehl
Former Digital Trends Contributor
I'm a UK-based tech writer covering Cool Tech at Digital Trends. I've also written for Fast Company, Wired, the Guardian…