MIT research may revolutionize structural engineering with a camera that ‘sees’ sound

Last summer, a team of researchers from MIT, Microsoft and Adobe developed a way to reconstruct audio by analyzing microscopic vibrations in recorded video footage. Over the past few months, MIT Ph.D. candidate Abe Davis and his team have taken this video algorithm to another level: predicting how objects will move, according to Mashable.

Article updated by Chris Palermino 3/20/15: MIT technology has new implications

In a TED Talk on Monday in Vancouver, Davis demonstrated the new algorithm using a silent video of a bush moving gently in the breeze. After the software converted the clip into a photo-video file, he clicked and prodded at a ‘still’ image of the bush with his cursor. Each movement of the mouse acted as a realistic, virtual ‘wind,’ showing how the bush would move if it were actually blown.

While still in beta, this technology could have a myriad of applications in fields like engineering, transportation and gaming. One suggested use is helping to pinpoint stress points in a bridge’s design. Imagine it helping to predict potential rail issues on trains, structural issues in buildings, or the ramifications of that last punch you threw in Tekken.

Original article from 8/5/14 by Drew Prindle

No audio? No problem. A team of researchers from MIT, Microsoft, and Adobe has developed an algorithm that can reconstruct an audio signal even when only visual information is available. Using nothing more than a high-speed camera and a special processing algorithm, the team was able to recover the audio inside a room while filming from 15 feet away through soundproof glass.

The Visual Microphone: Passive Recovery of Sound from Video

How? Well, when you get down to it, sound waves are really just tiny disturbances in the air. When those waves strike something delicate (say, a piece of tin foil, a bag of chips, or the leaves of a house plant), they cause the object to vibrate ever so slightly. It’s the same effect that makes your rear-view mirror shake when your buddy cranks up the subwoofer on his car stereo, just on a far smaller scale: the sound waves produced by an ordinary conversation are much weaker and cause even tinier vibrations. To the naked eye, these disturbances are practically imperceptible, but with the help of high-speed photography, the team was able to capture movements as small as a tenth of a micrometer, and then use that information to estimate and rebuild the audio signal. Check out the video to see it in action:
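To make the idea a bit more concrete, here is a rough Python sketch of the core step: turning tiny frame-to-frame motion into a one-dimensional signal with one sample per frame. It is not the researchers’ code. It assumes the video is already loaded as a NumPy array of grayscale frames, and it swaps in a much simpler motion estimate (a single global sub-pixel shift per frame, computed with a linearized brightness-constancy step) for the more sophisticated analysis in the published work. The function name `recover_signal` and the synthetic test clip are made up for illustration.

```python
import numpy as np


def recover_signal(frames: np.ndarray) -> np.ndarray:
    """Turn a (T, H, W) grayscale video into a 1-D signal, one sample per frame.

    Simplified stand-in (not the published method): estimates one global
    horizontal sub-pixel shift per frame relative to the first frame, using a
    linearized brightness-constancy (Lucas-Kanade-style) least-squares step.
    """
    frames = frames.astype(np.float64)
    ref = frames[0]
    # Horizontal gradient of the reference frame (axis 1 of an (H, W) image is x).
    grad_x = np.gradient(ref, axis=1)
    denom = np.sum(grad_x ** 2)
    # For a small shift d, frame_t(x) is roughly ref(x) - d * grad_x(x), so the
    # least-squares estimate is d = sum(grad_x * (ref - frame_t)) / sum(grad_x**2).
    diffs = ref[None, :, :] - frames
    shifts = np.tensordot(diffs, grad_x, axes=([1, 2], [0, 1])) / denom
    # Remove the DC offset and scale to [-1, 1] so it can be treated as audio.
    signal = shifts - shifts.mean()
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal


# Synthetic test clip (made-up numbers): a striped pattern that sways
# horizontally by at most 0.05 pixels, following a 100 Hz tone.
fps = 2000.0                                    # assumed high-speed frame rate
t = np.arange(400) / fps
true_shift = 0.05 * np.sin(2 * np.pi * 100.0 * t)
x = np.arange(128)
frames = np.stack([np.tile(np.sin(2 * np.pi * (x - d) / 32), (16, 1)) for d in true_shift])

audio = recover_signal(frames)
print(np.corrcoef(audio, true_shift)[0, 1])     # close to 1.0 for this clean clip
```

Because the made-up camera in this example runs at 2,000 frames per second, one sample per frame is enough to represent a 100 Hz tone. A real recording would likely need filtering and noise reduction before it sounds anything like speech.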

As if that wasn’t incredible enough, the team also demonstrated a variation on the algorithm that extracts sound from ordinary 60-frames-per-second video footage. Generally speaking, the sensors on most digital cameras are designed to scan images horizontally, one row at a time (a design known as a rolling shutter), so each row is captured at a slightly different instant. Normally that’s not a problem, but when you’re shooting fast-moving objects, it can sometimes lead to odd visual artifacts. The team was able to exploit this quirk: because every row is effectively its own snapshot in time, a single frame holds far more time samples than the frame rate alone would suggest, which let the researchers tease out information about the objects’ high-frequency vibrations and, once again, reconstruct a usable (albeit murky) audio signal. It’s not quite as clear as the audio ripped from the high-speed camera, but even so, the fact that this kind of reconstruction is possible is mind-blowing.
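The rolling-shutter trick can be sketched in a similar way. If motion is estimated row by row instead of frame by frame, a 60 fps clip with many rows yields far more than 60 samples per second. The Python below is a simplified illustration, not the team’s method: it assumes an idealized sensor that reads rows at a constant rate during a fixed share of each frame period (`readout_fraction`, a hypothetical parameter) and is idle for the rest, and it applies a simple linearized shift estimate to each row. The names `row_timestamps` and `rolling_shutter_samples` are hypothetical.

```python
import numpy as np


def row_timestamps(num_frames: int, rows: int, fps: float, readout_fraction: float) -> np.ndarray:
    """Capture time (seconds) of every row of every frame, shape (num_frames, rows).

    Idealized rolling shutter (an assumption): rows are read top to bottom at a
    constant rate during the first `readout_fraction` of each frame period,
    followed by an idle gap with no samples.
    """
    frame_period = 1.0 / fps
    row_period = readout_fraction * frame_period / rows
    frame_starts = np.arange(num_frames)[:, None] * frame_period
    return frame_starts + np.arange(rows)[None, :] * row_period


def rolling_shutter_samples(frames: np.ndarray, fps: float, readout_fraction: float):
    """Estimate one horizontal shift per row per frame and pair it with its capture time."""
    frames = frames.astype(np.float64)
    ref = frames[0]
    grad_x = np.gradient(ref, axis=1)                       # (H, W)
    denom = np.maximum(np.sum(grad_x ** 2, axis=1), 1e-12)  # per-row normalizer, avoids /0
    # Linearized shift estimate computed separately for each row: shape (T, H).
    shifts = np.einsum("thw,hw->th", ref[None] - frames, grad_x) / denom
    # Each row is measured against the same row of frame 0, which was itself
    # captured mid-vibration; subtracting each row's mean over time removes
    # that fixed per-row offset.
    shifts -= shifts.mean(axis=0, keepdims=True)
    times = row_timestamps(frames.shape[0], frames.shape[1], fps, readout_fraction)
    return times.ravel(), shifts.ravel()    # row-major ravel keeps samples in time order


# Synthetic 60 fps clip (made-up numbers): every row sways with a 500 Hz tone,
# sampled at that row's own capture time.
fps, rows, cols, n_frames = 60.0, 64, 128, 30
true_shift = 0.05 * np.sin(2 * np.pi * 500.0 * row_timestamps(n_frames, rows, fps, 0.8))
x = np.arange(cols)
frames = np.sin(2 * np.pi * (x[None, None, :] - true_shift[:, :, None]) / 32)

times, samples = rolling_shutter_samples(frames, fps, readout_fraction=0.8)
print(len(samples), "samples from", n_frames, "frames")   # 1920 samples from 30 frames
```

In this made-up setup, 64 rows read out over 80 percent of each 1/60-second frame give roughly 4,800 samples per second during readout, enough to capture a 500 Hz tone that a frame-by-frame approach at 60 fps could never resolve. The idle gap between frames still leaves holes in the signal, which is likely one reason audio recovered this way comes out murkier.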

The researchers are presenting their work at the computer graphics conference SIGGRAPH this month.

Drew Prindle
Former Digital Trends Contributor
Drew Prindle is an award-winning writer, editor, and storyteller who currently serves as Senior Features Editor for Digital…