
MIT scientists built a neural network that can pass the ‘Audio Turing Test’

Visually-Indicated Sounds
Sound is one of the building blocks that help us understand our surroundings, and it may even help us build an intuitive theory of how the physical world works.

A new project by a team of researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) sets out to replicate this kind of learning in a computer. The team has created an algorithm that, when shown a silent video clip of an object being hit, selects an appropriate impact sound.


“In order to know what sound to play, the computer has to know something about the object being struck, and about the action that produced the sound,” Andrew Owens, one of the PhD students who worked on the project, tells Digital Trends. “For example, the sound that dirt makes when it is hit is very different from the sound that ceramic makes. Therefore, in order to predict these sounds well, the algorithm implicitly has to learn to recognize materials.”

The end result is a sort of “Turing Test for sound,” in which the goal is to choose an impact sound convincing enough to fool a human who is shown the footage.

“It is in the context of AI agents learning from interactions with the world that we see as our main contribution,” Owens continues. “You can think of it as a toy version of learning about the world the way that babies do: by poking and prodding the objects around them and examining what happens.”

The technology behind this advance is deep learning, an area of artificial intelligence that has prompted some of the field’s biggest breakthroughs in recent years. Deep learning relies on a neural network: a layered model, loosely inspired by the brain, that is used to discover the relationship between cause and effect where that relationship is complex or unclear. In short, a neural network allows a computer to “learn” without requiring a human to explicitly label all of the examples that it is shown.

We wonder what sound it predicts will accompany the rise of machine intelligence. May we suggest The Terminator theme?
