Artificial intelligence adds sound effects to silent videos, and they fool humans

Researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) trained a neural network to recognize materials (e.g., metal grate, plants, concrete sidewalk) being hit with a drumstick and to synthesize sounds to accompany the actions. It did well enough that people mistook the synthesized sounds for real ones. From the abstract:

Objects make distinctive sounds when they are hit or scratched. These sounds reveal aspects of an object's material properties, as well as the actions that produced them. In this paper, we propose the task of predicting what sound an object makes when struck as a way of studying physical interactions within a visual scene. We present an algorithm that synthesizes sound from silent videos of people hitting and scratching objects with a drumstick. This algorithm uses a recurrent neural network to predict sound features from videos and then produces a waveform from these features with an example-based synthesis procedure. We show that the sounds predicted by our model are realistic enough to fool participants in a "real or fake" psychophysical experiment, and that they convey significant information about material properties and physical interactions.
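
The abstract's last step, "example-based synthesis," can be pictured as a lookup: take the sound features the network predicts and reuse the recorded sound whose features are closest. What follows is only a rough sketch of that idea, not the paper's implementation. The function name `synthesize_by_example`, the two-number feature vectors, the squared-distance measure, and the toy sine-wave "recordings" are all assumptions made up for illustration.

```python
import numpy as np

def synthesize_by_example(predicted_features, bank_features, bank_waveforms):
    """Pick the stored waveform whose features best match the prediction.

    Hypothetical helper: a plain nearest-neighbor lookup, assumed here as
    one simple way to do example-based synthesis.
    """
    # Squared Euclidean distance from the prediction to every stored example.
    diffs = bank_features - predicted_features
    dists = np.sum(diffs ** 2, axis=1)
    best = int(np.argmin(dists))
    return best, bank_waveforms[best]

# Toy "bank" of three recorded sounds, each summarized by two made-up features.
bank_features = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
bank_waveforms = [np.sin(np.linspace(0, 2 * np.pi * k, 100)) for k in (1, 2, 3)]

# Pretend the network predicted these features for one drumstick hit.
predicted = np.array([0.1, 0.9])
index, waveform = synthesize_by_example(predicted, bank_features, bank_waveforms)
print(index)  # position of the closest stored example
```

In this toy setup the output is always a real recording from the bank rather than a waveform generated from scratch, which is one plausible reason such sounds could pass for real ones.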


Notable Replies

  1. brzap says:

    This technology could save me a ton of foley work on my upcoming film "Drumsticks tapping on things".

  2. Fix it in editing. That's what we did for "Drumsticks: Offscreen!"

  3. I wonder to what extent the neural network is associating sounds with the material itself (its texture and movement), and how much it's purely learning about the motion of the drumstick. The latter would be less impressive, though still cool; if you think about it, with a sufficiently high frame rate you could simply measure the sound by seeing the vibrations through the stick, with no AI needed. Like that kind of remote audio bug that works by shining a laser onto a window or similar rigid surface.

    Either way, there's something neat about retrieving information that was never directly recorded. It'd be cool if it could get good enough to, say, extract on-set sounds from silent films.

  4. Nowhere is Poetry so actual as in foley work. Metaphor and simile are reality:

    This cinder block being dragged along another cinder block IS an ancient monolith's hidden door opening unexpectedly.

    These empty shoes crushing gravel are LIKE a person walking down a road.

    When I was a kid playing around with tape recorders, I used an accordion to suggest an automobile accident. It worked.

  5. Let's see it try to keep up with this rocker.
