MIT’s PixelPlayer Lets You Isolate Sounds From Videos

via Bobby Owsinski blog

One thing that musicians and producers alike have long wished for is software that can separate a mix into its individual instrumental and vocal parts. We’ve had that capability for a few years now, but the next step is extracting individual sounds from a video just by clicking on the part of the screen we want to hear. Researchers at MIT have now figured this out thanks to artificial intelligence, and have developed an app they’re calling PixelPlayer.

Since it’s artificial intelligence-based, PixelPlayer needs to be trained (and the researchers have done that). After that, it identifies the sources of sound in the video and calculates the volume of each pixel in the image. It then “spatially localizes” them, meaning it identifies the regions in the clip that generate similar sound waves. This is outlined in a new paper called “The Sound of Pixels.”
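
For the technically curious, here’s a rough sketch in PyTorch of what that pipeline might look like. This is not MIT’s actual code: the network names, layer sizes, and number of audio components are illustrative assumptions, just enough to show how per-pixel visual features can be combined with separated audio components to produce a mask, and therefore a sound, for each pixel.

import torch
import torch.nn as nn

K = 4  # number of audio components (an assumed, illustrative value)

class VisualNet(nn.Module):
    """Maps video frames to a K-dimensional feature at each spatial location."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, K, 3, stride=2, padding=1),
        )

    def forward(self, frames):            # frames: (B, 3, H, W)
        return self.conv(frames)          # -> (B, K, H/4, W/4)

class AudioNet(nn.Module):
    """Splits the mixture spectrogram into K component spectrograms."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, K, 3, padding=1)

    def forward(self, spec):              # spec: (B, 1, F, T)
        return self.conv(spec)            # -> (B, K, F, T)

class PixelSynthesizer(nn.Module):
    """Weighs the audio components by each pixel's visual features to get a
    separation mask per pixel (the "spatial localization" step)."""
    def forward(self, vis_feats, audio_comps):
        B, k, Hs, Ws = vis_feats.shape
        _, _, F, T = audio_comps.shape
        v = vis_feats.permute(0, 2, 3, 1).reshape(B, Hs * Ws, k)  # (B, pixels, K)
        a = audio_comps.reshape(B, k, F * T)                      # (B, K, F*T)
        masks = torch.sigmoid(torch.bmm(v, a))                    # (B, pixels, F*T)
        return masks.reshape(B, Hs, Ws, F, T)

# Toy forward pass: one frame plus a mixture spectrogram in, per-pixel masks out.
frames = torch.randn(1, 3, 64, 64)
mixture_spec = torch.randn(1, 1, 64, 32)
masks = PixelSynthesizer()(VisualNet()(frames), AudioNet()(mixture_spec))
pixel_sound = masks[0, 8, 8] * mixture_spec[0, 0]  # estimated spectrogram for pixel (8, 8)
print(pixel_sound.shape)  # torch.Size([64, 32])

In other words, clicking a pixel amounts to looking up that pixel’s mask and applying it to the mixture, which is how a single spot on screen can be turned back into its own sound.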

The current app is based on a neural network trained on MUSIC (Multimodal Sources of Instrument Combinations), a dataset of 714 unlabeled videos selected from YouTube featuring all sorts of acoustic instruments, including guitars, cellos, clarinets, and flutes.

Hear and see more!