Internet Archive looking for software to extract political ads from TV archives

With election season coming up, the Internet Archive is looking to publish collections of political ads from earlier US campaigns. They have a massive archive of digitized US TV footage, along with searchable full-text closed captions. Now they're hoping someone can point them to some software to auto-extract the political ads from the corpus.

Volunteers needed: We have a fabulous TV collection, and the US is going into an election period. We would like to pull out the TV Commercials, including the political ads, and match them with the other occurrences, and then put names on them. Then we and others can datamine and surface this information.

We hope we can find all the ads so we can know when and where they ran. We would like to not just limit this to political ads because sometimes the ads are the best parts of shows, and many ads are stealthy-political.

To help in this process, we have closed caption transcripts of what is said on US TV as well as full-resolution TV recordings. We also often have a rebroadcast of the same program, which would likely then have different commercials. We do have to be careful with this data, so we would like to run this locally in our virtual machine “virtual reading room”.

Software Wanted: Political TV Commercial Detection and Naming [Brewster Kahle/Internet Archive]
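
For anyone wondering where a volunteer might start, here is a minimal sketch of one possible approach using only the caption transcripts: index every run of k consecutive words and keep the phrases that turn up in more than one broadcast, since an ad that airs repeatedly should repeat its captions too. This is not the Internet Archive's method or part of their request. The function names, the broadcast IDs, and the caption snippets are all made up for illustration, and real captions would need cleanup (timestamps, speaker markers, transcription errors) that this ignores.

```python
from collections import defaultdict


def shingles(words, k):
    """Yield (offset, k-word tuple) for every run of k consecutive words."""
    for i in range(len(words) - k + 1):
        yield i, tuple(words[i:i + k])


def repeated_phrases(transcripts, k=8, min_broadcasts=2):
    """Return phrases of k words that occur in at least min_broadcasts
    distinct broadcasts, mapped to their (broadcast_id, word_offset) hits."""
    seen = defaultdict(list)
    for broadcast_id, text in transcripts.items():
        words = text.lower().split()
        for offset, phrase in shingles(words, k):
            seen[phrase].append((broadcast_id, offset))
    return {
        " ".join(phrase): hits
        for phrase, hits in seen.items()
        if len({b for b, _ in hits}) >= min_broadcasts
    }


# Hypothetical caption snippets, invented for this example.
transcripts = {
    "station_a_6pm": "and now the weather vote for smith he cuts taxes back to sports",
    "station_b_9pm": "tonight on the news vote for smith he cuts taxes in other news",
}

hits = repeated_phrases(transcripts, k=4)
for phrase, where in hits.items():
    print(phrase, where)
```

Overlapping hits at consecutive offsets could then be merged into candidate ad segments and handed to a person for naming. Note that a rebroadcast of the same program would also match on the program's own captions, so those pairs would have to be treated separately, for example by looking at the stretches that differ between the two airings.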