Internet Archive looking for software to extract political ads from TV archives

With election season coming up, the Internet Archive is looking to publish collections of political ads from earlier US campaigns. They have a massive archive of digitized US TV footage, along with searchable full-text closed captions. Now they're hoping someone can point them to some software to auto-extract the political ads from the corpus.

Volunteers needed: We have a fabulous TV collection, and the US is going into an election period. We would like to pull out the TV Commercials, including the political ads, and match them with the other occurrences, and then put names on them. Then we and others can datamine and surface this information.

We hope we could find all ads so we can know when and where they ran. We would like to not just limit this to political ads because sometimes the ads are the best parts of shows, and many ads are stealthy-political.

To help in this process, we have closed caption transcripts of what is said in US TV as well as full resolution TV recordings. We also often have a rebroadcast of the same program which would likely then have different commercials. We do have to be careful with this data, so we would like to run this locally in our virtual machine “virtual reading room”.

Software Wanted: Political TV Commercial Detection and Naming [Brewster Kahle/Internet Archive]

Notable Replies

  1. Of course they need software. Making a human being watch political ads all day would be unimaginably cruel.

  2. Dear goodness me, you're right on that one. One of my jobs during undergrad was riding herd on a database of campaign ads. Nothing like building a controlled vocabulary and then spending 4 hours at a time watching ads and tagging them with names, years, locations and themes. After that gig, the only person I could have voted for in good conscience was Paul Wellstone, which would have been tricky, him being dead by that point.

    I've done fiberglassing in poorly-sealed suits and I've literally shoveled shit for money; that database was, by far, the worst job I've ever held.
