TimesMachine: hackable browser for the public domain NY Times archive

The New York Times has quietly launched "TimesMachine," a slick, API-enabled browser for PDFs of the public domain archives of the paper's run from 1851 to 1922. The API allows anyone to hack their own custom browser for this amazing archive. (Note: the items retrieved by this tool bear copyright notices, but various public statements by the Times affirm that this is freely usable, public domain stuff.)

As part of eliminating TimesSelect, The New York Times has decided to make all the public domain articles from 1851-1922 available free of charge. These articles are all in the form of images scanned from the original paper. In fact, from 1851-1980, all 11 million articles are available as images in PDF format. Generating a PDF version of an article takes quite a bit of work: each article is actually composed of numerous smaller TIFF images that need to be scaled and glued together in a coherent fashion.
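To make the "scaled and glued together" step concrete, here is a minimal sketch, not the Times' actual pipeline, which the excerpt doesn't describe. It assumes (hypothetically) that an article's tiles are already decoded into grayscale pixel grids and simply stack top to bottom in reading order; real code would decode TIFFs with an imaging library and write the result to PDF. Nearest-neighbour resampling is a swapped-in choice for simplicity.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for a decoded TIFF tile: rows of grayscale values (0-255).
Pixels = List[List[int]]


@dataclass
class Tile:
    pixels: Pixels

    @property
    def width(self) -> int:
        return len(self.pixels[0]) if self.pixels else 0

    @property
    def height(self) -> int:
        return len(self.pixels)


def scale_to_width(tile: Tile, target_width: int) -> Tile:
    """Nearest-neighbour rescale to target_width, preserving aspect ratio."""
    if tile.width == 0 or target_width <= 0:
        raise ValueError("tile and target width must be non-empty")
    factor = target_width / tile.width
    new_height = max(1, round(tile.height * factor))
    rows: Pixels = []
    for y in range(new_height):
        # Map each output pixel back to the nearest source pixel.
        src_row = tile.pixels[min(tile.height - 1, int(y / factor))]
        rows.append([src_row[min(tile.width - 1, int(x / factor))]
                     for x in range(target_width)])
    return Tile(rows)


def stitch_vertically(tiles: List[Tile], target_width: int) -> Tile:
    """Scale every tile to a common column width, then stack them top to bottom."""
    out: Pixels = []
    for tile in tiles:
        out.extend(scale_to_width(tile, target_width).pixels)
    return Tile(out)


if __name__ == "__main__":
    headline = Tile([[255, 255, 0, 0]])  # 4 px wide, 1 px tall
    body = Tile([[10, 20], [30, 40]])    # 2x2, scanned at half resolution
    page = stitch_vertically([headline, body], target_width=4)
    for row in page.pixels:
        print(row)
```

Scaling every tile to one shared width before stacking is what keeps the glued result coherent: tiles scanned at different resolutions end up lined up in a single column.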


Link to blog post with background