

Copyright renewal records for US books finally online

Cory Doctorow at 11:09 pm Tue, Jun 24, 2008


 

— POLICIES —

Except where indicated, Boing Boing is licensed under a Creative Commons License permitting non-commercial sharing with attribution

 

A Google engineer has tracked down, munged and XMLified the copyright renewal notices for all the books the US Copyright Office knows about -- now there's a one-click way to discover if an old book is in the public domain (more or less) and who holds the copyright if it isn't.
For U.S. books published between 1923 and 1963, the rights holder needed to submit a form to the U.S. Copyright Office renewing the copyright 28 years after publication. In most cases, books that were never renewed are now in the public domain. Estimates of how many books were renewed vary, but everyone agrees that most books weren't renewed. If true, that means that the majority of U.S. books published between 1923 and 1963 are freely usable.

How do you find out whether a book was renewed? You have to check the U.S. Copyright Office records. Records from 1978 onward are online (see http://www.copyright.gov/records) but not downloadable in bulk. The Copyright Office hasn't digitized their earlier records, but Carnegie Mellon scanned them as part of their Universal Library Project, and the tireless folks at Project Gutenberg and the Distributed Proofreaders painstakingly typed in every word.

Thanks to the efforts of Google software engineer Jarkko Hietaniemi, we've gathered the records from both sources, massaged them a bit for easier parsing, and combined them into a single XML file available for download here.
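The combined file is large (commenters report a 50MB zip), so loading it whole in a browser or editor is likely to go badly. One way to search it is with a streaming parser. Here is a minimal Python sketch, assuming each entry is wrapped in a `record` element with a `title` child. Those tag names, and the sample data, are hypothetical; check the real file's schema and adjust before relying on this.

```python
import io
import xml.etree.ElementTree as ET


def find_renewals(source, needle, record_tag="record", title_tag="title"):
    """Yield one dict per record whose title contains `needle`.

    `record_tag` and `title_tag` are assumed names, not the real schema.
    `source` is a filename or a binary file object.
    """
    needle = needle.casefold()
    # iterparse reads the file incrementally instead of building the whole tree.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        title = elem.findtext(title_tag) or ""
        if needle in title.casefold():
            # Copy the fields out before the element is cleared.
            yield {child.tag: (child.text or "").strip() for child in elem}
        # Drop this record's children and text so memory use stays modest.
        elem.clear()


# Made-up sample in the assumed layout, for illustration only.
SAMPLE = b"""<records>
  <record>
    <title>An Example Title</title>
    <regdate>1950-01-01</regdate>
    <rendate>1977-06-01</rendate>
  </record>
  <record>
    <title>Some Other Book</title>
    <regdate>1951-02-03</regdate>
    <rendate>1978-04-05</rendate>
  </record>
</records>"""

if __name__ == "__main__":
    for hit in find_renewals(io.BytesIO(SAMPLE), "example"):
        print(hit)
```

To run it against the downloaded file, pass the path to the unzipped XML in place of `io.BytesIO(SAMPLE)`. A missing match is not proof of no renewal, since titles and names in the records may not match the title page exactly.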

Link (Thanks, Frances!)



  • nilesgibbs

    Simply wonderful!

  • Anonymous

    We Distributed Proofreaders didn’t type in each word. It was scanned and OCRed.

    We did, however, proofread every word of the OCR twice, then reformatted it as a plain text document with consistent formatting for each entry, so our bit is probably pretty close to what is in the original and could be converted to XML.

    If you want to help proofread the world’s literature, you can take 10 minutes to proofread just one page at http://www.pgdp.net/c/. Do a page a day and after a year that’s like a whole book!

  • orwant

    Chuckeye,

    No free Harlan here, I’m afraid. Rightsholders had to renew within the 28th year after publication, but lag at the Copyright Office often led to the renewal date occurring in the 29th year.

  • Doug Nelson

    Where’s the one click? All I found was a 50MB zip file. There’s a lot of clicks involved in reading this. Where’s the online version? They’re Google, for crying out loud, they should have an online version. A 50MB zip file available for download is not the same thing as “records online”.

    Plus it crashed my browser when I tried to read the file.

  • Kieran O’Neill

    Lol. Cory did say “one click … (more or less)”.

    For now, I’d guess it’s more like “one small script in the language of your choice”, but it probably won’t be long before someone hacks together a generic solution for the less technical.

    wxPython, anyone?

  • eclectro

    Google, please give this person a raise.

  • nilesgibbs

    Now that I’ve actually looked at it, it’s like, come on Google! Where’s the custom Google copyright search? If they spent as much time on it as they claimed they did, why not go the extra couple hours and auto-import this into a wiki/blog/cms somewhere so then Google can index it?

  • Kieran O’Neill

    #5: “why not go the extra couple hours and auto-import this into a wiki/blog/cms somewhere”

    You try that on a 200-300MB XML file, and see what comes out.

    Give them (or the FOSS community) time. (Or develop a solution yourself.)

    I would propose a lightweight, cross-platform desktop app, that you just point at the file.

  • ChuckEye

    Grep is your friend…
    So Harlan Ellison’s smut collection “Sex Gang” had its copyright registered 1959-09-08 and renewed 1987-12-11. When they say 28 years, do they mean to the date? ’cause if not, he was about 3 months late…

  • AlexanderT

    Check out http://www.incopyright.org/.

    It’s a database-driven full-text search engine using the Google records above.

    Suggestions and feedback always appreciated.

    Cheers,
    Alex

  • Michael R. Bernstein

    While this is very nice, there is a rather large category of works in Google Book Search that despite being rather indisputably in the public domain are still not displayed in full. These works in fact *never* enjoy copyright protection, no matter when they were published (even last year).

    I’m referring to works authored by the U.S. Government. Try this search as an example:
    http://books.google.com/books?q=inauthor:%22United+States+Congress%22

    That search has 45k results across all works. If you look for ‘Full View only’, the results dwindle to about 5,700, leaving ~39k results that should be full view, but aren’t.

    And of course, there are other searches with similar (if generally fewer) results.

  • pcgorman

    Stanford University has had a search interface for the renewal records for some time now: http://collections.stanford.edu/copyrightrenewals/bin/page?forward=home

    They also used the Gutenberg-transcribed records. You want to be careful in searching this data, as the titles and names may not correspond exactly to what you see on the title page. I usually try a number of different searches before being satisfied something’s not in the data.

  • John Mark Ockerbloom

    A variety of links to various versions of these and other renewal records can be found at

    http://onlinebooks.library.upenn.edu/cce/

    along with some tips on how to use them.

  • AlfonsoElSabio

    @3, get a grip.

    Be grateful that the file is available at all …

  • fALk

    For lack of a better place, I would like to thank the tireless folks at the institutions mentioned for making this possible. I never understand why the governments of this world make it so hard to access public knowledge. An open society where everyone just helps everyone, without the exchange of money, for the benefit of the greater good is the closest thing to the Garden of Eden I can imagine in the 21st century.