Copyright renewal records for US books finally online

A Google engineer has tracked down, munged and XMLified the copyright renewal notices for all the books the US Copyright Office knows about -- now there's a one-click way to discover if an old book is in the public domain (more or less) and who holds the copyright if it isn't.
For U.S. books published between 1923 and 1963, the rights holder needed to submit a form to the U.S. Copyright Office renewing the copyright 28 years after publication. In most cases, books that were never renewed are now in the public domain. Estimates of how many books were renewed vary, but everyone agrees that most books weren't renewed. If true, that means that the majority of U.S. books published between 1923 and 1963 are freely usable.

How do you find out whether a book was renewed? You have to check the U.S. Copyright Office records. Records from 1978 onward are online but not downloadable in bulk. The Copyright Office hasn't digitized their earlier records, but Carnegie Mellon scanned them as part of their Universal Library Project, and the tireless folks at Project Gutenberg and the Distributed Proofreaders painstakingly typed in every word.

Thanks to the efforts of Google software engineer Jarkko Hietaniemi, we've gathered the records from both sources, massaged them a bit for easier parsing, and combined them into a single XML file available for download here.
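For readers who would rather not open a file that size in a browser or editor, here is a minimal sketch of a streaming search in Python. The layout of the real file isn't described in this post, so the sketch makes only one assumption: that each record is a direct child of the root element. The tag names in the sample data (`records`, `record`, `title`, `registered`, `renewed`) are invented for illustration and are not taken from the actual download.

```python
import io
import xml.etree.ElementTree as ET


def search_records(source, query):
    """Yield the flattened text of each top-level record mentioning `query`.

    `source` is a filename or a binary file object. The file is read as a
    stream, so memory use stays small even for a very large file.
    Assumption: every record is a direct child of the root element.
    """
    needle = query.casefold()
    depth = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first element opened is the root
            depth += 1
            continue
        depth -= 1
        if depth == 1:  # a direct child of the root has just closed
            # Collapse all text inside the record into one line.
            text = " ".join(" ".join(elem.itertext()).split())
            if needle in text.casefold():
                yield text
            root.clear()  # drop finished records so memory doesn't grow


# Made-up sample data; the tag names are hypothetical.
SAMPLE = b"""<records>
  <record><title>Sex Gang</title><registered>1959-09-08</registered><renewed>1987-12-11</renewed></record>
  <record><title>An Example Title</title><registered>1950-01-02</registered></record>
</records>"""

hits = list(search_records(io.BytesIO(SAMPLE), "sex gang"))
print(hits)
```

The match is a case-insensitive substring test over the whole record, which is crude. As one commenter notes below, titles and names in the data may not match the title page exactly, so try several spellings before concluding that a book is absent.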

Link (Thanks, Frances!)


  1. We Distributed Proofreaders didn’t type in each word. It was scanned and OCRed.

    We did, however, proofread every word of the OCR twice, then reformat it as a plain-text document with consistent formatting for each entry, so our bit is probably pretty close to the original and could be converted to XML.

    If you want to help proofread the world’s literature, you can take 10 minutes to proofread just one page at Distributed Proofreaders. Do a page a day and after a year that’s like a whole book!

  2. Where’s the one click? All I found was a 50MB zip file. There’s a lot of clicks involved in reading this. Where’s the online version? They’re Google, for crying out loud, they should have an online version. A 50MB zip file available for download is not the same thing as “records online”.

    Plus it crashed my browser when I tried to read the file.

  3. Lol. Cory did say “one click … (more or less)”.

    For now, I’d guess it’s more like “one small script in the language of your choice”, but it probably won’t be long before someone hacks together a generic solution for the less technical.

    wxPython, anyone?

  4. Now that I’ve actually looked at it, it’s like, come on Google! Where’s the custom Google copyright search? If they spent as much time on it as they claimed they did, why not go the extra couple hours and auto-import this into a wiki/blog/cms somewhere so then Google can index it?

  5. #5: “why not go the extra couple hours and auto-import this into a wiki/blog/cms somewhere”

    You try that on a 200-300MB XML file, and see what comes out.

    Give them (or the FOSS community) time. (Or develop a solution yourself.)

    I would propose a lightweight, cross-platform desktop app, that you just point at the file.

  6. Stanford University has had a search interface for the renewal records for some time now.

    They also used the Gutenberg-transcribed records. You want to be careful in searching this data, as the titles and names may not correspond exactly to what you see on the title page. I usually try a number of different searches before being satisfied something’s not in the data.

  7. For lack of a better place, I would like to thank the tireless folks at the institutions mentioned for making this possible. I never understand why the governments of this world make it so hard to access public knowledge. An open society where everyone just helps everyone, without the exchange of money, for the benefit of the greater good is the closest to the Garden of Eden I can imagine in the 21st century.

  8. Grep is your friend…
    So Harlan Ellison’s smut collection “Sex Gang” had its copyright registered 1959-09-08 and renewed 1987-12-11. When they say 28 years, do they mean to the date? ’cause if not, he was about 3 months late…

  9. Chuckeye,

    No free Harlan here, I’m afraid. Rightsholders had to renew within the 28th year after publication, but lag at the Copyright Office often led to the renewal date occurring in the 29th year.

  10. While this is very nice, there is a rather large category of works in Google Book Search that despite being rather indisputably in the public domain are still not displayed in full. These works in fact *never* enjoy copyright protection, no matter when they were published (even last year).

    I’m referring to works authored by the U.S. Government. Try this search as an example:

    That search has 45k results across all works. If you look for ‘Full View only’, the results dwindle to about 5,700, leaving ~39k results that should be full view, but aren’t.

    And of course, there are other searches with similar (if generally fewer) results.
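The date question raised in comments 8 and 9 is easy to check by hand, but a short Python sketch makes the arithmetic explicit. It takes the "to the date" reading from comment 8 as an assumption: the 28th year runs from the 27th to the 28th anniversary of the registration date. The `anniversary` helper is written for this example only.

```python
from datetime import date


def anniversary(d, years):
    """Return the same month and day `years` later (Feb 29 becomes Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


# Dates quoted in comment 8.
registered = date(1959, 9, 8)
renewed = date(1987, 12, 11)

# Assumed reading: the 28th year starts on the 27th anniversary
# and ends on the 28th anniversary of registration.
start_28th_year = anniversary(registered, 27)
end_28th_year = anniversary(registered, 28)
days_past = (renewed - end_28th_year).days

print(start_28th_year, end_28th_year, days_past)
# → 1986-09-08 1987-09-08 94
```

Ninety-four days matches the "about 3 months" in comment 8. As comment 9 explains, lag at the Copyright Office often pushed the recorded renewal date into the 29th year, so a late-looking date in the records does not by itself mean a renewal was missed.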
