More hard data on the impact of free/pirated downloads on book sales

Brian F. O'Leary has posted slides updating his quantitative research on the effect of "piracy" and/or free giveaways on book sales, done independently using data from O'Reilly and Random House (the largest tech publisher and general publisher in the world, respectively). The new slides, from the recent Book Expo America, expand the work with a larger data set and confirm the earlier findings that free downloads are broadly correlated with higher overall sales (though correlation is not causality!).

With a larger data set, we tried plotting the average paid sales of pirated and un-pirated content using a common starting point (that is, we plotted sales data week-by-week after publication). The results of the week-by-week and four-week rolling averages are shown on slides 28 and 29 of the BEA presentation. Both pirated and un-pirated titles showed similar growth in sales in the first few weeks after a title is published, followed by a decline after peak. Average sales for unpirated content start higher and peak later, although this may reflect the specific nature of titles in a small sample.

The primary difference between sales of pirated and unpirated content appeared in weeks 19 through 25, when sales for pirated content peaked a second time at a level higher than that seen in the first, sell-in period. This second peak followed the time (19 weeks) at which the average pirated O'Reilly front-list title was first seeded on a P2P site.

We stress that this is correlation, not causality, but the difference in the sales profile is notable and persists even when using rolling averages.
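The slides don't come with code, but the week-by-week alignment described above is easy to sketch. The following is a minimal Python illustration, under assumptions: each title's weekly paid sales are indexed by week since publication, averaged across the titles still on sale in that week, and then smoothed with a trailing four-week rolling average. The function names, the way the first three weeks are handled (a shorter window), and the sample figures are all hypothetical, not taken from O'Leary's study.

```python
from typing import Dict, List


def average_by_week(titles: Dict[str, List[float]]) -> List[float]:
    """Average weekly sales across titles, aligned by week since publication.

    Each value in `titles` is a list where index 0 is week 1 after publication.
    Titles with shorter histories stop contributing once their data ends.
    """
    longest = max(len(sales) for sales in titles.values())
    averages = []
    for week in range(longest):
        # Only titles that were still on sale in this week count toward the average.
        values = [sales[week] for sales in titles.values() if week < len(sales)]
        averages.append(sum(values) / len(values))
    return averages


def rolling_average(series: List[float], window: int = 4) -> List[float]:
    """Trailing rolling average; the first window-1 weeks use a shorter window."""
    result = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        chunk = series[start : i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


# Hypothetical weekly unit sales for two titles (illustrative, not data from the study).
titles = {
    "title_a": [10, 30, 50, 40, 20],
    "title_b": [20, 40, 30],
}
weekly = average_by_week(titles)   # [15.0, 35.0, 40.0, 40.0, 20.0]
smoothed = rolling_average(weekly)  # [15.0, 25.0, 30.0, 32.5, 33.75]
print(weekly)
print(smoothed)
```

Note that in the toy data, weeks 4 and 5 rest on a single title. When one group of titles has a shorter on-sale history than another, the tail of its averaged curve is built from fewer titles, which is why comparisons near the end of the on-sale period deserve caution.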

The impact of piracy

Impact of Piracy and Free on Book Sales (BEA 2009, PowerPoint) 2.0 MB


  1. Hot dog. I hope they repeat this on a larger sample. Correlation doesn’t imply causation, but if every example correlates, it certainly supports the idea of causation and paves the way for further studies that do show it.

  2. Take Microsoft, for example: it’s in the company’s best interest to tolerate a certain percentage of pirated copies of its software, because that increases the number of people who are familiar with it. And since everybody knows how to use Windows, it’s in businesses’ best interest to use Windows.

    The same is true with books: if the number of people recommending a book goes up, the number of people reading the book on their friends’ advice goes up. As long as only a percentage are pirating, you still have an overall increase in readership.

    This same principle applies to video games, movies, anime, TV shows, music, etc.

    It even applies to fashion. One of the reasons LV and Chanel are such fashion powerhouses is that it’s so easy to find fakes. Their popularity goes up as a result, which means a larger percentage of people are willing to buy the real thing.

  3. Everyone who tries to track lost sales for pirated things (books, movies, music) always overlooks one huge, glaring error in their research: they’re not lost sales. Chances are strong that the people downloading the books or movies would never have purchased what they’re pirating.

    What they should do instead is consider all the sales lost to high prices: how many more people would be willing to buy a book that costs $2 to print if it were priced at $4 out of the gate instead of $25 or $35?

  4. We will continue to monitor the data on an ongoing basis to establish a more complete profile. A download of the full research paper, which is published as a Rough Cut that includes access to any future updates, is now available for purchase ($99).

    Err... $99? Anyone have a pirated version?

  5. Maybe people who are wont to give their books away for free just make more appealing content. ;)

  6. As much as I like to find compelling statistics to illustrate that information wants to be free, these slides don’t cut it. Yes, they show the spikes after pirated copies are seeded, but past that initial spike, sales drop and keep dropping WAY below the average non-pirated work (slide 28) over the long term.

  7. Slide 28 is pretty damning. The takeaway seems to be that if you want to sell your book for longer than a year and a half, you should do everything in your power to stop pirated copies. That uptick after the first seed is pretty interesting, though. Maybe what we need to figure out is whether that little boost compensates for the title’s early death as backlist. Since a publisher’s backlist (books older than a year) usually accounts for half of a traditional publisher’s revenue, I suspect it won’t. That may not be true for O’Reilly, though, as technological advances make many of its books irrelevant much more quickly than, say, a cookbook or a monograph on art history.

  8. I used to buy a lot of books: first new (paperbacks only), then used. Now I can’t afford to buy any books. No money lost here. This article is a bit slanted, inasmuch as it does little or nothing to bring the state of the economy into it.

  9. Regarding slide 28: according to the study, they’ve only been monitoring titles since fall 2008. Slide 27 actually includes a disclaimer stating that “the average time on sale for pirated content in this sample is shorter (35 weeks) than that for the un-pirated content (47 weeks). Comparisons at the end of the on-sale period are not reliable.”

    Sounds to me like the graph should have stopped at 35 weeks, where the reliable data ends; then it wouldn’t have led to confusion.
