Why Publishing Should Send Fruit-Baskets to Google

Image courtesy of Metin Seven

Google's new Book Search promises to save writers' and publishers' asses by putting their books into the index of works that are visible to searchers who get all their information from the Internet. In response, publishers and writers are suing Google, claiming that this ass-saving is in fact a copyright violation. When you look a little closer, though, you see that the writer/publisher objections to Google amount to nothing more than rent-seeking: an attempt to use legal threats to milk Google for some of the money it will make by providing this vital service to us ink-stained scribblers.

Opponents of Google Book Search (GBS) argue that publishers should have been consulted before their works were scanned, but it's in the nature of fair use that it does not require permission — that's what a fair use is, a use you make without permission.

They argue that GBS should pay some money to publishers because anyone who makes money off a book should kick some back — but no one comes after carpenters for a slice of bookshelf revenue. Ford doesn't get money from Nokia every time they sell a cigarette-lighter phone-charger. The mere fact of making money isn't enough to warrant owing something to the company that made the product you're improving.

Here's how GBS works: Google works with libraries to scan in millions of books, most (more than 75 percent) of them out-of-print, some out-of-copyright and some in-print/in-copyright. Google scans these books, converts the scanned images of the pages into text, and indexes the text.

This index will be exposed to the public, who will be able to search the full text of tens of millions of books — eventually this index could comprise the majority of books ever published — and get results back reporting on which books contain their search-terms.

For public domain books, the search-results will contain a link to the whole text of the book. These out-of-copyright works are our collective human property — or no one's property at all — and Google is perfectly within its rights to distribute copies of any public-domain book that matches a search-request. As an author, I would love to be able to get the full-text of books that matched my search-queries.

For other books (those still in copyright), Google will show a brief excerpt: a single sentence with one or two sentences from either side of the match. In some cases, publishers or other copyright holders have granted Google permission to show more than this (a couple of pages), and Google will show you this, too.
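For the programmers in the audience, the scan-index-excerpt pipeline described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not Google's implementation: the names `split_sentences`, `build_index` and `search`, the naive sentence splitter, and the one-sentence context window are all invented for illustration, and the page-image-to-text step is skipped by starting from plain text.

```python
import re
from collections import defaultdict


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_index(books):
    """Index every word of every book.

    books maps a title to its full text. Returns (index, sentences), where
    index maps a lowercased word to the set of (title, sentence_number)
    pairs in which it occurs, and sentences maps a title to its sentences.
    """
    index = defaultdict(set)
    sentences = {}
    for title, text in books.items():
        sentences[title] = split_sentences(text)
        for i, sentence in enumerate(sentences[title]):
            for word in re.findall(r"[a-z0-9']+", sentence.lower()):
                index[word].add((title, i))
    return index, sentences


def search(term, index, sentences, public_domain=frozenset(), context=1):
    """Return (title, text_shown) pairs for a single search word.

    Public-domain titles are returned once, in full. Every other title
    yields one short excerpt per match: the matching sentence plus
    `context` sentences on either side.
    """
    results = []
    shown_in_full = set()
    for title, i in sorted(index.get(term.lower(), ())):
        if title in public_domain:
            if title not in shown_in_full:
                shown_in_full.add(title)
                results.append((title, " ".join(sentences[title])))
        else:
            start = max(0, i - context)
            excerpt = " ".join(sentences[title][start:i + context + 1])
            results.append((title, excerpt))
    return results
```

The point of the sketch is the asymmetry: `build_index` has to read every word of every book, but `search` only ever hands back a sentence or three of an in-copyright title.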

In all cases, Google provides information for buying any book that matches a search-query, provided that the book is in-print. Sadly, most books aren't in print, and for an author, there is no greater professional loss than that arising from not having your works available for sale at all — this loss far outstrips any conceivable loss from kids with photocopiers, Russian hackers who post ebooks on their websites, or fumble-fingered marketing or PR.

So what's not to like? Writers and publishers have fielded many objections to GBS, flinging a lot of muck in the hopes that some of it will stick. The three objections that have emerged as the main talking-points for GBS's opponents are:

  1. Google should cut copyright holders in for a slice of any revenue that comes from this: if Google can turn a profit on our books, why shouldn't we?
  2. Google should have obtained permission before scanning the GBS books; copyright controls the making of copies, and Google had to make a copy to produce its index.
  3. It will be too easy to spoof: Although Google only shows excerpts, wily hackers could eventually piece together enough excerpts to reproduce the entire GBS library and then post it on the Internet, at which point all bets will be off.

But these objections reflect a nonsensical vision of how copyright law and computer security work. The reality is that the biggest threat to book-writers and publishers is that their works are simply invisible to people who get all their information from the Internet. Google Book Search makes our books visible to those people. In so doing, Google will save our asses from oblivion. Instead of sending legal threats to Google, I think that writers and publishers should be sending them fruit-baskets and thank-you notes.


More than 75 percent of the books in Google's index are not in print. A substantial portion of those books have disputed, unclear or missing rightsholders. In many instances (the majority of instances, if my own experiences in getting "clearance" for the copyrights in out-of-print books are anything to go on), Google won't be able to contact these rightsholders in anything like a cost-effective manner. If permission were required before scanning, the majority of works in the world's libraries would not be scanned, would not show up in Internet searches, and would cease to matter to our cultural discourse. They would have been effectively suppressed.

It gets worse: every twenty years or so, the entertainment industry manages to secure an extra twenty years' worth of copyright for everything ever made. That means that these works have every chance in the world of staying in copyright for something like forever, even though they have no visible rightsholder, even though their copyright status keeps them from being rescued from the scrapheap of history, even though suppressing an author's work is far, far worse than merely infringing her copyrights.

Imagine, though, that it were possible to cost-effectively contact all the parents of those orphan works. Should Google have to pay?

Fair Use and Google Book Search

Google scanned the GBS library without securing any copyright-holder's permission. They got permission from the libraries whose books they scanned, of course — and they got publishers' permission to display full-page excerpts in their search-results. But Google made its initial scans under a US legal doctrine called "fair use."

Normally, copyright holders have a monopoly over the copying, display and performance of the works they create or acquire. Fair use is a category of uses that can be made without permission from or payment to rightsholders. These uses are ones that serve the public interest by preventing the author's monopoly from creating market failures, from stifling free speech, and from compromising the property interests of the people who acquire copies of copyrighted works.

So what's a fair use? Is there a certain number of words you're allowed to copy to make a use fair? Are all noncommercial uses fair? Are all commercial uses unfair? Is there a list of which uses are fair?

Judges consider a number of factors in determining whether a use is fair. A large part of fair use analysis hinges on the four factors: a collection of four criteria from 17 U.S.C. § 107, part of the US Copyright Act, which guide judges' decision-making. But more important than the four factors is common sense. The four factors are a floor on the public's rights in copyright, not a ceiling; they're the minimum criteria that signal the fairness of a use. Here they are:

  1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
  2. the nature of the copyrighted work;
  3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
  4. the effect of the use upon the potential market for or value of the copyrighted work.

But there are lots of uses that are fair even though they fail the four factors test. The most famous of these is "time-shifting" with a VCR. In 1984, the Supreme Court ruled in Sony Corp. of America v. Universal City Studios, Inc., the case that established the legality of Sony's Betamax video-cassette recorder. Sony had introduced the VCR as a device for "time-shifting" shows that were on when you weren't home. Today, it's obvious that time-shifting is innocuous and ultimately to the benefit of the entertainment industry, but this was hardly clear in 1984 — indeed, many legal scholars of the day felt that Sony's defense was doomed; the entertainment industry warned that if Sony prevailed it would be their death-knell. Jack Valenti testified to Congress in 1982 that "the VCR is to the American film industry as the Boston Strangler is to a woman home alone."

Why was a use like time-shifting so legally difficult to defend? Well, it fails three of the four factors:

  1. It consumes the whole work, not an excerpt or quote
  2. It can copy works that are creative in nature: not just news-casts, but also feature films
  3. It makes no "transformation" of the work: it doesn't turn it into a parody or criticism

(Arguably it failed on the fourth test as well, since it harmed Hollywood's ability to offer exclusive home-viewing licensing to competitors of the VCR, like the Discovision, an early play-only home theater device, but the Supremes had different ideas about this.)

If the four factors were all the Supremes considered in the course of their deliberations, the VCR would have been banned on the spot. But, thankfully, judges don't stop at the four factors. In the words of the Pirates of the Caribbean's ghost-captain: "They're more what you call guidelines." Where a use fails the four factors but wins on common sense, judges can rule on that basis.

If Google's scanning of the books for GBS isn't fair, then it indeed needed permission from publishers and/or authors to compile its library. But if the use is fair, then by definition, it doesn't need permission: fair uses are those uses that don't require permission.

Let's examine each step of GBS to see if it seems unlikely to be fair:

  • Displaying ads alongside of search-results
    Showing ads alongside of excerpts isn't necessarily an infringement. Book-critics often quote the books they're reviewing (even books they aren't reviewing!) in the pages of magazines; these pages frequently contain advertisements. If quoting on a page with ads is an infringement, then the New York Review of Books is in big trouble.

  • Showing quotes in response to searches
    If you write a letter to the editor of your local paper asking exactly how William Gibson's Neuromancer opens and they publish a reply containing the infamous line, "The sky above the port was the color of television, tuned to a dead channel" it's pretty clear no infringement has taken place. Indeed, running short excerpts as necessary to make a point in reportage, criticism, analysis, or parody is canonically a fair use.

  • Scanning in books
    Here's where it gets interesting. In order for Google to figure out which fair use quotation to show you, it must first make a copy of the whole book — several copies, in all likelihood. Here we have the four factors on our side. While Google copies the entirety of the work, and while the works are often of a creative nature, Google is only distributing the briefest of quotations, and it can hardly be said that Google is disrupting the normal fortunes of authors and publishers in providing a searchable index of their books.

There's no substantial business today in charging companies money for the privilege of indexing one's book; indeed it's often the reverse: a publisher or author pays a service to produce an index of its books.

Google has done what it does best: converted something that used to cost money into something that makes money. For example, older search companies spent a lot of money on human editorship of their indexes, and then charged money for access to the results. Today, Google uses the links that web-writers create between web-pages to figure out which pages are about what subject and how important they are, then charges advertisers to show ads alongside the search-results.

The argument against the fairness of the initial scanning hinges on this: because Google has demonstrated that there's gold in them thar indexes, it supposedly follows that they should share the wealth.

But this is a false line of reasoning. If adding value to someone else's creation entitled him to a chance to say no, then anyone who makes an iPod case, an automobile cup-holder phone-cradle, or a lens-wipe for a camera should have gotten permission from the creators of the technologies they're improving. Hell, every carpenter who ever put together a bookcase owes her livelihood to the books that get shelved on it. Why not go after carpenters, too?

This is the real meat of the argument: rent-seeking. Wikipedia's compact definition of the term is this: "[Rent-seeking] takes place when an entity seeks to extract uncompensated value from others by manipulation of the economic environment." Rent-seekers are shakedown artists: they don't add new value, but they demand a piece of the action anyway.

There are plenty of ways that publishers could turn a buck off of indexing their works: they could index them themselves; they could sell premium access to digital versions of their catalogs to Google or its competitors; or they could come up with ways of building searchable indexes that are better than those that Google delivers.

It's also clear that publishers will benefit from the increased visibility of their works: the more people hear of a book, the more copies of that book will sell. Putting books into search-results increases the number of people who'll hear of them.

Google versus the scrapers

But what if readers use the quotes Google sends them to piece together the whole book?

Writers and publishers have written that Google presents a risk to them because wily hackers will be able to use multiple searches from different addresses to extract all the text of all the books in Google's index. Once this is done, they argue, the books will appear all over the Internet and that'll be the end of publishing: after all, who will buy a book when the electronic text of it is available for free?

This argument is technologically and commercially nonsensical.

Google has an army of computer scientists who continuously monitor and fine-tune its intrusion-detection system. It has to, because Google lives in a highly competitive, high-stakes marketplace (far more competitive and high-stakes than publishing), and has shown itself to be more than capable of detecting and shutting down attempts to "scrape" significant portions of its database, even when those originate from a wide range of Internet addresses. It's simply not credible to believe that Google could miss the fact that some kids are running the billions of queries necessary to extract its GBS library.

More to the point, though: If all it takes to kill publishing is a low-cost means of acquiring digital copies of books, then publishing is dead already. It's cheap and easy to turn a book into a text-file at home, and it gets cheaper with every second. Why should we give credence to the hypothetical risk that a well-resourced gang of book-thieves will spend millions of dollars and hours spoofing Google, but not spend those same hours and dollars simply scanning in books? Google has an army of PhD computer scientists guarding its database; no such army protects the stock of your corner used-book store.

Finally, it's no foregone conclusion that free electronic copies of a book will substitute for sales of physical copies of that book. My first novel, Down and Out in the Magic Kingdom, was released as a free, open download on the same day that it appeared in stores. Three years later, it's in its sixth printing and more than 650,000 copies of it have been distributed from my website (an untold and unknowable number of copies have been distributed by others, as well). That's because my biggest threat as an author isn't piracy, it's obscurity. The majority of ideal readers who fail to buy my book will do so because they never heard of it, not because someone gave them a free electronic copy.

Tim O'Reilly, the publisher of O'Reilly and Associates, framed the piracy-vs-obscurity question, and he also gave us its corollary: "Piracy is progressive taxation." That is to say, the most widely pirated O'Reilly books on the Internet are also the most profitable ones. Most writers can only dream of achieving enough market-share to warrant anyone's effort to pirate their works — indeed, one of the few things that gives me hope for science fiction as a genre is that it's the only kind of fiction that Internet users can be bothered to pirate in any great quantity.

Some day, electronic texts will substitute for print books: the convergence of superior technology and an audience raised to read off-screen will make treeware editions into luxury items and white elephants, the way that oil-paintings are today. It's certain to me that books will be largely represented as bits in the near future. It's likewise certain that bits will never, ever get any harder to copy than they are today. From here on in, barring nuclear holocaust, bits will only get cheaper and easier to copy, period. Anyone who thinks bits will get harder to copy is either not paying attention or kidding himself or kidding you.

Smart authors, then, should make some hay while the sun shines; that is, use free ebooks to sell print books. That will make authors rich today. To ensure that authors stay rich tomorrow, though, we need to prepare to change over to the new models that emerge when books are most often freely copyable digital objects. The best way to do that is to perform millions of experiments with digital texts to see which approaches are likely to bear fruit.

Will authors have to turn into performers? Maybe — after all, performers once had to turn into studio-musicians when phonogram and radio technology disrupted their business-model. Will authors have to ask for tips? Publish in free, advertising-supported venues? It's likely to be a combination of these things and others; after all, books and authors are distinctive and so their business-models will be too. One thing is true today, though: the more electronic editions of your books circulate, the more books you sell.

In an ideal world, writers could choose whether they wanted to avail themselves of this opportunity to sell books by giving away digital copies, but in an ideal world, authors wouldn't have to trouble themselves with any of this stuff — they could just sit at home and write.

The realpolitik of authorship is that authors can't master their digital destiny when it comes to fans who share their works. Authors can choose to chastise and sue their fans for electronically evangelizing their works, but any victory gained by suing your customers is a hollow one. No sustainable business-model starts by insulting or suing the customers who love you best.

Google Book Search won't have any impact on "ebook piracy," one way or another. What it will do is make it easier for readers to find out about books and buy them.


This all comes down to obscurity versus visibility. There was a time when there was a giant market for books as social tools — read the right book and find people who shared your values, whether that was the guy on the subway with the Dungeon Master's Guide, your hippie co-worker with The Celestine Prophecy, or the latest smartypants volume lauded in the pages of the Times Literary Supplement.

Less and less so every day, though. If there's one thing the Internet is good at, it's connecting people with comparable interests: if you're a Civil War re-creator with a penchant for extreme knitting and left-of-center liberal political beliefs, you can be sure that somewhere on the net there's a group of people waiting to welcome you in. These days, science fiction fans can find all the camaraderie and fellow-feeling of sf without having to do all that tedious reading — that's why at a con I attended a couple years ago, the two big-name authors on the ticket drew six people, while the guys who made the hilarious video-game-based cartoon Red Vs Blue had a full house.

It was once true that reading was a good way to get some light entertainment: whether you were stuck on a train or in your living-room, a lightweight novel was just the thing to tick the hours away. But here again, the Internet, video-games and the mobile phone are hugely disruptive. Any overland commuter train is dominated by phone-conversations, with readers in an ever-dwindling minority.

It's easy to see why: content isn't king; conversation is. If you had the choice of bringing your friends or your books to a desert island, we'd call you a sociopath if you took the books over the breathing humans.

Between vegetative media like TV, which leave your hands free to eat and IM and knit and cook dinner, and conversational media like IM, multiplayer games and phones, books are a big loser in the field of providing empty entertainment in the dull moments.

This pincer movement is gradually squeezing books out of the lives of much of the traditional audience for books: people don't need books to meet each other anymore, and books aren't the best way to kill time anymore.

If that wasn't bad enough, the number of retail outlets for books has also dwindled away. Mall and main-street bookstores have all but vanished; drug-stores and grocery stores have eliminated or downsized their book sections. What that means is that the only time you come across a book these days is when you go looking for one: when you specifically plan a trip to a big-box bookseller or a distant specialty store. That's fine: people who are already interested in buying books can go to a giant Borders or log in to Amazon and get more selection than ever before.

But the majority of potential customers for books will never plan a trip to a bookstore. They're impulse buyers who happen upon an intriguing book in the course of their daily lives and wind up taking it home. For these people (people who might be willing to substitute a book for a phone or a game or a TV show or a convention or a newsgroup or a mailing list), books simply never cross their transom. The idea of buying a book just doesn't crop up anymore.

This is the single biggest threat facing publishing and writers today. Social media and increased entertainment choices compete for our readers' attention.

But this is also publishers' and writers' biggest opportunity. The Internet makes it possible for the social factors that sell books — the sense of community engendered by shared cultural referents, the conversation that books enable — to flourish. It may be that books aren't outcompeted by the Internet at all — it may be that Internet media are the lifeline that books need to survive in a world where the retail ecosystem of booksales has been denuded to stubble and mud.

That's where Google Book Search comes in. GBS puts books on a near-equal footing with other information resources, the ones that are currently kicking the hell out of us. When a customer performs a Google search, she can get results, right there on her screen, from real, actual books, books that can often be purchased with a single click.

This is our single best hope for extending our industry's lifespan for a decade or two. Physical books will always suffer the disadvantage of forcing a reader to actually make rendezvous with a lump of atoms that is like as not thousands of kilometers from her at the moment that she wants to refer to them. But with GBS, ebooks and fast fulfillment from etailers, at least books will maintain their position in readers' attention, and capture people who don't set out to find a book.

At least books will be part of the discourse.

We need to stop telling people that the Internet isn't as good as books. It makes us look like whiny jerks. We need to stop telling people that they have a moral duty to read. It makes us look like imperious jerks.

We need to act like a money-making industry and spare some attention for what our customers demand: books that are no more clicks away than web-pages.

GBS and programs like it are the best effort to date at solving that problem. I'm sending them my fruit-basket today. How about you?
