Internet Archive and Yahoo announce open scanned-in-book index

Man, the Internet Archive just keeps on knockin' 'em out of the park. They've just announced a deal with Yahoo and a bunch of universities to go one better than Google Print: they're scanning and making available zillions of public domain and in-copyright books, under a license that lets rival search engines index and make available their full text. As Brewster sez in this NYT article, "Other projects talk about snippets. We don't talk about snippets. We talk about books."

Although the new project will not be a direct source of revenue for Yahoo, it could give the company's search feature more visibility. The announcement also opens a new round in the battle between Yahoo and Google over index size – the number of documents that can be found in a search engine's database.

Yet the new project's approach differs from Google's in several ways. Once a book has been digitized, Yahoo will integrate the content into its index and provide an engine for the group's Web site (opencontentalliance.org). "As soon as it's made available on the O.C.A. Web site, we'll get a feed letting us know, so it can be indexed by us immediately," said David Mandelbrot, vice president of search content at Yahoo.

In a departure from Google's approach, the Open Content Alliance will also make the books accessible to any search engine, including Google's. (Under Google's program, a digitized book would show up only through a Google search.) And by focusing at first on works that are in the public domain – such as thousands of volumes of early American fiction – the group is sidestepping the tricky question of copyright violation.

Link

(Thanks, Brewster!)