Battelle on Yahoo search claims, Google reply

Xeni Jardin

5:02 pm Thu, Aug 11, 2005

Yahoo issued an announcement earlier this week in which they claimed to have indexed over 20 billion items. Over on Searchblog, Boing Boing "band manager" John Battelle posts:

[This] ruffled more than a few feathers across the web, and nowhere more distinctly than at Google. I spent an hour or so on the phone with a group of Google folks, and they shared a lot of information about how they measure index size, how they deal with issues of duplicate URLs and documents, and why they are baffled by Yahoo's claim. I am still reporting this story, so a longer post is forthcoming, but an update at the end of the day is worth penning.
First of all, I agreed to review some of the Google information on background, agreeing not to disclose it save with permission. (I agreed to this only if I could tell you all that I did in fact agree to it). I am still digesting what Google had to say, and the information they sent me, but it did leave a distinct set of questions percolating in my mind, questions that I plan to speak to Yahoo about (Yahoo has agreed to talk as well, we just haven't had time yet).
In any case, the lead really is this: I asked Google to go on the record with their concerns about Yahoo's index and whether they believed the news was in fact accurate, and Google agreed. The quote, which I can only attribute at this point to a "Google spokesperson," is as follows:
"Our scientists are not seeing the increase claimed in the Yahoo! index. The data we have doesn't support the 19.2 (billion page) claim and we're confused by that."

Details here. A response from Yahoo, along with an analysis of how their numbers were calculated, is said to be forthcoming in another post. But as JBat says, the larger point seems to be:

This calls for a benchmark/standard for measurement that might make all of this moot.

In related news, and also on JBat's blog, the widely discussed Google/Meetro buyout rumor is thought to be false: Link.