Google's forgetting the early web

XML pioneer and early blogger Tim Bray went looking through Google for some posts he knew about from 2006 and 2008 and found that Google couldn't retrieve either of them, not even when he searched for lengthy strings that exactly matched text from the articles. He concluded that "from a business point of view, it's hard to make a case for Google indexing everything, no matter how old and how obscure," and so he could no longer rely on "Google's global infrastructure as my own personal search index for my own personal publications."

The good news is that Bing and DuckDuckGo both maintain much more complete indices of old posts and publications, so if you're looking for stuff that's more than a decade old, you can switch to one of Google's competitors to find it.

What Google cares about · It cares about giving you great answers to the questions that matter to you right now. And I find that if I type in a question, even something complicated and obscure, Google often surprises me with a timely, accurate answer. They've never claimed to index every word on every page.

My mental model of the Web is as a permanent, long-lived store of humanity's intellectual heritage. For this to be useful, it needs to be indexed, just like a library. Google apparently doesn't share that view.

Google Memory Loss [Tim Bray]

(via 4 Short Links)

(Image: Home Depot)