Google's forgetting the early web

Cory Doctorow

10:34 am Tue, Jan 16, 2018

XML pioneer and early blogger Tim Bray went looking through Google for some posts he knew about from 2006 and 2008 and found that Google couldn't retrieve either of them, not even when he searched for lengthy strings that exactly matched text from the articles. He concluded that "from a business point of view, it's hard to make a case for Google indexing everything, no matter how old and how obscure," and so we could no longer rely on "Google's global infrastructure as my own personal search index for my own personal publications."

The good news is that Bing and DuckDuckGo both maintain much more complete indices of old posts and publications, so if you're looking for stuff that's more than a decade old, you can switch to one of Google's competitors to find it.

What Google cares about · It cares about giving you great answers to the questions that matter to you right now. And I find that if I type in a question, even something complicated and obscure, Google often surprises me with a timely, accurate answer. They've never claimed to index every word on every page.
My mental model of the Web is as a permanent, long-lived store of humanity's intellectual heritage. For this to be useful, it needs to be indexed, just like a library. Google apparently doesn't share that view.

Google Memory Loss [Tim Bray]

(via 4 Short Links)

(Image: Home Depot)