Promise and peril of data-scraping

Josh McHugh's Wired feature, "Should Web Giants Let Startups Use the Information They Have About You?," is a meaty, thinky piece about the many risks of data-scraping. The piece investigates the risks to users (your data, slithering around the net), the risks to scrapers (your business entirely dependent on someone else's goodwill), and the risks to scrapees (bandwidth clobbering, your users getting screwed, and so on):

Giants like Yahoo and Google have thus far taken a mostly nonproprietary stance toward their data, typically letting outside developers access it in an attempt to curry favor with them and foster increased inbound Web traffic. Most of the largest Web companies position themselves as benign, bountiful data gardens, supplying the environment and raw materials to build inspired new products. After all, Google itself, that harbinger of the Web2.0 era, thrives on info that could be said to "belong" to others — the links, keywords, and metadata that reside on other Web sites and that Google harvests and repositions into search results.

But beneath all the kumbayas, there's an awkward dance going on, an unregulated give-and-take of information for which the rules are still being worked out. And in many cases, some of the big guys that have been the source of that data are finding they can't — or simply don't want to — allow everyone to access their information, Web2.0 dogma be damned. The result: a generation of businesses that depend upon the continued good graces of a relatively small group of Internet powerhouses that philosophically agree information should be free — until suddenly it isn't.
