WhatTheInternetKnowsAboutYou: your browser is giving away your history

Art sez,

We just launched a new Web-privacy-related webapp, and want to show it off to you.

The app is an example of using browser history detection to determine the personal preferences of Web users, and it is located at http://whattheinternetknowsaboutyou.com. The history detection hack has been known for quite a while. It uses the CSS :visited pseudoclass to style visited links differently from unvisited ones, which reveals which URLs are present in the browser's history, and it does not require JavaScript.
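The post doesn't include the site's code. As a rough sketch of the no-JavaScript idea, assume a hypothetical server endpoint `/hit` that logs each request it receives. The page gives every probed link a `:visited` rule that loads a background image from that endpoint, so the browser only contacts the server for links that are already in its history. The helper names below are illustrative, not the site's. Browsers have since restricted which properties a `:visited` rule may change, so treat this as an illustration of the concept rather than a working probe.

```python
import html


def build_probe_page(urls, beacon="/hit"):
    """Build an HTML page whose CSS requests beacon?id=N only if urls[N]
    has been visited. `beacon` is a hypothetical logging endpoint."""
    rules, links = [], []
    for i, url in enumerate(urls):
        # One unique id per link, so each :visited rule fires independently.
        rules.append(f"#p{i}:visited {{ background-image: url({beacon}?id={i}); }}")
        links.append(f'<a id="p{i}" href="{html.escape(url, quote=True)}">{i}</a>')
    return ("<html><head><style>\n" + "\n".join(rules)
            + "\n</style></head><body>\n" + "\n".join(links)
            + "\n</body></html>")


def decode_hits(urls, hit_ids):
    """Map beacon ids logged by the server back to the probed URLs."""
    return [urls[i] for i in sorted(set(hit_ids)) if 0 <= i < len(urls)]


probes = ["https://example.com/", "https://example.org/"]
page = build_probe_page(probes)
# Suppose the server logged one request, for /hit?id=1:
print(decode_hits(probes, [1]))  # ['https://example.org/']
```

Because the detection happens entirely in the stylesheet, turning off JavaScript does not prevent it. Only the server-side log of beacon requests reveals the result.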

There are more than 20 tests that extract various kinds of information from the browser's history. The most obvious one checks for visits to the most popular websites and blogs, which we grouped into categories (banks, pr0n sites, dating sites, social networks, etc.). We also check for more sensitive content, such as all visited Wikileaks articles and administrative pages, visited .gov and .mil websites, Google search queries, and zipcodes typed into forms. In addition, we index more than fifty of the most popular RSS newsfeeds (including Boing Boing, of course) to determine which recent news stories the user has read. For social news sites, we also try to determine the user's username by detecting visited profile pages.
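Once a probe has reported which URLs are in the history, turning that into the kinds of results described above is simple bookkeeping. The sketch below is a hypothetical illustration, not the site's code. It groups detected URLs by category. It also guesses a username by checking candidate profile URLs built from an assumed template such as `https://news.example/user?id={}`; real sites use their own URL schemes.

```python
from typing import Dict, List, Set


def summarize(categories: Dict[str, List[str]], visited: Set[str]) -> Dict[str, List[str]]:
    """For each category, list the probed URLs that were detected as visited."""
    return {name: [u for u in urls if u in visited]
            for name, urls in categories.items()}


def guess_usernames(template: str, candidates: List[str], visited: Set[str]) -> List[str]:
    """Return candidates whose (hypothetical) profile URL appears in the history."""
    return [c for c in candidates if template.format(c) in visited]


categories = {"social": ["https://social.example/"],
              "news": ["https://news.example/"]}
visited = {"https://news.example/", "https://news.example/user?id=alice"}
print(summarize(categories, visited))  # {'social': [], 'news': ['https://news.example/']}
print(guess_usernames("https://news.example/user?id={}", ["alice", "bob"], visited))  # ['alice']
```

The username test only works when profile pages have predictable URLs and the candidate list contains the right name, which is why it targets social news sites with public profile pages.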

We also meticulously documented the problem and listed possible solutions, in the hope of educating both casual Web users and browser vendors about this issue. Most people still have no idea that such history detection is possible, let alone trivially easy to implement. Worse, there is no simple way to protect against it other than disabling history altogether. I hope that publicizing the issue will push browser vendors to find sane ways of making our browsing histories private again, and I would appreciate your help.

What the Internet knows about you

(Thanks, Art!)