Web Trackers Paint a Fresh Picture of You
Glenn Fleishman on the tricky new user-tracking method that sees through privacy controls.
The paper also explains an updated approach to testing for certain kinds of nearly unkillable, persistent browser-tracking IDs known as "evercookies," and presents a fresh, large survey of how popular Web sites use a variety of user- and browser-tracking techniques.
The upshot? Not only are the techniques found in use years ago still employed widely on some of the Internet's highest-trafficked sites, but new methods and improvements have appeared, and there's still neither an effective way to block most of these tracking tricks nor effective regulation and enforcement to dissuade companies from pursuing them.
We can be tracked from page to page, session to session, and often site to site even when we tell companies not to and take every available measure to halt it. And it's not getting better. Sorry.
With great power comes great responsive ability
More sophistication brings a predictable privacy cost, and this latest paper reveals more about it. The greater the power and flexibility of any given option for storing or creating data in a browser, the more likely it can be used to uniquely identify a browser, if not an individual. Browser makers typically remain neutral about, or play down, the privacy issues around new features that could push information into browsers or identify them uniquely. Even when browser makers do take those issues seriously, a technology may be too powerful to protect from being subverted for tracking.
Canvas, the HTML5 element that lets scripts draw graphics and text in a Web page, is a partial replacement for features found in Flash, and as implemented it is significantly more powerful and standardized than anything previously widely available without a plug-in or on more than one or two browser platforms. About 85% of users worldwide have browsers capable of rendering text onto a canvas.
Canvas now joins the list of mechanisms used to follow us around: Flash cookies, HTML5 session storage, ETags, and many other tools that add state to a medium originally designed to be stateless and storage to a medium designed to rely on servers, as well as other innocent bystander components that were just minding their own business.
Follow the cookie monster's money
Tracking a browser or a user across browsers is most obviously done with logins and regular browser cookies. Visit a site or log in to it, and various identifying information is typically stashed in cookie storage in the browser. Whenever the browser requests a file of any kind from a server whose domain matches the cookie's domain or domain wildcard, it sends the cookie back.
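To make the matching rule concrete, here is a minimal Python sketch of the domain check, loosely modeled on the "domain-match" rule in RFC 6265, the HTTP cookie specification. It is a simplification: the function name `domain_matches` is hypothetical, and real browsers also apply path, `Secure`, `SameSite`, and public-suffix checks that are omitted here.

```python
import ipaddress


def domain_matches(request_host: str, cookie_domain: str, host_only: bool = False) -> bool:
    """Decide whether a cookie scoped to cookie_domain is sent to request_host.

    Simplified sketch of RFC 6265: comparison is case-insensitive, a leading
    dot in the Domain attribute is ignored, and subdomains match unless the
    cookie is host-only. Path, Secure, SameSite and public-suffix rules are
    deliberately left out.
    """
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    if host == domain:
        return True
    if host_only:
        # Cookies set without a Domain attribute go back only to the exact host.
        return False
    try:
        ipaddress.ip_address(host)
        return False  # IP addresses never match as "subdomains"
    except ValueError:
        pass
    # The "wildcard" case: ads.example.com receives a cookie scoped to example.com.
    return host.endswith("." + domain)


if __name__ == "__main__":
    print(domain_matches("ads.example.com", "example.com"))  # True
    print(domain_matches("notexample.com", "example.com"))  # False
    print(domain_matches("ads.example.com", "example.com", host_only=True))  # False
```

Note that `notexample.com` is rejected because the match must fall on a dot boundary, not just on a shared suffix.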
But cookies can be easily blocked or, if required to work with a site, deleted: browsers often include primitive controls and privacy modes in which cookies are deleted when an incognito session ends; third-party software can selectively block known advertising sites and cookies or, on demand, crush cookies left behind.
This makes marketers terribly unhappy, because allegedly the more they know about you over time, the more precisely they can target advertising, which lets the customers paying for ads achieve a better conversion rate into purchases of goods and services. Further, maintaining identity over time gives them a better understanding of the lifecycle of someone's decision making: from seeing information about a thing, to consummating a sale or action (such as signing up for and reading an email list), to cancelling or switching to a competitor.
That's all very well and good with our consent. A battle that has raged for several years is whether browsers, tied in with the Do Not Track HTTP header extension, should assume a user wants to be tracked, does not want to be tracked, or has expressed no opinion at all, and whether and how the advertising and tracking industries should honor that preference when it is expressed as "no." Many ad networks and sites offer some way to opt a browser or an identity out of having specific tracking data used, but most still collect the data and claim to anonymize it for aggregated metrics; this paper provides some statistics on this topic as well.
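The Do Not Track preference travels as a single request header, `DNT`: a value of `1` asks not to be tracked, `0` signals consent, and an absent header means no preference was expressed. The Python sketch below, with hypothetical names (`TrackingPreference`, `read_dnt`), shows how a server might read those three states. Nothing in the header mechanism forces the server to act on the answer, which is exactly the fight described above.

```python
from enum import Enum
from typing import Mapping


class TrackingPreference(Enum):
    DO_NOT_TRACK = "do not track"
    CONSENTS = "consents to tracking"
    UNSET = "no preference expressed"


def read_dnt(headers: Mapping[str, str]) -> TrackingPreference:
    """Map a request's DNT header onto the three possible preferences."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    value = normalized.get("dnt")
    if value is None:
        return TrackingPreference.UNSET
    if value.startswith("1"):
        return TrackingPreference.DO_NOT_TRACK
    if value.startswith("0"):
        return TrackingPreference.CONSENTS
    # Anything unrecognized is treated as no preference in this sketch.
    return TrackingPreference.UNSET


if __name__ == "__main__":
    print(read_dnt({"DNT": "1"}).value)  # do not track
    print(read_dnt({}).value)  # no preference expressed
```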
Persistent, hidden cookies subvert the entire discussion, however. Ashkan Soltani, a security and privacy researcher with a deep history of exposing tracking methods (and a one-time FTC employee), says that whenever a regulatory or technical approach to controlling the tracking of users came up, "the pushback was that consumers had choice and could always opt-out." But in practice that hasn't been the case. "There's been a number of studies (including a few I've done) demonstrating the inability for consumers to opt-out of tracking."
Soltani notes that the Internet's economics are heavily driven by specific metrics, such as publishers being paid based on unique visitors and impressions. "There's a huge incentive to make sure you're identifying (i.e. cookie-ing/fingerprinting) each individual user in order to have an accurate count (and subsequently accurate dollar amount)," he says.
This incentive has led to a combination of testing and deployment of ways to track users even when they use every single tool at their disposal to prevent such snooping. Sites and ad networks spread a tracking ID and replicate it across every nook and cranny they can find in a browser and in plug-ins like Flash. In 2010, Samy Kamkar wrote demonstration code he dubbed "evercookie" that would stash values in every possible location and automatically "respawn" the cookie (retrieve it and push it back into the browser's cookie stash) when it was deleted.
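Kamkar's evercookie is written in JavaScript and uses far more hiding places than this. The Python sketch below models only the core respawn logic, treating each storage location as a plain dictionary. The store names and the `uid` key are hypothetical stand-ins for real mechanisms such as HTTP cookies, Flash local shared objects, and localStorage.

```python
import uuid
from collections import Counter
from typing import Dict, MutableMapping

# Each hypothetical store stands in for one hiding place (HTTP cookie,
# Flash cookie, localStorage, IndexedDB, ...), modeled as a plain dict.
Store = MutableMapping[str, str]


def respawn(stores: Dict[str, Store], key: str = "uid") -> str:
    """Recover a tracking ID from any surviving store and copy it everywhere."""
    survivors = [store[key] for store in stores.values() if key in store]
    if survivors:
        # If copies disagree, keep the most common surviving value.
        uid = Counter(survivors).most_common(1)[0][0]
    else:
        # Nothing survived: treat the browser as a brand-new visitor.
        uid = uuid.uuid4().hex
    for store in stores.values():
        store[key] = uid  # re-plant the ID, including in stores the user cleared
    return uid


if __name__ == "__main__":
    stores = {"http_cookie": {}, "flash_lso": {"uid": "abc123"}, "local_storage": {}}
    print(respawn(stores))  # abc123
    print(stores["http_cookie"])  # {'uid': 'abc123'}
```

A user who clears only regular cookies gets the same ID back on the next visit, because a single surviving copy is enough to restore all the others.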
The evercookie has entered common parlance as a term because it showed us just how vulnerable browsers are to such tracking, and how nearly impossible it is, even four years later, to block persistent identity. And that's where canvas now comes in.
Draw me like one of your French URLs
Canvas drawing isn't pixel-identical across every instance of every version of every browser on every platform. Results can be close, but different browser setups seemingly produce subtly different output. As the paper's authors put it, "The same text can be rendered in different ways on different computers depending on the operating system, font library, graphics card, graphics driver and the browser."
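A canvas fingerprint boils down to hashing the pixels a browser produces for a fixed drawing. In a real page, a script draws a fixed string onto a hidden canvas and reads the result back with `toDataURL()` or `getImageData()`. The Python sketch below assumes those pixel bytes have already been collected and shows why a single differing anti-aliased pixel is enough to change the hash. The function names are hypothetical, and this is not the script the researchers found in the wild.

```python
import hashlib


def canvas_fingerprint(rgba: bytes) -> str:
    """Hash raw RGBA pixel data read back from a canvas.

    Assumption: in a browser, these bytes would come from drawing fixed text
    (same font, size and colors every time) on a hidden canvas and reading
    the pixels back; here the buffer is simply passed in.
    """
    return hashlib.sha256(rgba).hexdigest()[:16]


def differing_pixels(a: bytes, b: bytes) -> int:
    """Count RGBA pixels (4 bytes each) that differ between two renderings."""
    if len(a) != len(b):
        raise ValueError("renderings must have the same dimensions")
    return sum(a[i:i + 4] != b[i:i + 4] for i in range(0, len(a), 4))


if __name__ == "__main__":
    # Two simulated 10x10 renderings, identical except for one pixel's alpha.
    machine_a = bytes([0, 0, 0, 255]) * 100
    tweaked = bytearray(machine_a)
    tweaked[4 * 57 + 3] = 254
    machine_b = bytes(tweaked)
    print(differing_pixels(machine_a, machine_b))  # 1
    print(canvas_fingerprint(machine_a) == canvas_fingerprint(machine_b))  # False
```

Identical hardware and software setups produce identical hashes, so on its own a canvas fingerprint distinguishes configurations rather than people.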
AddThis did not respond to a request for comment, but told ProPublica that its script was part of ongoing research, was used on only a subset of the sites on which it's deployed, and didn't produce results that were "uniquely identifying enough" to rely on. AddThis also said that it doesn't ask permission from Web sites to deploy such tests and that it doesn't use the data collected at government sites for "ad targeted or personalization," but didn't disclaim such use on other sites.
Less widely mentioned in the coverage of this paper is the extent to which the researchers surveyed and tested sites' use of not just canvas but evercookies, along with automated testing of respawning. They also found a new vector in use (the IndexedDB storage option for browsers) and some methods of respawning they haven't yet been able to identify.
Typically, respawning tests involving Flash have been limited because the results had to be checked by hand. The authors, however, built tools to monitor Flash cookies used to respawn browser cookies (tested against 10,000 sites) and vice versa (3,000 sites). Flash cookies are particularly insidious because they are available to any browser on the same device that has access to Flash. Some of the top-ranked Alexa sites make use of Flash respawning; of the ten highest-ranked such sites, nine are registered in China (one of them in Hong Kong) and one in Russia.
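The paper's tooling isn't reproduced here, but one plausible version of such an automated test goes like this: record a site's HTTP cookies, wipe them while leaving Flash cookies intact, revisit, and flag any cookie that comes back with the same long value that also sits in Flash storage. The Python function below is a hypothetical, simplified version of that comparison; the `min_len` threshold is an assumption meant to filter out short values, like language codes, that can match by coincidence.

```python
from typing import Dict, Set


def find_respawned(
    first_visit: Dict[str, str],
    second_visit: Dict[str, str],
    flash_values: Set[str],
    min_len: int = 8,
) -> Set[str]:
    """Return names of HTTP cookies that look respawned from Flash storage.

    first_visit / second_visit: cookie name -> value observed on two visits,
    with all HTTP cookies (but not Flash cookies) cleared in between.
    flash_values: string values found inside the site's Flash cookies.
    """
    respawned = set()
    for name, value in second_visit.items():
        if (
            len(value) >= min_len  # skip short values such as "1" or "en"
            and first_visit.get(name) == value  # same value survived the wipe
            and value in flash_values  # and a copy lives in Flash storage
        ):
            respawned.add(name)
    return respawned


if __name__ == "__main__":
    before = {"uid": "a1b2c3d4e5", "lang": "en", "session": "x9y8z7w6v5"}
    after = {"uid": "a1b2c3d4e5", "lang": "en", "session": "q1w2e3r4t5"}
    print(find_respawned(before, after, {"a1b2c3d4e5", "en"}))  # {'uid'}
```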
The researchers also looked at cookie synchronization, in which trackers share the identifiers they assign to a user, so the same tracking ID can be linked across multiple sites. They examined Alexa's top 3,000 sites and estimate, based on various options and tracking, that 40% (with third-party cookie use disabled) or 50% (with it enabled) of a user's browsing history across those sites could be reconstructed through backend database analysis. Sites that engage in cookie sync include DoubleClick and Amazon's CloudFront, according to the report.
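Cookie sync itself happens in the browser, as trackers pass IDs to one another, but the payoff comes later in a database join. This Python sketch, with hypothetical tracker and site names, uses a union-find structure to group IDs that a sync has linked and then pools the sites each group was seen on. It is roughly the kind of backend reconstruction the researchers' estimate refers to, not their actual analysis.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Identity = Tuple[str, str]  # (tracker domain, ID that tracker assigned)


def reconstruct_histories(
    visits: Iterable[Tuple[str, str, str]],  # (tracker, its ID, site visited)
    syncs: Iterable[Tuple[Identity, Identity]],  # pairs of IDs matched via sync
) -> List[Set[str]]:
    """Group synced IDs with union-find, then pool each group's site visits."""
    parent: Dict[Identity, Identity] = {}

    def find(x: Identity) -> Identity:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: Identity, b: Identity) -> None:
        parent[find(a)] = find(b)

    for a, b in syncs:
        union(a, b)

    sites_by_person: Dict[Identity, Set[str]] = defaultdict(set)
    for tracker, uid, site in visits:
        sites_by_person[find((tracker, uid))].add(site)
    return list(sites_by_person.values())


if __name__ == "__main__":
    visits = [
        ("tracker-a.example", "a1", "news.example"),
        ("tracker-b.example", "b7", "shop.example"),
        ("tracker-b.example", "b9", "bank.example"),
    ]
    syncs = [(("tracker-a.example", "a1"), ("tracker-b.example", "b7"))]
    print(sorted(sorted(g) for g in reconstruct_histories(visits, syncs)))
    # [['bank.example'], ['news.example', 'shop.example']]
```

Neither tracker alone saw both `news.example` and `shop.example` for this user; the single synced pair is what lets the backend stitch them together.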
The worst news from the report is how widely these insidious techniques have been deployed and how they have been extended. There is little legitimate purpose for almost any of these respawning and tracking methods beyond subverting the intent of users. If the intent were legitimate, the techniques would be less persistent and less aggressive, and companies would disclose more when they respawned an identifier.
The paper notes, however, that "tracking vectors such as localStorage, IndexedDB and canvas cannot be disabled, often due to the fact that doing so would break core functionality."
Thus one has to choose between the full-featured Web of today and a more limited version. And even when a user makes every possible choice and engages every mechanism, the tools may be inadequate to prevent all tracking, and new exploits could pop up tomorrow that current knowledge and third-party software can't address.
Soltani is convinced after his time at the FTC (alongside Chris Soghoian, now at the ACLU, one of the Do Not Track concept's instigators) that "consumers will ultimately lose an arms race that's technology based." He worked to create a policy approach that would override any technological innovation, but "this has been mired in DC lobbying and not really progressed to be an effective mechanism."
Still, Soltani hasn't given up. He says, "A strong public debate is necessary and each subsequent news story or academic article on this topic will help inform that debate." Keeping up the pressure on companies found to engage in subterfuge for tracking purposes can help as well, but no privacy-advocacy group conducts consistent surveys of these practices.
Soltani notes, "For every one or two academic publications, there are dozens of new methods/techniques to monetize users' Internet activity." Perhaps that's an area in which to place hope and support new research: automated and regular examination of tracking techniques would cast more light on the practices as they develop and expand.