Marketing companies frequently "anonymize" their dossiers on internet users using hashes of their email addresses -- rather than the email addresses themselves -- as identifiers in databases that are stored indefinitely, traded, sold, and leaked.
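The weakness is easy to demonstrate: because the space of plausible email addresses is enumerable, a bare hash is no secret at all. Anyone holding a list of known addresses can reverse the "anonymous" identifiers by hashing each candidate and comparing. A minimal sketch (the addresses and the choice of SHA-256 are illustrative):

```python
import hashlib

def hash_email(email: str) -> str:
    """The pseudo-'anonymous' identifier: a plain SHA-256 of the address."""
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

# A marketer's 'anonymized' record, keyed by the hash alone.
leaked_id = hash_email("alice@example.com")

# An attacker with any list of known addresses reverses it by brute
# enumeration -- no key or secret is involved anywhere.
candidates = ["bob@example.com", "alice@example.com", "carol@example.com"]
reversed_email = next(
    (e for e in candidates if hash_email(e) == leaked_id), None
)
print(reversed_email)  # alice@example.com
```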
In An Empirical Analysis of Traceability in the Monero Blockchain, a group of eminent computer scientists analyze a longstanding privacy defect in the Monero cryptocurrency and disclose a new, subtle flaw; both can potentially be exploited to expose the details of transactions and identify the parties to them.
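One class of attack the paper describes turns on how Monero chooses decoy inputs ("mixins"): decoys sampled roughly uniformly from the blockchain's history tend to be older than the output actually being spent, so simply guessing the most recently created ring member is right far more often than chance. A toy sketch of that guess-newest heuristic, with invented output IDs and block heights:

```python
def guess_real_input(ring):
    """Guess-newest heuristic: given a ring of (output_id, block_height)
    pairs -- decoys plus the one real spend -- bet on the newest member,
    since uniformly sampled decoys skew old."""
    return max(ring, key=lambda member: member[1])

# Hypothetical ring of three members; the heights are made up.
ring = [("out_a", 90210), ("out_b", 101533), ("out_c", 99874)]
print(guess_real_input(ring))  # ('out_b', 101533)
```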
Even the most stringent privacy rules have massive loopholes: they all allow for free distribution of "de-identified" or "anonymized" data that is deemed to be harmless because it has been subjected to some process.
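The canonical failure mode is linkage: Latanya Sweeney famously re-identified "anonymous" medical records by joining them against a public voter roll on the quasi-identifier triple of ZIP code, birth date, and sex. A toy sketch of that attack pattern, with entirely made-up data:

```python
# Hypothetical data: a 'de-identified' medical release and a public
# voter roll that still share the quasi-identifier triple.
deidentified_medical = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "M", "diagnosis": "cardiac arrhythmia"},
]
public_voter_roll = [
    {"name": "A. Smith", "zip": "02138", "dob": "1945-07-31", "sex": "M"},
    {"name": "J. Doe",   "zip": "02139", "dob": "1980-01-01", "sex": "F"},
]

quasi = ("zip", "dob", "sex")
reidentified = {}
for record in deidentified_medical:
    matches = [v for v in public_voter_roll
               if all(v[k] == record[k] for k in quasi)]
    if len(matches) == 1:  # a unique join re-identifies the 'anonymous' row
        reidentified[matches[0]["name"]] = record["diagnosis"]
print(reidentified)  # {'A. Smith': 'cardiac arrhythmia'}
```

No field in the medical release names anyone, yet the join recovers a name and attaches a diagnosis to it.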
The "replay sessions" captured by surveillance-oriented "analytics" companies like Fullstory allow their customers -- "Walgreens, Zocdoc, Shopify, CareerBuilder, SeatGeek, Wix.com, Digital Ocean, DonorsChoose.org, and more" -- to watch everything you do when you're on their webpages -- every move of the mouse, every keystroke (even keystrokes you delete before submitting), and more, all attached to your real name, stored indefinitely, and shared widely with many, many "partners."
Ad-blockers begat ad-blocker-blockers, which begat ad-blocker-blocker-blockers, with no end in sight.
Princeton computer science researchers Steven Englehardt and Arvind Narayanan (previously) have just published a new paper, Online tracking: A 1-million-site measurement and analysis, which documents the state of online tracking beyond mere cookies -- sneaky and often illegal techniques used to "fingerprint" your browsers and devices as you move from site to site, tracking you even when you explicitly demand not to be tracked and take countermeasures to prevent it.
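The fingerprinting techniques the paper measures all reduce to the same trick: each browser attribute is innocuous on its own, but concatenating many of them and hashing the result yields an identifier stable enough to follow you without any cookie. A hypothetical sketch (the attributes shown are typical examples, not the paper's exact feature set):

```python
import hashlib

def fingerprint(attrs: dict) -> str:
    """Combine individually innocuous attributes into one stable ID."""
    canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Made-up attribute readouts of the kind trackers collect.
browser = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "screen": "2560x1440x24",
    "timezone": "America/New_York",
    "fonts": "Arial,Courier,DejaVu Sans,Helvetica",
    "canvas_hash": "af3c9e01",  # rendering quirks of this GPU/driver stack
}
# Same browser, same attributes -> same ID on every visit, no cookie needed.
print(fingerprint(browser))
```

Each attribute contributes only a few bits of entropy, but together they frequently single out one browser among millions.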
In Online tracking: A 1-million-site measurement and analysis, eminent Princeton security researchers Steven Englehardt and Arvind Narayanan document the use of device battery levels -- accessible both through mobile platform APIs and HTML5 calls -- to track and identify users who are blocking cookies and other methods of tracking.
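The attack works because the battery readout changes only slowly: two sites visited moments apart observe the same fairly rare combination of charge level and discharge time, and that combination links the visits even with cookies blocked. A sketch, assuming a site can read the pair the HTML5 Battery Status API exposed (the numbers are invented):

```python
def battery_pseudo_id(level: float, discharging_secs: int) -> tuple:
    """The (level, time-remaining) pair updates only every so often,
    so for a short window it acts as a cross-site pseudo-identifier."""
    return (round(level, 6), discharging_secs)

site_a_reading = battery_pseudo_id(0.567891, 8954)
site_b_reading = battery_pseudo_id(0.567891, 8954)  # seconds later, cookies blocked
print(site_a_reading == site_b_reading)  # True -- the two visits link up
```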
The Princeton Bitcoin Book by Arvind Narayanan, Joseph Bonneau, Edward Felten, Andrew Miller and Steven Goldfeder is a free download -- it's over 300 pages and is intended for people "looking to truly understand how Bitcoin works at a technical level and have a basic familiarity with computer science and programming."
Cory Doctorow summarizes the problem with the idea that sensitive personal information can be removed responsibly from big data: computer scientists are pretty sure that's impossible.
Social networking sites are Skinner boxes designed to train you to undervalue your privacy. Since all the compromising facts of your life add less than a dollar to the market-cap of the average social network, they all push to add more "sharing" by default, with the result that unless you devote your life to it, you're going to find your personal info shared ever-more-widely by G+, Facebook, LinkedIn, and other "social" services.
Arvind Narayanan has proposed a solution to this problem: a two-part system through which privacy researchers publish a steady stream of updates about new privacy vulnerabilities introduced by the social networking companies (part one), and your computer sifts through these and presents you with a small subset of the alerts that pertain to you and your own network use.
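Part two is essentially a local filter over a published feed. A miniature sketch of the idea, with an invented feed format and user profile (none of this is Narayanan's actual design):

```python
# Hypothetical alert feed published by researchers (part one).
alert_feed = [
    {"service": "Facebook", "feature": "photo_tagging",
     "summary": "Tag suggestions now default to on"},
    {"service": "G+", "feature": "circles",
     "summary": "Circle membership visible to extended network"},
    {"service": "LinkedIn", "feature": "ads",
     "summary": "Profile data shared with ad partners by default"},
]

# The user's local profile: which services and features they actually use.
my_profile = {"Facebook": {"photo_tagging"}, "LinkedIn": {"messaging"}}

# Part two: surface only the alerts that pertain to this user.
relevant = [a for a in alert_feed
            if a["feature"] in my_profile.get(a["service"], set())]
for alert in relevant:
    print(f"{alert['service']}: {alert['summary']}")
```

The point of the split is that the sensitive part -- what you use and how -- never leaves your machine; only the public feed travels.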
One of the most interesting technical presentations I attended in 2012 was the talk on "adversarial stylometry" given by a Drexel University research team at the 28C3 conference in Berlin. "Stylometry" is the practice of trying to ascribe authorship to an anonymous text by analyzing its writing style; "adversarial stylometry" is the practice of resisting stylometric de-anonymization by using software to remove distinctive characteristics and voice from a text.
Stanford's Arvind Narayanan describes a paper he co-authored on stylometry that has been accepted for the IEEE Symposium on Security and Privacy 2012. In On the Feasibility of Internet-Scale Author Identification (PDF) Narayanan and co-authors show that they can use stylometry to improve the reliability of de-anonymizing blog posts drawn from a large and diverse data-set, using a method that scales well. However, the experimental set was not "adversarial" -- that is, the authors took no countermeasures to disguise their authorship. It would be interesting to see how the approach described in the paper performs against texts that are deliberately anonymized, with and without computer assistance. The summary cites another paper by someone who found that even unaided efforts to disguise one's style make stylometric analysis much less effective.
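The core of a stylometric classifier is small enough to sketch: represent each text by the relative frequencies of topic-neutral function words, then attribute an anonymous sample to the nearest labeled author. A toy version (real systems, including the paper's, use hundreds of features and far stronger classifiers):

```python
import math
from collections import Counter

FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "is", "it", "but", "with"]

def style_vector(text: str) -> list:
    """Relative frequencies of function words: present in any text,
    largely topic-free, and distinctive per author."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def attribute(anonymous_text, labeled_corpus):
    """Nearest-neighbor attribution: pick the candidate author whose
    known writing is stylistically closest to the anonymous sample."""
    anon = style_vector(anonymous_text)
    return max(labeled_corpus,
               key=lambda author: cosine(anon, style_vector(labeled_corpus[author])))

# Tiny invented corpus, just to exercise the pipeline.
corpus = {
    "author_a": "the cat sat on the mat and the dog sat with it",
    "author_b": "to be or not to be is that which is in question",
}
print(attribute("it is the mat that the cat likes and the dog", corpus))  # author_a
```

Adversarial stylometry attacks exactly this: rewriting the text so its function-word profile no longer resembles the author's known writing.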
We made several innovations that allowed us to achieve the accuracy levels that we did. First, contrary to some previous authors who hypothesized that only relatively straightforward “lazy” classifiers work for this type of problem, we were able to avoid various pitfalls and use more high-powered machinery. Second, we developed new techniques for confidence estimation, including a measure very similar to “eccentricity” used in the Netflix paper.
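The "eccentricity" idea from the Netflix de-anonymization paper is simple: measure how far the best match's score stands above the runner-up, in units of the standard deviation of all candidate scores, and make an attribution only when that gap is large. A sketch of the measure (the scores are invented and the threshold is up to the analyst):

```python
import statistics

def eccentricity(scores):
    """Gap between the best and second-best match scores, in units of
    the standard deviation of all scores; large values mean the top
    candidate clearly stands out from the crowd."""
    ordered = sorted(scores, reverse=True)
    sigma = statistics.pstdev(scores)
    return (ordered[0] - ordered[1]) / sigma if sigma else float("inf")

# One candidate clearly stands out: attribute with confidence.
clear = eccentricity([0.91, 0.34, 0.30, 0.28, 0.31])
# No stand-out candidate: abstain rather than guess.
flat = eccentricity([0.35, 0.34, 0.33, 0.36, 0.31])
print(clear, flat)
```

Abstaining on low-eccentricity cases is what lets such systems trade coverage for precision.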
Research by Carnegie Mellon professor Latanya Sweeney and other experts shows that an alarming number of seemingly innocuous, neutral, or "common" data points can potentially identify an individual online. "Privacy law, mainly clinging to a traditional intuitive notion of identifiability, has largely not kept up with the technical reality," says the EFF's Seth Schoen:
A recent paper by Paul Ohm, "Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization", provides a thorough introduction and a useful perspective on this issue. Prof. Ohm's paper is important reading for anyone interested in personal privacy, because it shows how deanonymization results achieved by researchers like Latanya Sweeney and Arvind Narayanan seriously undermine traditional privacy assumptions. In particular, the binary distinction between "personally-identifiable information" and "non-personally-identifiable information" is increasingly difficult to sustain. Our intuition that certain information is "anonymous" is often wrong.
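Sweeney's best-known estimate is that ZIP code, birth date, and sex alone uniquely identify the large majority of the U.S. population. Measuring that kind of uniqueness in a "de-identified" release is trivial: count how many rows share each quasi-identifier combination. A sketch with invented rows:

```python
from collections import Counter

# Hypothetical 'de-identified' release: names removed, but each row
# still carries the quasi-identifier triple (ZIP, birth date, sex).
rows = [
    ("02138", "1945-07-31", "M"),
    ("02138", "1962-03-12", "F"),
    ("02138", "1962-03-12", "F"),
    ("02139", "1980-01-01", "M"),
]

bucket_sizes = Counter(rows)
unique = [r for r in rows if bucket_sizes[r] == 1]
print(f"{len(unique)} of {len(rows)} rows are unique on (zip, dob, sex)")
# 2 of 4 rows are unique on (zip, dob, sex)
```

Every unique row is one linkage attack away from a name, which is exactly why the PII/non-PII binary Ohm criticizes is so hard to sustain.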
What information is "personally identifiable"? (EFF Deep Links)