Netflix is about to commit a privacy Valdez with its customers' viewing data

CU Boulder's Paul Ohm writes about Netflix's insane new plan to release millions of customers' personal information -- ZIP code, gender, year of birth -- as a sequel to its Netflix Challenge. Latanya Sweeney's famous study on de-anonymizing data showed that full date of birth (not just year), gender and ZIP code are sufficient to uniquely identify 87% of Americans. In other words, Netflix is about to put behavioral data about the viewing choices of millions of Americans into the public domain, despite its legal duty to keep this information private.

Because of this, if it releases the data, Netflix might be breaking the law. The Video Privacy Protection Act (VPPA), 18 USC 2710, prohibits a "video tape service provider" (a broadly defined term) from revealing "personally identifiable information" about its customers. Aggrieved customers can sue providers under the VPPA, and courts can order "not less than $2500" in damages for each violation. If somebody brings a class action lawsuit under this statute, Netflix might face millions of dollars in damages.

Additionally, the FTC might decide to fine Netflix for violating its own privacy policy, which would be an unfair business practice.

Either a lawsuit under the VPPA or an FTC investigation would turn, in large part, on one sentence in Netflix's privacy policy: "We may also disclose and otherwise use, on an anonymous basis, movie ratings, consumption habits, commentary, reviews and other non-personal information about customers." If sued or investigated, Netflix will surely argue that its acts are immunized by the policy, because the data is disclosed "on an anonymous basis." While this argument might have carried the day in 2006, before Narayanan and Shmatikov conducted their study, it is much weaker in 2009, now that Netflix has many reasons to know better, including, in part, my paper and the publicity surrounding it. A weak argument becomes even weaker if Netflix includes the kind of data--ZIP code, age, and gender--that we have known for over a decade fails to anonymize.

