Netflix is about to commit a privacy Valdez with its customers' viewing data

43 Responses to “Netflix is about to commit a privacy Valdez with its customers' viewing data”

  1. frimdaddy says:

    While Netflix prepares to publicize, those with cash have had access to this type of info from consolidators for years. All this hand-wringing while we ignore the silent ravaging of privacy that goes on daily.

    My advice is to give them CRAP:

    >>>>> ALWAYS LIE ONLINE <<<<<

    I simply add or subtract from my actual info in a consistent way. month -1 day -2 year -3, and so on. They can “identify” me but nothing that they collect is useful.

    -G

  2. Anonymous says:

    No one has mentioned that NF has your credit card number and the information they needed to verify same.

    They store on their servers a lot more than your zip and whether you checked M or F.

    Add to the above your IP which I am sure is stored by NF. If you are not sure what this means, read Paul Ohm’s post #19 above that links to his paper. Pages 60-64.

    If NF decides it needs some money, it has a lot of data to sell. I believe a Privacy Policy can be amended at any time. Didn’t Amazon do this 4 or 5 years ago?

  3. danlalan says:

    Online data mining allows faceless people to accumulate data about you for unknown purposes, and not only do you have no recourse if it becomes a problem for you, you may never even know that it has.

    Having someone know my taste in movies by itself is hardly worrisome, but it is likely to be combined with information gleaned from other such data mining operations and has the potential to give the data miners a pretty comprehensive picture of me, all gathered without my consent or knowledge.

    Granted, most personal data is hardly exciting stuff, but that really isn’t the point. If the government was wiretapping me they would die of boredom and I would have no worries about any legal problems, but the fact that they were doing it would cause me grave concerns.

    By the same token, even though the data that some faceless corporation is gathering may be banal, uninteresting and cause me no injury, I really object to having anyone looking into my personal habits.

    It isn’t the data about me that they are collecting that causes me concern, it is the fact that they are collecting it. My personal habits are not their business.

  4. Moriarty says:

    They would prevent the outrage if it was opt-in only.

    That’s exactly what they should do: ask nicely. Then it would just be responding to a survey.

  5. AirPillo says:

    I’m a Netflix customer, and not all of that data is required by them. Some of it is optional and can be withheld.

    Netflix does not know your age, for example, unless you opt to tell them, and if you have kept that private, then it will remain private. I do believe gender is exactly the same. It’s a field you can fill in if you want but aren’t made to. Given that, Netflix would have no info other than your zip code and history to disclose, which is logically and mathematically impossible to identify an individual with.

    What that means is that disclosing this info would only, at worst, affect the people who decided to disclose that information anyway. If people are given an opt-out option on top of this, then anyone who desired privacy would have it.

    Depending on how Netflix caches this data, a user protecting their own privacy may be as simple as opening their profile and removing the optional information.

    This might be important to some people, but is not as terrible as it sounds because users have been given a chance to keep that data anonymous even to Netflix in the first place.

  6. Anonymous says:

    Just hash the zipcodes? I assume that the actual geographic location of subscribers isn’t as important as the ability to tell them apart. So push all the zipcodes through a one-way function and you get the same ability to play with the data but much less risk of privacy violations.
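
    A minimal sketch of the hashing idea above, with one caveat worth seeing in code: there are at most 100,000 five-digit ZIP codes, so a plain one-way hash can be undone by simply trying them all. The function names and the keyed variant (an HMAC with a secret the publisher never releases) are illustrative assumptions, not anything Netflix has described.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def plain_hash(zip_code: str) -> str:
    """Unsalted one-way hash of a ZIP code, as the comment proposes."""
    return hashlib.sha256(zip_code.encode("ascii")).hexdigest()


def reverse_plain_hash(digest: str) -> Optional[str]:
    """Recover the ZIP by hashing every possible five-digit value."""
    for n in range(100_000):
        candidate = f"{n:05d}"
        if plain_hash(candidate) == digest:
            return candidate
    return None


def keyed_pseudonym(zip_code: str, secret_key: bytes) -> str:
    """Keyed hash (HMAC-SHA256): equal ZIPs still map to equal tokens,
    but an outsider without the key cannot enumerate candidates."""
    return hmac.new(secret_key, zip_code.encode("ascii"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    digest = plain_hash("50309")
    print(reverse_plain_hash(digest))  # the plain hash is reversed by brute force

    key = secrets.token_bytes(32)  # hypothetical secret kept by the publisher
    print(keyed_pseudonym("50309", key) == keyed_pseudonym("50309", key))
```

    Even the keyed version only hides which ZIP a token stands for. It still lets anyone tell subscribers' ZIPs apart, so it does not change how many people share a given token, age, and gender.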

  7. Anonymous says:

    Danlalan, your comments are very close to what I was going to respond to Nox’s quote:

    “We surrender privacy to live in society. In the small town I grew up in, the video store clerk and other customers knew exactly what type of movies I watched and who I was.”

    This is exactly the issue: not faceless, not stored in a beyond size database by a corp whose actions you have no control over.

    These corps are your servant. You are paying them for a service or product. The information flow should be going in the other direction. But, the ‘gold’ you provide will be used.

    This is interesting timing for NetFlix to have this contest, just after the Census identified every front door in the US with GPS co-ordinates. TMI.

  8. danlalan says:

    Thanks for the heads up.

    As they can only release the information I tell them, I’ll let them release information that is more imaginative than accurate.

    I wonder what the effect on data miners would be if everyone creatively altered the information provided.

  9. fnc says:

    If the men in black came and hauled my wife away for watching Gray’s Anatomy, I would totally understand.

    But otherwise, shame on Netflix. ANY service that thinks it might ever disclose even a sniff of a user’s info should make the disclosure itself completely optional, rather than just what parts of your data you want the world to see. If you don’t check that box, nothing you ever do with them ever leaves their servers. Yeah I know, I wish I had a sack of money too.

  10. Anonymous says:

    The thing about privacy and security is that it’s not about what WE come up with in terms of ways this information can be used against us; it’s about what the bad guys come up with, what someone else comes up with.

    Thinking about information from a position of wanting to exploit it is a lot different than how we usually relate to it, which is neutrally or not at all. How many online forms have you filled out, giving personal information? We don’t really think about those forms ever again. How many profiles on social networking sites have you abandoned, how many are still out there?

    Not everyone is savvy about privacy and security; in fact, I’d argue the majority of people on the internet are not.
    Some folks might not fill out their DOB or gender when they sign up but I would think the majority does.

    It’s definitely not something that should keep you up at night but I think it’s good to be aware.

  11. Rindan says:

    To tell if you are in danger of someone theoretically finding out what movies you like, do the following. Look around your zip code. Do you see another person of the same age and gender? Fear not, Netflix has not let your privacy data go. If you live in a zip code where you are the only person of your age and gender, grab your tin foil hat. Someone who wants to sift through government records to find your age, gender, and current zip code could, in theory, find what type of movies you like. Oh the horror. I bet your employer is finding out that you gave Orgazmo 5 stars right now (great movie BTW)!

    The paranoia level that you need to be operating at to worry about this is off the scale. You not only need to be a rare individual who lives in a very tiny zip code and be of an unusual age (likely 85+), but you need to be paranoid enough to think that someone will 1) go through the non-trivial effort to peg your information to you and then 2) use it against you. If you really fear this, you are one crazy tin foil hat old man.

  12. bardfinn says:

    Explicit laws beat EULAs and policies every time.

  13. Anonymous says:

    HaHaHaHaHaHaHaHaHaHa. You protect your privacy by protecting your anonymity. Netflix uses plastic money and/or checks. People who submit to photo ID, use credit cards and/or have checking accounts are just making lip service when they talk about privacy. Who are you protecting yourself from if not corporations and government entities? None of my friends who use netflix will complain in my presence because they know I told them so – for thirty years! I know people who still refuse to accept checks and work on a strictly cash economy. So quit your BS lip service about privacy and shut up! You wiped your ass with your privacy and flushed it down the toilet.

  14. faceword says:

    There is the potential for real harm from the release of this information.

    Movie history will almost certainly reveal if someone is a gay male (who else rents “Boys Briefs 3”?). If that person is a service member, or lives in a state without anti-discrimination laws, that could get him harassed or fired.

    I just fictionalized my gender / birth year information on netflix.

  15. wgmleslie says:

    Princeton’s Paul Ohm is always resisting this sort of thing.

  16. Jack Daniel says:

    For what it’s worth, I’m now ageless to Netflix. Only took a minute.

  17. Anonymous says:

    You can email netflix about their privacy policy:
    privacy@netflix.com

    Of course, the email I got back says they “periodically” review emails to that account, so I’m not sure how helpful that is.

    As other comments have mentioned, it is easy to remove your birth year from your account data. Gender information can be changed to/from M/F but not changed to blank.

  18. dr_awkward says:

    Having worked on the Netflix Prize v 1.0 for a data mining class, I can personally say that we should all be very worried. Some machine learning expert… might… be able… to figure out what movies you rented!

    Big frappin’ deal.

    The bigger worry is the notion that what is learned by the expert may then be put to much more insidious uses by the usual rogues’ gallery of privacy infringers.

    But Netflix isn’t really one of them. Really? Do we need to get this worked up about movie preference data? Or is it a slow day for the ACLU?

  19. AirPillo says:

    Here is the response I got from Netflix when I queried them about it:
    “Thank you for your concern but rest assured, Netflix zealously guards your privacy. All the information we’re giving in The Netflix Prize 2 dataset is completely anonymous. It contains no personally identifiable information. It does not contain anyone’s name, address, or any means to connect a particular record with a specific Netflix member. As in Netflix Prize 1, the dataset contains some movie ratings from select anonymous members. It also includes some Queue adds and taste preferences, broad age ranges, gender and zip codes but, again, completely anonymous. But all that data is modified – our scientists call it perturbed – to make it anonymous. No one, no matter how sophisticated an engineer or analyst, will be able to link your name or any other Netflix member’s name to the data.”
    This satisfies me and I also think it’s a good idea

    Judging by that response, it looks as though they’re not releasing this as raw data. Instead it appears they may be doing the rather common statistical practice of organizing it and presenting it as grouped classes of data.

    For example, rather than a list of every person’s age, zip code, and info… it might be tables or other forms listing the information associated with ages 17-20 within each zip code, basically presenting it as a set of abstracted groups and their associated statistics. This completely anonymizes your own individual data point within the bulk because your contribution to that data is not communicated due to the abstraction.

    While I don’t trust a support employee’s ability to correctly answer a question confirming that, if that is the case then this whole hullabaloo is a completely unnecessary reaction.

    This is, of course, all based on my remedial college education in elementary statistics, so I’m open to being told I’m wrong, and still supportive of anyone who believes people should be allowed to opt-out or opt-in anyway.

    I don’t disagree with the urge to find out about this and protect privacy by the way. I just want to present a rational suggestion.
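
    The grouped-classes idea described above can be sketched roughly as follows. This is an assumption about what such a release could look like, not a description of Netflix's method; the row layout, the four-year bands, and the `min_group_size` threshold are all invented for illustration. The threshold matters because a group containing one person is just that person's record under another name.

```python
from collections import defaultdict


def age_band(age: int, width: int = 4) -> str:
    """Map an age to a band label; width=4 yields bands such as 17-20."""
    low = (age - 1) // width * width + 1
    return f"{low}-{low + width - 1}"


def aggregate(rows, min_group_size: int = 5):
    """Group (zip, age, rating) rows by (zip, age band) and publish only
    the count and mean rating of groups that meet the size threshold."""
    groups = defaultdict(list)
    for zip_code, age, rating in rows:
        groups[(zip_code, age_band(age))].append(rating)

    published = {}
    for key, ratings in groups.items():
        if len(ratings) >= min_group_size:  # suppress groups that are too small
            published[key] = {
                "count": len(ratings),
                "mean_rating": sum(ratings) / len(ratings),
            }
    return published


if __name__ == "__main__":
    # Toy rows, invented for this example: (zip, age, star rating)
    rows = [("50309", 18, 4), ("50309", 19, 2), ("50309", 20, 3), ("52001", 98, 5)]
    print(aggregate(rows, min_group_size=2))
```

    With the toy rows, the three 17-20 year olds in one ZIP are published as a single group, while the lone 98-year-old's group is dropped rather than released.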

  20. Anonymous says:

    Here is the response I got from Netflix when I queried them about it:
    “Thank you for your concern but rest assured, Netflix zealously guards your privacy. All the information we’re giving in The Netflix Prize 2 dataset is completely anonymous. It contains no personally identifiable information. It does not contain anyone’s name, address, or any means to connect a particular record with a specific Netflix member. As in Netflix Prize 1, the dataset contains some movie ratings from select anonymous members. It also includes some Queue adds and taste preferences, broad age ranges, gender and zip codes but, again, completely anonymous. But all that data is modified – our scientists call it perturbed – to make it anonymous. No one, no matter how sophisticated an engineer or analyst, will be able to link your name or any other Netflix member’s name to the data.”
    This satisfies me and I also think it’s a good idea

  21. dculberson says:

    They would prevent the outrage if it was opt-in only.

  22. SednaBoo says:

    Ok, two points:

    1) You don’t have to put in your gender at NetFlix. I just checked my account, and I never selected either radio button. Unfortunately because it is a set of radio buttons I don’t think you can unselect once you have selected in the past.

    2) Rindan@1 alludes that we have little to fear because Netflix has no porn section. I will differ on that, because there are a lot of political films available there. You can easily tell someone’s political leanings by seeing if they’ve watched Outfoxed or Hillary: The Movie. While this shouldn’t be a cause for alarm, it could result in someone getting fired (remember the Justice Department?). No one should have to be compelled to self-censor private viewing activity because of who may be watching.

  23. Anonymous says:

    Frankly, for marketing purposes your name is irrelevant. Your ZIP/postal code will do, but ascertaining your address from at least some of your public records is relatively trivial, and your address is a sufficiently unique identifier when occupancy rates are less than 3.

    This is to find buildings or blocks with statistically significant movie-rental rates of outlying films, for more effective billboard placement, by a dying industry that thinks preaching to the converted will increase market share.

    Full of FAIL, but unless the CIA has a plan to wiretap everyone who watched FAHRENHEIT 9/11…

    oh crap…

  24. chungdoh says:

    Cory seems to be taking some flak for this post. I’m inclined to agree that some bb posts lean toward the hyperbolic side of things, but note this, from the comments section after the article:

    “According to Sweeney, 53% of people in America are uniquely identified by {city, birth date, sex} and 18% by {county, birthdate, sex}.”

    What we’re talking about is ZIP, birth *year*, and sex. ZIP can be even more specific than city, and a lot more specific than county, so we could reasonably expect that a rather large number of people could be identified from this data set. It’s hard to know what the exact percentage of the population could be identified, but some could–no doubt.

    True, there is no porn on Netflix. But there are other semi-embarrassing movies that people might rent. It’s personal information that should be kept private.
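
    The question of how many people {ZIP, birth year, sex} singles out can be checked directly on any dataset by counting how many records have a combination nobody else shares. The sketch below does that on a few made-up records; the field names and the data are assumptions for illustration only and say nothing about the real percentages.

```python
from collections import Counter


def unique_fraction(records, keys=("zip", "birth_year", "sex")):
    """Fraction of records whose combination of `keys` appears exactly once."""
    if not records:
        return 0.0
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    singled_out = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return singled_out / len(records)


if __name__ == "__main__":
    # Invented records; a real check would use census-scale data.
    records = [
        {"zip": "50309", "birth_year": 1975, "sex": "M"},
        {"zip": "50309", "birth_year": 1975, "sex": "M"},
        {"zip": "50309", "birth_year": 1975, "sex": "F"},
        {"zip": "52001", "birth_year": 1911, "sex": "M"},
    ]
    print(unique_fraction(records))                 # all three fields
    print(unique_fraction(records, keys=("zip",)))  # ZIP alone
```

    Adding fields can only split groups further, so the fraction singled out never goes down as more attributes are released.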

  25. Rindan says:

    The article states pretty clearly why this is absurd to fear. They are not releasing birthdays. They are releasing age. With my gender, zip code and age, you would narrow it down to, oh, a few thousand. I mean sure, the one guy who is 98 in a town of 500 might be identified as the guy who likes a good slasher, but there has to be a limit to the paranoia. I am a pretty big privacy buff, but I also recognize the need to examine and use data. This is how companies get better at serving.

    Finally, despite what the author says, these really are just movie preferences on a service that doesn’t even have a porn section. I don’t think many people need their movie preferences hidden behind 4096 bit encryption. Gender, age, and zip code is good enough encryption for such trivial data. If your paranoia level is so high that that minimal information being released attached to frigging movie preferences really causes you terror, do yourself a favor and don’t sign up for, um, anything.

    Save the privacy fears for something worthwhile, not the extremely slim chance that a really bored academic can find out that you or one of a hundred other people in your zip code with the same age and gender rated The Boondock Saints two stars.

  26. mdh says:

    If somebody brings a class action lawsuit under this statute, Netflix might face millions of dollars in damages.

    The last time this worked out to me getting a free upgrade to four movies at a time for a few months – not sure what the lawyers got.

  27. Anonymous says:

    I spent a good 5 minutes figuring out what Colombian coffee has to do with it.
    Exxon Valdez, not Juan Valdez.

  28. Quadro says:

    Netflix did not take my date of birth. Heck, it didn’t even require my year of birth, but there’s an optional place to enter that in. It’s also really easy to just leave it blank.

  29. newman100 says:

    Do I still have to be afraid if I don’t live in the ZIP code I was born in?

  30. dfrankow says:

    “I wonder what the effect on data miners would be if everyone creatively altered the information provided.”

    I studied that.

    - PDF: http://www-users.cs.umn.edu/~dfrankow/files/privacy-sigir2006.pdf
    - Video: http://video.google.com/videoplay?docid=6474169875352273382

    It turns out it is especially hard to disguise your relations in a sparse relation space. That is, in many cases each person has a few rare things they are related to (e.g., you like or hate some little-known movies). You have to obscure a lot of that before you are properly anonymized.

    Dan

  31. paulohm says:

    Thanks for the post, Cory. I wanted to make one clarification: I teach at the University of Colorado Law School, not Princeton. I blog with Princetonians, but that’s my only connection to the school.

    And to all the commenters, thanks for your thoughts. To those who disagree with my take, if you have the time please read my article, because I address or rebut many of the points that have been made. (But I admit that some of the disagreements are simply differences of opinion.)

  32. matt4077 says:

    We’re not doing the privacy cause a favor by resorting to hyperbole when promoting it.

  33. nox says:

    Cory,

    I respect your stance on privacy. Modern technology has completely outpaced existing legislation. With the very serious threats posed to privacy in this modern age, we must pay close attention to sticking close to the facts or risk alienating the public – as the comments in this thread on this relatively enlightened blog show.

    Paul is right to draw attention to how companies share and use their information; Netflix should be careful about this. However, I find that his article is dangerously alarmist, selective in its reporting of facts, and simply unsustainable.

    A few minutes of googling got me these comments, from Latanya herself, that Paul has not mentioned in his article or comments:

    And, {year of birth, gender, 5-digit ZIP} are likely to uniquely identify 0.04% (or about 105,016 people) of the U.S. population. In terms of more sensitive states, 0.89% (or 5703 people) of the population of Iowa is likely to be uniquely identified by {year of birth, gender, 5-digit ZIP}.

    Arvind Narayanan and Vitaly Shmatikov’s paper is cited, but their findings require very private or obscure information to identify someone with any degree of reliability.

    There is a risk to privacy depending on how Netflix prepares their dataset, but Paul has chosen his facts and figures to dramatize it. Netflix should consider obfuscation, opt-ins, and caution when it comes to sensitive states like Iowa, as well as partnerships with academics (and cape-donning privacy advocates) to identify privacy risks before data sets are released to the public.

    We surrender privacy to live in society. In the small town I grew up in, the video store clerk and other customers knew exactly what type of movies I watched and who I was.

    Paul is advocating that online interactions should be absolutely private – a level of privacy that did not exist and isn’t protected in the old world. I don’t find this to be reasonable, feasible, or sustainable. It risks desensitizing the public to the serious privacy issues that already exist.

    Participating in an online society will impact privacy. Just as in Real Life, we must find the optimal compromise of privacy, personal freedom, and societal function.

  34. Anonymous says:

    In small towns, this is absolutely enough to personally identify someone.

  35. Cory Doctorow says:

    RTFA. Year of birth isn’t as revealing as date, but still has statistical likelihood of disclosure in many cases.

    You do privacy no favors when you dismiss disclosure out of hand without thought.

  36. anansi133 says:

    It seems to me that a netflix user with very specific arthouse tastes is going to be much easier to identify than the more typical user who only wants to watch the latest blockbuster.

    For that matter, it’s those kind of people that corporate america would be most interested in pinpointing. The hoi polloi aren’t likely to cause trouble, they’ve been playing the same role all their lives.

  37. geddygibson says:

    There may not be a pron SECTION in Netflix, but there are plenty of foreign (e.g. Catherine Breillat’s work) and independent (e.g. “Brown Bunny”) films that have graphic depictions of the same sorts of activities one sees in pron.

    Not that I would know first hand, of course (so to speak).

  38. aldasin says:

    When forced to enter info like gender or age, I just put the wrong info in. Easy.
    Everyone should be data poisoning.
    Don’t withhold it, overwhelm the borg with bad data.

  39. Wardish says:

    While they may be able to say they didn’t know, suprising how well that works for corp’s but not citizens, but if they are informed in a verifiable fashion…

    Registered return receipt required letter to the president and cc’d, again return receipt required , to all board members and the legal department detailing the facts including references to the pertinacity information.

    Business style, and properly signed.

    Ward

  40. danlalan says:

    The hoi polloi aren’t likely to cause trouble, they’ve been playing the same role all their lives.

    1) Arrogant much?

    2) French Revolution, intifada, Rwanda

  41. nox says:

    Oh, look, it took me half an hour to get sent something far more disturbing.

    Your sexual orientation can be discovered by examining your facebook social network.

  42. Wardish says:

    hehehe,

    And don’t forget to properly spell check…

    Ward

  43. M says:

    You don’t need to worry until the day your neighbor’s fondness for watching “Reefer Madness” results in a SWAT team breaking your door down in the middle of the night, terrorizing your wife and kids and killing the dog because they got the address wrong which, of course, NEVER happens in the good old U S of A. Never.