The Electronic Frontier Foundation's Kevin Bankston discusses the news that Yahoo! will radically reduce the retention period for its logs, anonymizing them after just 90 days (compared with Google's 9 months). It's a pretty radical development: for years, I've been skeptical of claims that tech companies would compete on privacy, issuing press releases that said, in effect, "Use us, we're less snoopy and creepy than those guys!" But here we are: the company whose data retention and chummy relationship with the Chinese Politburo put a campaigning journalist in jail is now saying that it's going to sanitize its logs on a quarterly basis. Kevin's got a reality check:
Unfortunately, it's hard to gauge the true privacy impact of this policy change until we know exactly what steps Yahoo will be taking to anonymize the data. The devil's in the details, and if Yahoo's anonymization process isn't robust enough, this new logging policy may end up being more privacy PR than privacy protection. Fully anonymizing IP addresses and cookie data can be tricky, and even if that data is thrown away completely, there's still the possibility of individuals being identified based on the content of their search queries, as AOL's search data spill demonstrated.
So, as Yahoo finalizes its policy plans, it should take a look at EFF's newly-revised Best Practices for Online Service Providers, which recommends a range of techniques to strongly anonymize online user data. Hopefully, we'll see the details of Yahoo's plan soon, as well as new announcements from other search engines trying to keep up in this accelerating privacy competition. Internet users have long trusted search engines and internet portals like Yahoo and Google with the privacy of their most intimate and sensitive data, and we're glad to see those companies finally vying to earn that trust.
Update: Christopher sez, "You note that Google currently 'anonymizes' logs after 9 months. That is not true, due to the fact that they do not attempt to mask cookies until the 18-month mark. Removing some tiny portion of an IP address from the logs is worthless, if cookies can be used to match up new log entries and older log entries."
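
To make Christopher's point concrete, here is a minimal Python sketch of why trimming IP addresses alone doesn't do much. Everything in it is an assumption made up for illustration: the log entries, the cookie IDs, the `truncate_ip` and `link_by_cookie` helpers, and the choice to zero the last 8 bits of each address. It is not a description of what Google or Yahoo actually do to their logs.

```python
import ipaddress
from collections import defaultdict


def truncate_ip(ip: str, keep_bits: int = 24) -> str:
    """Zero the low-order bits of an IPv4 address (hypothetical policy: keep a /24)."""
    network = ipaddress.ip_network(f"{ip}/{keep_bits}", strict=False)
    return str(network.network_address)


# Invented log entries: (ip, cookie_id, query). Addresses are documentation ranges.
old_log = [
    ("203.0.113.57", "cookie-A", "symptoms of rare disease"),
    ("203.0.113.99", "cookie-B", "cheap flights"),
]
new_log = [
    ("198.51.100.12", "cookie-A", "pizza near 12 example street"),
]

# "Anonymize" the old entries by truncating the IP only; the cookie is left intact.
scrubbed_old = [(truncate_ip(ip), cookie, query) for ip, cookie, query in old_log]


def link_by_cookie(*logs):
    """Group entries from any number of logs by their cookie ID."""
    profiles = defaultdict(list)
    for log in logs:
        for ip, cookie, query in log:
            profiles[cookie].append((ip, query))
    return dict(profiles)


# The surviving cookie joins the scrubbed entry to the fresh, full-IP entry.
profiles = link_by_cookie(scrubbed_old, new_log)
for cookie, entries in profiles.items():
    print(cookie, entries)
```

In this toy data, the scrubbed entry for `cookie-A` lands in the same profile as a newer entry that still carries a full IP address, so the truncation bought nothing for that user. A scheme that also dropped or re-keyed the cookie at the same time would break that join, though (as Kevin notes) the queries themselves can still give people away.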