You can unscramble the hashes of humanity's 5 billion email addresses in ten milliseconds for $0.0069

Marketing companies frequently "anonymize" their dossiers on internet users using hashes of their email addresses — rather than the email addresses themselves — as identifiers in databases that are stored indefinitely, traded, sold, and leaked.


Cryptographically secure hashing converts any file to a short string (its "hash"), and it's supposed to be computationally infeasible to turn the hash back into the source file. For example, hashing user@example.com yields "b58996c504c5638798eb6b511e6f49af". Every time you compute the hash of an email address, you'll get the same string, making it easy to merge new pieces of personal information with existing dossiers, even after the email address associated with that dossier has been deleted and replaced with its hash.
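To make the "same input, same hash" property concrete, here is a minimal Python sketch. It assumes MD5 as the hash function and assumes addresses are trimmed and lowercased before hashing; the helper name email_hash and the two toy datasets are hypothetical, invented purely for illustration.

```python
import hashlib


def email_hash(address: str) -> str:
    """Return the hex MD5 digest of an email address.

    Assumption: the address is trimmed and lowercased before hashing.
    """
    normalized = address.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Two hypothetical datasets held by different companies, keyed by hashed email.
ad_network = {email_hash("user@example.com"): {"interests": ["cycling"]}}
retailer = {email_hash(" User@Example.com"): {"last_purchase": "helmet"}}

# The hash is deterministic, so both records share a key and merge cleanly,
# even though neither dataset stores the address itself.
merged = {
    key: {**ad_network[key], **retailer[key]}
    for key in ad_network.keys() & retailer.keys()
}
print(merged)
```

Neither toy dataset contains an email address, yet the two records still combine into a single profile, which is exactly what makes hashed addresses useful as long-lived identifiers.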


But you don't have to use math to turn a hash back into its source file. Instead, you can compute a "rainbow table": the hash of every possible input (say, every possible 8-character password), arranged as a kind of reverse directory that maps each hash back to the input that produced it.
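The reverse directory can be sketched in a few lines of Python. This is an illustration under assumptions, not anyone's actual tooling: it assumes MD5, uses a three-entry candidate list in place of billions of real addresses, and the function names are made up. Strictly speaking it builds a plain lookup table; a true rainbow table uses hash chains to save storage, but the effect on the victim is the same.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_reverse_table(candidates):
    """Map each candidate's hash back to the candidate itself."""
    return {md5_hex(candidate): candidate for candidate in candidates}


# Hypothetical candidate list standing in for a large corpus of known addresses.
candidates = ["alice@example.com", "bob@example.org", "user@example.com"]
table = build_reverse_table(candidates)

# Stands in for a hash found in a supposedly de-identified dataset.
leaked_hash = md5_hex("user@example.com")

print(table.get(leaked_hash))                    # user@example.com
print(table.get(md5_hex("nobody@example.net")))  # None (not in the candidate list)
```

The lookup never inverts the hash function. It only succeeds when the address is already in the candidate list, which is why the finite, largely known set of email addresses is such an easy target.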

There are about 5 billion email addresses in use today. Computing the MD5 hash of all of those addresses using Amazon's cloud would cost $0.0069 and take ten milliseconds.


In other words, hashing is not an effective means of de-identifying data.

The thing is, companies believe that it is. In particular, companies seeking to comply with the impending EU General Data Protection Regulation claim that email hashing satisfies the GDPR's criteria for de-identification. Since de-identified data can be freely shared, these companies are effectively claiming that once they take this ineffective step, they get to pretend that there is no risk in releasing your personal information.


Hashed email addresses can be easily reversed and linked to an individual, therefore they do not provide any significant protection for the data subjects. The existence of companies that reverse email hashes shows that calling hashed email addresses "anonymous", "private", "irreversible" or "de-identified" is misleading and promotes a false sense of privacy. If reversing email hashes were really impossible as claimed, it would cost more than 4 cents.

Even if hashed email addresses were not reversible, they could still be used to match, buy and sell your data between different parties, platforms or devices. As privacy scholars have already argued, when your online profile can be used to target, affect and manipulate you, keeping your true name or email address private may not bear so much significance [7].

Four cents to deanonymize: Companies reverse hashed email addresses [Gunes Acar, Steve Englehardt, and Arvind Narayanan/Freedom to Tinker]