Google releases a free/open differential privacy library

"Differential privacy" is a promising, complicated statistical method for analyzing data while preventing re-identification attacks that de-anonymize people in aggregated data sets.

If differential privacy were to be perfected, it would represent an amazing have-your-cake-and-eat-it proposition for Big Tech, which could continue to mine "behavioral data" for insights without having to worry about privacy scandals, regulation or liability (it would also have giant benefits for other kinds of research, such as medical studies).

So far, though, differential privacy has been all promise, no pants, as implementations have been shown to be weak or flawed or both (and once these implementation defects have been published, all the data that was released prior to the disclosure is in danger of re-identification attacks).

Now, Google has published a set of differential privacy libraries under the very permissive Apache License, so anyone can use, study, modify or distribute them for free. Google is hoping that flaws and weaknesses in its library will be discovered through widespread scrutiny of its code, and that uptake of the differential privacy tools will reduce the pressure to end commercial data collection and retention. Google has also published a suite of testing and auditing tools to spot problems with implementations before they are made public.

Developers could use Google's tools to protect all sorts of database queries. For example, with differential privacy in place, employees at a scooter share company could analyze drop-offs and pickups at different times without also specifically knowing who rode which scooter where. And differential privacy also has protections to keep aggregate data from revealing too much. Take average scooter ride length: even if one user's data is added or removed, it won't change the average ride number enough to blow that user's mathematical cover. And differential privacy builds in many such protections to preserve larger conclusions about trends no matter how granular someone makes their database queries.
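To make the scooter example concrete, here is a minimal Python sketch of one textbook way to release a differentially private average: clamp each ride to a fixed range, then add Laplace noise to both the sum and the count. This is not Google's library or its API. The function names, the `max_minutes` cap and the even split of the privacy budget `epsilon` are all assumptions made for illustration, and the sketch assumes each user contributes exactly one ride. It also ignores the floating-point subtleties that make real implementations hard to get right.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-centered Laplace distribution.

    The difference of two independent exponential draws, each with mean
    `scale`, follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean_ride_length(ride_minutes, max_minutes, epsilon, rng=None):
    """Illustrative (hypothetical) noisy average of ride lengths.

    Assumes one ride per user, so adding or removing a user changes the
    clamped sum by at most `max_minutes` and the count by at most 1.
    """
    if epsilon <= 0 or max_minutes <= 0:
        raise ValueError("epsilon and max_minutes must be positive")
    rng = rng or random.Random()

    # Clamp every ride into [0, max_minutes] so no single rider can move
    # the sum by more than max_minutes.
    clamped = [min(max(m, 0.0), max_minutes) for m in ride_minutes]

    # Spend half of the privacy budget on the sum and half on the count.
    half = epsilon / 2.0
    noisy_sum = sum(clamped) + laplace_noise(max_minutes / half, rng)
    noisy_count = len(clamped) + laplace_noise(1.0 / half, rng)

    # Post-processing: avoid dividing by a tiny or negative count, and
    # keep the reported average inside the clamping range.
    estimate = noisy_sum / max(noisy_count, 1.0)
    return min(max(estimate, 0.0), max_minutes)


if __name__ == "__main__":
    rides = [12.0, 7.5, 30.0, 18.0, 9.0] * 200  # 1,000 made-up rides
    print(dp_mean_ride_length(rides, max_minutes=60.0, epsilon=1.0))
```

A smaller `epsilon` means more noise and stronger privacy. With only a handful of rides the noise swamps the answer, which is the point: the average stays useful for large groups while any one rider's contribution is hidden.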

Part of the reason it's so difficult to roll your own differential privacy is that these tools, like encryption schemes, need to be vetted by as many people as possible to catch all the flaws and conceptual issues that could otherwise go unnoticed. Google's Gipson says this is why it was such a priority to make the tool open source; he hopes that academic and technical communities around the world will offer feedback and suggestions about improving Google's offering.

google/differential-privacy [Google/Github] [Lily Hay Newman/Wired]