Pete Warden writes on O'Reilly Radar about the problems of anonymizing datasets. AOL, Netflix and others have been burned by releasing datasets that they believed had been stripped of identifying elements, only to discover that de-anonymizing some or all of the data was easier than they expected. He cites research by Arvind Narayanan, and then makes some practical recommendations for handling user data (including the most important principle: minimize what data you collect in the first place):
Precisely because there are now so many different public datasets to cross-reference, any set of records with a non-trivial amount of information on someone's actions has a good chance of matching identifiable public records. Arvind first demonstrated this when he and his fellow researcher took the "anonymous" dataset released as part of the first Netflix prize, and demonstrated how he could correlate the movie rentals listed with public IMDB reviews. That let them identify some named individuals, and then gave access to their complete rental histories. More recently, he and his collaborators used the same approach to win a Kaggle contest by matching the topography of the anonymized and a publicly crawled version of the social connections on Flickr. They were able to take two partial social graphs, and like piecing together a jigsaw puzzle, figure out fragments that matched and represented the same users in both.
Why you can't really anonymize your data
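To make the cross-referencing idea concrete, here is a minimal sketch of a linkage attack on toy data. It is not the method the researchers used (their work relied on more careful statistical scoring); it simply counts how many (title, rating) pairs an "anonymous" record shares with each public profile and accepts a match only when one candidate clearly stands out. The function name `link_records`, the `min_overlap` threshold, and all the sample data are hypothetical, invented for illustration.

```python
def link_records(anonymous, public, min_overlap=3):
    """Link anonymous IDs to public names by counting shared (title, rating) pairs.

    Illustrative only: a match is accepted when the best candidate shares at
    least `min_overlap` pairs and strictly beats every other candidate.
    """
    matches = {}
    for anon_id, anon_items in anonymous.items():
        best_name, best_score, runner_up = None, 0, 0
        for name, pub_items in public.items():
            score = len(anon_items & pub_items)  # size of the set intersection
            if score > best_score:
                best_name, runner_up, best_score = name, best_score, score
            elif score > runner_up:
                runner_up = score
        # Reject weak matches and ties between candidates.
        if best_score >= min_overlap and best_score > runner_up:
            matches[anon_id] = best_name
    return matches


# Hypothetical "anonymized" release: opaque IDs, but full rating histories.
anonymous = {
    "user_1017": {("Brazil", 5), ("Heat", 4), ("Alien", 5), ("Clue", 2)},
    "user_2290": {("Heat", 3), ("Big", 4)},
}

# Hypothetical public reviews posted under real names or handles.
public = {
    "alice_reviews": {("Brazil", 5), ("Heat", 4), ("Alien", 5)},
    "bob_reviews": {("Heat", 3), ("Jaws", 5)},
}

print(link_records(anonymous, public))
```

In this toy case `user_1017` is linked to `alice_reviews`, which also exposes the one rating she never posted publicly. That is the pattern described in the quote: a handful of overlapping data points is enough to attach a name to an entire history.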