Our public health data is being ingested into Silicon Valley's gaping, proprietary maw

In a lead editorial in the current Nature, John Wilbanks (formerly head of Science Commons, now "Chief Commons Officer" for Sage Bionetworks) and Eric Topol (professor of genomics at the Scripps Research Institute) decry the mass privatization of health data by tech startups. Through a combination of side-deals with health authorities and insurers and technological lockups, these companies are amassing huge databases of vital health information that is not copyrighted or copyrightable, but is nevertheless walled off from open research, investigation and replication.

Their critique isn't just about the enclosure of something that rightfully belongs to all of us: this data and its analysis will be used to make decisions that profoundly affect the lives of billions of people, and without public access to it, those decisions could magnify existing inequities and injustices (see also Weapons of Math Destruction).


Even when corporations do give customers access to their own aggregate data, built-in blocks on sharing make it hard for users to donate them to science. 23andMe, holder of the largest repository of human genomic data in the world, allows users to view and download their own single-letter DNA variants and share their data with certain listed institutions. But for such data to truly empower patients, customers must be able to easily send the information to their health provider, genetic counsellor or any analyst they want.

Pharmaceutical firms have long sequestered limited types of hard-to-obtain data, for instance on how specific chemicals affect certain blood measurements in clinical trials. But they generally lack longitudinal health data about individuals outside the studies that they run, and often cannot connect a participant in one trial to the same participant in another. Many of the new entrants to health, unbound by fragmented electronic health-record platforms, are poised to amass war chests of data and enter them into systems that are already optimized (primarily for advertising) to make predictions about individuals.

The companies jostling to get into health face some major obstacles, not least the difficulties of gaining regulatory approval for returning actionable information to patients. Yet the market value of Internet-enabled devices that collect and analyse health and fitness data, connect medical devices and streamline patient care and medical research is estimated to exceed US$163 billion by 2020, as a January report from eMarketer notes (see 'The digital health rush' and go.nature.com/29fbvch). Such a tsunami of growth does not lend itself to ethically minded decision-making focused on maximizing the long-term benefits to citizens.

It is already clear that proprietary algorithms can replicate and exacerbate societal biases and structural problems. Despite the best efforts of Google's coders, the job postings that its advertising algorithm serves to female users are less well-paying than are those displayed to male users. A ProPublica investigation in May demonstrated that algorithms being used by US law-enforcement agencies are likely to wrongly predict that black defendants will commit a crime (see go.nature.com/29aznyw). And thanks to 'demographically blind' algorithms, in several US cities, black people are about half as likely as white people to live in neighbourhoods that have access to Amazon's one-day delivery service (see go.nature.com/29kskg3).


Stop the privatization of health data
[John T. Wilbanks and Eric J. Topol/Nature]