Open-source human genomes

Yesterday, during a World Science Festival panel on human origins and why our species outlasted other species of Homo, geneticist Ed Green mentioned that there were thousands of sequenced human genomes, from all over the world, that had been made publicly available. Our code is open source.

But where do you go to find it? Several folks on Twitter had great suggestions and I wanted to share them here.

The 1000 Genomes Project, organized by researchers at the Wellcome Trust, the National Institutes of Health, and Harvard, is working on sequencing the genomes of 2500 individuals. The data they've already collected is available online. For more, read the Nature article "The 1000 Genomes Project: Data management and community access."

The Personal Genome Project is interactive. Created by a researcher at Harvard Medical School, the program is aimed at enrolling 100,000 well-informed volunteers who will have their genomes sequenced and linked to anonymized medical data. Everything that's collected will be Creative Commons licensed for public use.

The University of California Santa Cruz Genome Browser is a great place to find publicly available genomes and sequences.

Thanks to Eva Rose, Aatish Bhatia, and Edward Banatt.



  1. Another openSNP-guy here. 

    By now you can also upload your FamilyTreeDNA genotyping data, as well as your exome if you’re one of 23andMe’s early exome adopters.

    You can also provide phenotypic information about yourself (hair/eye colour, any genetic diseases you might have, etc.). The data is also published under CC Zero :-)

  2. Hi — Great to see the Personal Genome Project mentioned, thank you! 

    A key distinction about PGP data is that we don’t consider genomes to be truly “anonymizable”, especially not when combined with other data like health information. (This is broadly held to be true in research; for this reason, most genetic data from studies that gets collected these days is only shared in a controlled-access manner, e.g. through “dbGaP”.)

    So, I hesitate to use the term “anonymized” to refer to anything in our project — especially when many participants choose to be explicitly public about which account belongs to them (although this is not required). Among other things, our “open consent” process involves understanding the risk of re-identification. It’s not for everyone, but through this process researchers can take advantage of people willing to donate their data — and privacy, potentially — to science. You can read more about it on our study guide:
    Currently, if you’re looking for the data, you can find genome and genotyping files linked on personal genome reports here: and linked to participant profiles here: (The first link is limited to genome & exome data.)

    With respect to things like one key element for use of data by researchers is that the data is collected under an IRB-approved process. People who join the PGP can also upload and donate their 23andMe and deCODEme data, etc., with the benefit of it being very “kosher” for researchers to use. If anyone has uploaded to other sites, I suggest they also consider enrolling in the PGP.  :-)

    1. Why are you using a different name? Did you lose access to your BB account during the switch?

  3. Yeah, Dan Vorhaus is adamantly against us taking on the risk of dealing with the legal landscape of other countries (e.g. ). It’s hard enough to struggle with what happens in the US (I wonder if California’s Genetic Information Privacy Act will be a problem for us).

