DNA for data storage

Researchers have successfully stored information in synthetic DNA and then sequenced the DNA to read the data back. Nick Goldman and his colleagues at the European Bioinformatics Institute (EBI) encoded all of Shakespeare's sonnets, an audio clip of Martin Luther King's "I have a dream" speech, Watson and Crick's paper on DNA's structure, a photo of the EBI, and an explanation of their data conversion technique. Last year, Harvard molecular geneticist George Church encoded a book he had written into DNA, but EBI's breakthroughs are in the way the data is encoded and in its error correction. From the abstract of their scientific paper, published in Nature:
We encoded computer files totalling 739 kilobytes of hard-disk storage and with an estimated Shannon information of 5.2 × 10^6 bits into a DNA code, synthesized this DNA, sequenced it and reconstructed the original files with 100% accuracy. Theoretical analysis indicates that our DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving. In fact, current trends in technological advances are reducing DNA synthesis costs at a pace that should make our scheme cost-effective for sub-50-year archiving within a decade.
"Synthetic double-helix faithfully stores Shakespeare's sonnets" (Thanks, Mike Pescovitz!)

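For a rough feel of what "encoding computer files into a DNA code" can mean, here is a minimal sketch. It assumes the simplest possible mapping, two bits per nucleotide, which is an illustration only and not the scheme described in the Nature paper. The names `BASES`, `bytes_to_dna`, and `dna_to_bytes` are made up for this example.

```python
# Hypothetical mapping: each nucleotide stands for two bits.
# This is an illustrative assumption, not the EBI team's encoding.
BASES = "ACGT"  # A=00, C=01, G=10, T=11


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into the original bytes."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # Shift the accumulated bits left and append this base's two bits.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    strand = bytes_to_dna(b"Hi")
    print(strand)  # CAGACGGC
    assert dna_to_bytes(strand) == b"Hi"
```

This sketch has no error correction at all, so a single misread base would corrupt a byte. A practical scheme would need the kind of redundancy the article credits the EBI team with.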

  1. So when this technology inevitably makes it into living cells, and mutations occur, will the new versions be considered derivative works?

    1. This technology is already present in living cells; in fact, the DNA inside a single cell can store the genome of an entire cell.

      1. …I thought it was pretty clear “this technology” referred to computer files encoded in synthetic DNA, but I’ll still laugh a little.

  2. I still recall a conjecture by Isaac Asimov that if some alien visitor to Earth a million years ago wanted to send a message to the future Earth, the best way would be to encode it in the DNA of some organism. Certainly hard to think of anything else likely to last as well, though one might think mutations could spoil the message over time.

  3. I wonder if cosmic rays could have ill effects? 

    “Okay! Let’s look at grandma’s wedding pictOHMYGOD THE KRAKEN!”

    1. Ill effects? Yes… but that’s why error-correcting codes are used. (Just as they are for other forms of data storage which are either at high risk or carry critical information.)

      (Back in the day, some mainframes used to spend up to a third of their hardware continuously checking the other two-thirds, correcting errors where possible, and ensuring that they were reported in a way that let them be easily corrected even when the error was intermittent. These days, you folks won’t even deign to pay for parity bits … yes, memory is more reliable than it was, but a lot of this is that MS has convinced you that machines are supposed to malfunction.)

  4. Watson and Crick’s paper on DNA’s structure, a photo of the EBI, and an explanation of their data conversion technique

    Yo dawg, I heard you like DNA…

    1. Unpacking the genetic code of monkeys yields an infinite number of monkeys at much lower cost…

  5. I’m chuckling at the random pile of data used in the experiment. “Let’s see, what do we have on hand that’s amusing and adds up to enough bits to be interesting?”

  6. Now that we know this CAN be done, maybe it’s worth taking a step back and ‘reading’ our OWN dna as non-biological ‘text,’ looking for indicia of language or symbology.

    Not looking for ‘God,’ necessarily. Instead, imagine we’re the product of some kind of deliberate panspermia, with Arecibo-style blueprints for something like faster-than-light travel or a Dyson sphere CONTAINED in us.   

    If the Earth was going to end, and humans wanted to let a new world have a crack at it, but hadn’t yet mastered space travel, we could use life itself as a time capsule containing all human knowledge. Encode a bunch of tardigrades with everything from Shakespeare to the Manhattan project, then ‘hitchhike’ a few thousand on the backs of every near-passing comet.

    Or let’s say you’re a Type II or III Kardashev-scale civilization from Alpha Centauri, and you discover, conclusively, that FTL is impossible. The most efficient way to ‘colonize’ a galaxy, then, might be to send self-unpacking life spinning off on comets/meteoresquerie to all the closest goldilocks zones.

    The ‘data’ encoded in the life could be as simple as a star coordinate, so us ‘colonists’ would know where to phone home, or where to look for instructions. This would have the additional benefit of halving communication delays (the same reason robotic probes are advanced as a possibility for interstellar comm).

    It’s at least arguable that the self-preserving nature of dna leads INEXORABLY to natural selection, and so, to sentience. But the raison d’etre of dna itself is opaque. Just like a virus, dna has no natural REASON to exist other than its own self-replication, but that still begs the question. Maybe the seeming arbitrariness and unlikelihood of a rock pocked with volcanoes birthing humans is MEANT to attract our attention. Not as proof of God’s love, but as the volcanic glass that “has no earthly business in a Maine hayfield.” Maybe it’s our Macguffin.

    Now, the DNA ‘data’ would have to stick with the organisms through thick and thin, all the way from tardigrades up to homo sapiens. But wouldn’t you know it – we have that!!!!!!!!

    MITOCHONDRIAL DNA – still kicking around in humans and eukaryotes alike – is as old as life on Earth.

    I’ll give you a moment to catch your breath…..it gets even better….

    It has been suggested that Mitochondrial DNA is the cause of (and so, related to the cure for) AGING.


    I expect a first draft on my desk by Friday.

  7. And you can bet somebody at the MPAA is drafting a bill against DNA copyright information, even as we speak. 

      1. So is it more of an example to the general public of the scale and capacity of DNA than something that can actually be applied? Because if the DNA is full of information irrelevant to living….

  8. So if we take the encoded speech and turn it into a living being, do we get Natasha Henstridge?

  9. The remakes of Cutthroat Island, Waterworld, and 25% of all pirate movies will be much improved by having the treasure map encoded in DNA, instead of [SPOILERS] tattooed on somebody’s back.
