ENCODE, the media, and what we really know about the human genome

If you've read anything in the past week about ENCODE—a group of laboratories that recently published their latest work on the human genome—then you need to read John Timmer's excellent piece over at Ars Technica.

What ENCODE has actually done, and why it matters, has been widely misrepresented in the mainstream press, largely because of misleading press releases put out by ENCODE itself. Timmer sets the record straight. It's a long read, but a fascinating one. Highly recommended.

This week, the ENCODE project released the results of its latest attempt to catalog all the activities associated with the human genome. Although we've had the sequence of bases that comprise the genome for over a decade, there were still many questions about what a lot of those bases do when inside a cell. ENCODE is a large consortium of labs dedicated to helping sort that out by identifying everything they can about the genome: what proteins stick to it and where, which pieces interact, what bases pick up chemical modifications, and so on. What the studies can't generally do, however, is figure out the biological consequences of these activities, which will require additional work.

Yet the third sentence of the lead ENCODE paper contains an eye-catching figure that ended up being reported widely: "These data enabled us to assign biochemical functions for 80 percent of the genome." Unfortunately, the significance of that statement hinged on a much less widely reported item: the definition of "biochemical function" used by the authors.

This was more than a matter of semantics. Many press reports that resulted painted an entirely fictitious history of biology's past, along with a misleading picture of its present. As a result, the public that relied on those press reports now has a completely mistaken view of our current state of knowledge (this happens to be the exact opposite of what journalism is intended to accomplish). But you can't entirely blame the press in this case. They were egged on by the journals and university press offices that promoted the work—and, in some cases, the scientists themselves.

Read the rest of John Timmer's story at Ars Technica

Image: Micah's DNA, a Creative Commons Attribution Share-Alike (2.0) image from micahb37's photostream



  1. Reminds me of when they first started this stuff, reported they'd worked out about 1%, and somehow that turned into "We know what every bit of our genes does."

  2. This is a particularly annoying case because as the article says, it isn’t just the typical “journalists don’t get science and write silly things about it” — the whole “80% functional” number is actually what the scientists in the ENCODE project are pushing. Which annoys the rest of us in genomics because this is only true if you use their definition of “functional”, which basically boils down to being transcribed even if the cell does nothing with it. According to this logic, radio static is functional because radios receive it and make it into sound.

  3. For a response to this, and more information about ENCODE in general, read Ewan Birney’s blog: http://genomeinformatician.blogspot.co.uk/

  4. 80%? Well, there has always been the camp that believes that histones, other DNA binding proteins, and DNA modification control the world, and that's been around for 20 years or more. You can probably measure that in bulk, and probably could long before large-scale DNA sequencing became possible. I think people started advancing this theory many years before there was any empirical evidence, and it had sort of a crackpot feel about it because other stuff moved ahead so rapidly.

    But when there is some important bit of gene regulation, a close look at histones and DNA modification usually reveals something interesting about DNA or DNA binding protein modification, and now it's recognized as being important and an under-developed field. I'm not aware that anybody found something interesting by starting with a search for DNA binding sites, so as far as I know it never became an engine driving a host of discoveries.

    Of course, maybe this represents a baseline data set that will allow genome wide association studies of methylation or DNA binding with specific diseases. In fact, many common but mysterious diseases are likely to turn out to be the result of broad changes in transcriptional regulation in the genes commonly associated with the disease, but which lack mutations.
