What cancer statistics actually mean

Genius science writer Ed Yong used to work for a cancer charity, so he's seen how the cancer-research sausage gets made. In a new post at Not Exactly Rocket Science, Ed takes you on a brief tour of the factory, explaining why even good data doesn't necessarily mean what you think it means.

The post is based around a new study that says 16.1% of all cancers worldwide are caused by infections. This statistic is talking about stuff like HPV—viruses and other infections that can prompt mutations in the cells they infect. Sometimes, those mutations propagate and become a tumor.

That statistic tells us that infections play a role in more cancers than most laypeople probably think, Ed says. It gives us an idea of the scale of the problem. But you have to be careful not to read too much into that 16.1%.

The latest paper tells us that 16.1% of cancers are attributable to infections. In 2006, a similar analysis concluded that 17.8% of cancers are attributable to infections. And in 1997, yet another study put the figure at 15.6%. If you didn’t know how the numbers were derived, you might think: Aha! A trend! The number of infection-related cancers was on the rise but then it went down again.

That’s wrong. All these studies relied on slightly different methods and different sets of data. The fact that the numbers vary tells us nothing about whether the problem of infection-related cancers has got ‘better’ or ‘worse’. (In this case, the estimates are actually pretty close, which is reassuring. I have seen ones that vary more wildly. Try looking for the number of cancers caused by alcohol or poor diets, if you want some examples).

And that's only one of the complications involved in understanding cancer statistics. You really should read Ed's entire post. After you do, a lot of apparent inconsistencies in cancer data will make a lot more sense to you. For instance: What about the cancers caused by radiation exposure?

I ran into some of these problems while researching Before the Lights Go Out, my book about electricity and the future of energy. The topic meant I had to spend some time dealing with the risks posed by nuclear energy. Specifically, people want to know what happens to the local population when a nuclear power plant melts down. How many people die? The problem: There's more than one legitimate answer to that question.

Take Chernobyl. There is no single, definitive number I can give you for how many people died because of the accident at the Chernobyl nuclear power plant. There have been, if I'm counting correctly, six different papers estimating how many people the radiation released during the accident will eventually kill. The estimates range from 4,000 to almost a million. But beyond checking each other's methodology—and there are some serious problems with the methodology used by the paper that estimated the highest death toll—it's really hard to say who is right and who is wrong.

If you read Ed's post, you'll get two good clues as to why that is:
First, statistics that show you how many cancer deaths were caused by factor x aren't produced by counting dead cancer patients. Those statistics are based on data, assumptions, and computer models. Use different data sets, different models, or different assumptions and you will get different numbers.
Second, cancer isn't like a collapsing roof. If a beam falls on someone's head you can look at the autopsy and say, "This death was caused by this piece of wood." You don't have to take into account the hundreds of times that person might have bonked their head on a doorway or cabinet over the course of their life. It was clearly the beam that did them in. But there's usually more than one reason people get cancer. In fact, a certain percentage of the population will get cancer simply as a side effect of being alive. Add into that all the other risk factors that most of us are exposed to over the course of our lives and it becomes extremely difficult to tease apart a real, honest-to-god answer to the question, "What caused this specific person's specific cancer?"
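That first point can be made concrete. Attributable-fraction figures like that 16.1% are typically derived from a formula such as Levin's population attributable fraction, PAF = p(RR − 1) / (1 + p(RR − 1)), where p is the prevalence of the risk factor and RR is the relative risk of cancer among the exposed—and both of those inputs are themselves estimates. A minimal sketch (with made-up numbers, not figures from any of the studies discussed here) shows how modest changes in the inputs shift the headline percentage:

```python
# Population attributable fraction (Levin's formula):
#   PAF = p * (RR - 1) / (1 + p * (RR - 1))
# p  = prevalence of exposure in the population (an estimate)
# RR = relative risk of disease given exposure (also an estimate)

def paf(prevalence, relative_risk):
    excess = prevalence * (relative_risk - 1)
    return excess / (1 + excess)

# Hypothetical inputs -- NOT taken from the studies discussed above.
scenarios = [
    (0.10, 3.0),  # 10% exposed, threefold risk
    (0.12, 3.0),  # slightly higher prevalence estimate
    (0.10, 3.5),  # slightly higher relative-risk estimate
]

for p, rr in scenarios:
    print(f"p={p:.2f}, RR={rr:.1f} -> PAF={paf(p, rr):.1%}")
```

Running this, the attributable fraction swings from roughly 17% to 20% on small, entirely plausible disagreements about prevalence or relative risk—which is why independent analyses of the same question land on different numbers without any of them being "wrong."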

Read Ed Yong's full post at Not Exactly Rocket Science

Image: Cancer?, a Creative Commons Attribution Share-Alike (2.0) image from runran's photostream


  1. The HERP cancer study ( http://potency.berkeley.edu/pdfs/herp.pdf ) is a fascinating attempt at quantifying risk.  It attempts to rank the likely carcinogenic effects of a number of substances *when lifetime exposure is taken into account*.   So it’s amusing to see polychlorinated biphenyls (PCBs) way down the list, between hamburgers and toast.  Alcoholic beverages, coffee and lettuce are orders of magnitude higher…

    However, this is all based on rodent studies, so YMMV.

  2. Totally understand your point of view, but the problem of misunderstanding arises when a politician or clinician unfamiliar with statistics presents the data. Epidemiologists and clinical statisticians are usually much more careful in presenting problems related to cancer statistics, and know very well whether a confidence interval or a meta-analysis should accompany the data presented. The point is: at which level does communication between these experts and the public fail?

  3. Summary: If you want to understand statistics, you should learn about statistics. Read up on correlation vs. causation as well. Otherwise, reading too much into a statistical analysis may be misleading.
