Widespread statistical error discovered in peer-reviewed neuroscience papers

"Erroneous analyses of interactions in neuroscience: a problem of significance," a paper in Nature Neuroscience by Sander Nieuwenhuis and colleagues, points out an important and potentially fatal statistical error common to many peer-reviewed neuroscience papers (as well as papers in related disciplines). Of the surveyed papers in which the error could have occurred, it appeared in more than half. Ben Goldacre explains the error:

Let’s say you’re working on some nerve cells, measuring the frequency with which they fire. When you drop a chemical on them, they seem to fire more slowly. You’ve got some normal mice, and some mutant mice. You want to see if their cells are differently affected by the chemical. So you measure the firing rate before and after applying the chemical, first in the mutant mice, then in the normal mice.

When you drop the chemical on the mutant mice nerve cells, their firing rate drops, by 30%, say. With the number of mice you have (in your imaginary experiment) this difference is statistically significant, which means it is unlikely to be due to chance. That’s a useful finding which you can maybe publish. When you drop the chemical on the normal mice nerve cells, there is a bit of a drop in firing rate, but not as much – let’s say the drop is 15% – and this smaller drop doesn’t reach statistical significance.

But here is the catch. You can say that there is a statistically significant effect for your chemical reducing the firing rate in the mutant cells. And you can say there is no such statistically significant effect in the normal cells. But you cannot say that mutant cells and normal cells respond to the chemical differently. To say that, you would have to do a third statistical test, specifically comparing the “difference in differences”: the difference between the chemical-induced change in firing rate for the normal cells and the chemical-induced change for the mutant cells.
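
To make the “difference in differences” concrete, here is a minimal sketch in Python (all of the numbers, group sizes, and variable names below are invented for illustration; the paper itself contains no code). The first two tests mirror the flawed reasoning; the final test is the one that actually bears on whether mutant and normal cells respond differently.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 12  # invented number of cells per group

# Invented firing rates before and after the chemical: mutant cells drop
# by ~30% on average, normal cells by ~15%, as in the example above.
mutant_before = rng.normal(100, 10, n)
mutant_after = mutant_before * rng.normal(0.70, 0.15, n)
normal_before = rng.normal(100, 10, n)
normal_after = normal_before * rng.normal(0.85, 0.15, n)

# The flawed approach: one within-group test per genotype.
print(stats.ttest_rel(mutant_before, mutant_after))  # likely significant
print(stats.ttest_rel(normal_before, normal_after))  # may or may not be

# The claim "mutants respond differently" needs a third test that compares
# the chemical-induced changes (the difference in differences) directly.
mutant_change = mutant_after - mutant_before
normal_change = normal_after - normal_before
print(stats.ttest_ind(mutant_change, normal_change))
```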


  1. People really do this and get papers published? Anyone who has passed even a basic experimental methods class should not make that error; moreover, the correct analysis is trivial – either a 2×2 ANOVA or a t-test on difference scores would allow one to infer a statistically reliable interaction.
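
    For instance, a minimal sketch of the 2×2 approach with statsmodels (data invented; for brevity the before/after measurements are treated as independent groups, whereas a repeated-measures design would be more faithful to the paired recordings):

    ```python
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Invented long-format data: genotype (mutant/normal) x time (before/after).
    df = pd.DataFrame({
        "rate":     [100, 98, 103, 69, 72, 70, 99, 102, 101, 86, 84, 85],
        "genotype": ["mutant"] * 6 + ["normal"] * 6,
        "time":     (["before"] * 3 + ["after"] * 3) * 2,
    })

    # The genotype:time interaction is the term that licenses any claim that
    # mutant and normal cells respond to the chemical differently.
    fit = smf.ols("rate ~ C(genotype) * C(time)", data=df).fit()
    print(sm.stats.anova_lm(fit, typ=2))
    ```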

  2. Hey, at least they are *trying* to do the stats.  Granted it’s as useful as wet toast, but compare it to homeopathy. 

    And shame on their PIs and biostats depts for letting this crap slip through.  It’s unbelievable.

  3. One of the best comments about math education that I’ve seen lately was about statistics: that it should be the pinnacle of high school math, not calculus. Statistics is actually applicable to most of our lives, whereas calculus is not.

    It’s in a TED talk: http://www.ted.com/talks/arthur_benjamin_s_formula_for_changing_math_education.html

  4. As an undergrad assistant in an ecology lab, I was dismayed to see how poorly the grad students understood (or at least applied) basic statistics. And yet they published. I considered pursuing a career in statistics with the idea that I could be some sort of consultant to ecology researchers, but my path led elsewhere. But is that a job that one can have, or are the researchers themselves trusted to master a domain completely separate from their primary field?

  5. Although… in a different style of analysis, many meta-analyses are performed between papers reporting two different treatment/placebo pairs, and an inference is drawn in the absence of a head-to-head comparison because those head-to-head randomized trials have not yet been performed, or would be unethical to administer.

    To ignore the difference-of-differences on a single-run trial is a travesty.  But on a larger, epi scale, this sort of thing is standard-issue.  It all boils down to knowing WTF you are doing and not being a retard.
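
    A minimal sketch of that kind of indirect comparison (sometimes called a Bucher comparison; the effect estimates and standard errors below are invented):

    ```python
    from scipy import stats

    # Invented treatment effects vs. placebo from two separate trials
    # (e.g., log odds ratios) and their standard errors.
    a_vs_placebo, se_a = -0.40, 0.15
    b_vs_placebo, se_b = -0.25, 0.18

    # Indirect comparison of A vs. B: a difference of differences across trials.
    diff = a_vs_placebo - b_vs_placebo
    se = (se_a**2 + se_b**2) ** 0.5
    z = diff / se
    print(diff, se, 2 * stats.norm.sf(abs(z)))  # estimate, SE, two-sided p-value
    ```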

    1. Have any stats on the former and the latter? Because the post seems to say that your two golden rules are being broken, regularly.

  6. I’ve been saying for years that teaching calculus and algebra (and conic sections) in high school is a bygone product of an era when we were raising kids to become space engineers; in the modern Internet age, they should be learning graph theory and Bayesian statistics. But that amount of attention to education is perhaps too difficult for the lumbering, rigid American political system.

    Anyway: I don’t think a t-test is really appropriate here, since it assumes a linear model. Fisher’s exact test is model-free and works great on small samples. Definitely my favorite test for measuring differences.
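
    For example, a minimal sketch (counts invented; note the test takes a 2×2 table of counts, e.g. cells that did versus did not respond, rather than raw firing rates):

    ```python
    from scipy import stats

    # Invented 2x2 table: rows = mutant / normal, columns = responded / did not.
    table = [[9, 3],
             [4, 8]]
    odds_ratio, p_value = stats.fisher_exact(table)
    print(odds_ratio, p_value)
    ```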

    1. Absolutely agreed. I think statistics should be taught before calculus.

      @KWillets: You can learn a huge amount of stats without calculus, both the ideas behind it and the actual practice of it. Certainly more than enough for high school, and for first-semester stats courses in college.

    2. “I’ve been saying for years that teaching calculus and algebra (and conic sections) in high school is a bygone product of an era when we were raising kids to become space engineers; in the modern Internet age, they should be learning graph theory and Bayesian statistics.”

      You know, somebody, somewhere, has to design and build actual products in order to support this “Internet” that you speak of.

      1. Yes, that’s true; however, I wasn’t suggesting that people not learn calculus, especially if it’s useful to them. I learned statistics because it was useful to me in the course of my studies, even though it wasn’t part of my high school curriculum. I’m just suggesting that, given how the mass of society actually uses math, teaching everybody calculus doesn’t seem optimal. Statistics is not only more generally practical – it helps you understand poll numbers, bogus psychology experiments, gambling odds, etc. – it is probably more appropriate to most fields of study these days. Every biologist needs to know statistics; almost none of them learn it, though most of them are taught calculus to little purpose or effect. Meanwhile, the electrical engineers have to study much more calculus anyway. Don’t you think the default curriculum should reflect the modal requirement?

  7. How would people do any more than basic stats without calculus?  I’m 95% confident they would need to integrate a Gaussian.
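
    (Though you can let the computer do the integrating these days; a throwaway sketch:)

    ```python
    from scipy import stats
    from scipy.integrate import quad

    # The 95% in "95% confident" really is an integral of a Gaussian.
    area, _ = quad(stats.norm.pdf, -1.96, 1.96)
    print(area)  # ~0.95
    ```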

    1. As a computational biologist, what you seem to refer to as “basic statistics” is perhaps better described as “useful statistics”. While I have taken several statistics courses where calculus was used, most of what I learned there seemed to be only of interest to those fascinated by statistics for its own sake rather than as a practical tool to analyze data. Scientists just need to know how to design experiments so that appropriate tests can be performed (typically by a software package). Although it does seem from Goldacre’s article that many scientists don’t even have this grasp of statistics…

      1. That is because the stats have moved into computation; but by that logic, you could say anything math-based that graces a computer screen is linear algebra. Calculus is still very important for understanding what is actually happening, rather than just reading charts or plugging numbers into a program or calculator.

        1. I don’t think you’re correct. How is calculus needed for understanding most stats? An example would be great.

          I’ve done a lot of stats, and they’ve never involved differentiation or integration. Most commonly used statistical analyses (e.g., ANOVA and Regression) are applications of the General Linear Model, which is not calculus. I’m not saying there isn’t fancy stuff I’m not familiar with that requires calculus, but it’s not needed for the vast majority of stats, where the underlying computations are best represented through operations on matrices (i.e., linear algebra).
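
          A tiny sketch of that point (numbers invented): a two-group comparison, i.e. a one-way ANOVA, fit with nothing but matrix operations.

          ```python
          import numpy as np

          # Invented outcomes for two groups of three observations each.
          y = np.array([4.1, 3.8, 4.4, 2.9, 2.5, 3.1])
          # Design matrix: an intercept column plus a 0/1 group indicator.
          X = np.column_stack([np.ones(6), [0, 0, 0, 1, 1, 1]])

          # Fit the General Linear Model by least squares, using only linear algebra.
          beta, *_ = np.linalg.lstsq(X, y, rcond=None)
          print(beta)  # [group-0 mean, difference between the group means]
          ```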

      2. Yes and no.  There is quite a bit of integrating various distributions in statistics.  You won’t get very far through “Statistical Inference” by Casella and Berger without calculus.  Even some of the linear algebra will involve the Jacobian or Hessian matrix, so while that’s linear algebra, you have to know some calculus to get it.

  8. “But that amount of attention to education is perhaps too difficult for the lumbering, rigid American political system.”

    Before we even get to the political system, you have to get past the “if it was good enough for me, it should be good enough for the current generation” attitude that runs, well, pretty much across the board in American culture. Parents, teachers, administrators, college researchers – I’ve seen it at every level. The inherent small-c conservatism in most people manifests itself VERY visibly as soon as anyone starts talking about “meddling” with the school curriculum and changing what little Johnny and little Janie will learn to something different from what was taught in the past.

  9. I think I actually did learn a lot of stats before calculus, but I guess I see it as a Yin-yang type of thing where one isn’t much good without the other.  For instance it takes a lot of math to wrestle variables down to the point where basic stats work reliably, correlations are linear, sampling is uniform, etc.  

  10. Before anybody blows this up into “most science is wrong,” the authors state that even among the papers that contained the statistical error, 1/3 had effect sizes large enough that using the proper test would probably not have changed the conclusions.

    They also point out that the other 2/3 were counted as having a statistical error anywhere in the paper, not necessarily in the key experiment. Most neuroscience papers have one or two key experiments, plus lots of minor experiments backing up or double-checking the main one. If one of these minor experiments is wrong, it doesn’t mean the whole paper is wrong.

    Still, this is an important meta-analysis and a strong argument for better attention to stats in both education and peer review.

  11. Identifying the need for, or executing, a difference-in-differences (DID) analysis is not tricky; it is a simple test. DID is econometric in origin, so perhaps people get confused because the ANOVA is hard to visualize, but if you look at it as a regression it is rather trivial (see the sketch at the end of this comment). But DID has its own host of problems and should be avoided if possible, and I wonder whether these researchers even needed to look at DID.

    The Hypo-PI would be wrong in saying “mutant cells and normal cells respond to the chemical differently,” but I wonder whether that claim is even relevant to most of the experiments, or whether they are just using poor wording to make their research more compelling. If we think about it in terms of position and distance, knowing whether a treatment moves a subject to a new position is usually what we care about, not whether it moves two different types the same distance. If it moves normal mice from a 1 to a 4 and abnormal mice from a 3 to a 4, where they end up is usually what we are concerned about, not whether each moved the same amount.

    At the very least, this is a call for more integration of statisticians into the training and collaboration behind experiments. As the world of analytics continues to explode, and as statistical packages and databases that were once unimaginable become available to every researcher in every department, it becomes ever more important to have a clear statistical understanding of what you are doing. If you look through statisticians’ and econometricians’ CVs, you will see them collaborating on projects in far more fields than you would expect, because they bring a strong understanding of what the tools are actually doing while the other researchers understand conceptually what is happening.
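
    A minimal sketch of that regression view (data invented; for brevity the before/after pairing within each cell is ignored, which a real analysis would need to handle, e.g. with clustered standard errors):

    ```python
    import pandas as pd
    import statsmodels.formula.api as smf

    # Invented long-format data: 'mutant' is 0/1, 'post' is 0 before the
    # chemical and 1 after it.
    df = pd.DataFrame({
        "rate":   [100, 98, 102, 70, 71, 69, 101, 99, 103, 85, 86, 84],
        "mutant": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
        "post":   [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
    })

    # The mutant:post coefficient is the difference in differences; its test
    # is the test of whether the two cell types respond differently.
    fit = smf.ols("rate ~ mutant * post", data=df).fit()
    print(fit.params["mutant:post"], fit.pvalues["mutant:post"])
    ```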

  12. I think actually that the experiment described here would be more like test chemical versus solvent control for two different cell lines, with two (nested?) null hypotheses  – one being that the test chemical doesn’t do something, and the other that it doesn’t do something better in one cell line than the other. Fisher’s test could be corrected for multiple hypothesis testing using the Bonferroni correction (which is conservative). Pretty standard practice in toxicology, but perhaps neuroscientists are different?
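
    For what it’s worth, a minimal sketch of that correction (p-values invented):

    ```python
    from statsmodels.stats.multitest import multipletests

    # Invented raw p-values for the two null hypotheses described above.
    pvals = [0.030, 0.045]
    reject, p_adj, *_ = multipletests(pvals, alpha=0.05, method="bonferroni")
    print(reject, p_adj)  # Bonferroni multiplies each p-value by the number of tests
    ```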

Comments are closed.