What a dead fish can teach you about neuroscience and statistics

The methodology is straightforward. You take your subject and slide them into an fMRI machine, a humongous sleek, white ring, like a donut designed by Apple. Then you show the subject images of people engaging in social activities — shopping, talking, eating dinner. You flash 48 different photos in front of your subject's eyes, and ask them to figure out what emotions the people in the photos were probably feeling. All in all, it's a pretty basic neuroscience/psychology experiment. With one catch. The "subject" is a mature Atlantic salmon.

And it is dead.

Functional magnetic resonance imaging (fMRI) is a powerful tool that allows us to capture incredible amounts of information about what happens in our brains. It's relatively new — neuroscientists began using fMRI in the early 1990s — and it produces colorful images that help bring numbers to life for the general public.

All of those things are strengths for fMRI. Unfortunately, they're also all weaknesses. New tools vastly expand our understanding of the human body ... but they also mean that we have to develop new standards so that different studies using the same tool can actually be compared to one another. Images of the human brain help make science more understandable ... but they can also be incredibly misleading when the public doesn't have a good idea of what the pictures show. Amassing vast quantities of information is great ... but it also makes it easy to end up with false positives — coincidences of chance that look like something a lot more important.

Enter the dead salmon.

In 2009, a team led by neuroscientist Craig Bennett and psychologist Abigail Baird ran an fMRI experiment using the salmon as their subject. Not only did they really put a dead (and frozen) fish into an fMRI machine, later analysis of their data actually produced evidence of brain activity — as if the dead fish were thinking. It wasn't, of course. But Bennett's and Baird's research — which recently won a 2012 IgNobel Award — was meant to show how easily scientists can mislead themselves and why well-done statistics are vital.

I got to speak with Bennett and Baird last week. In the interview, they talked about the study, how fMRI really works, and what scientists have to do to make sure they can trust their own results.

Maggie Koerth-Baker: Let's start with the basics. As a layperson, I see fMRI images in the news all the time, but I'm not really certain that I could tell you how fMRI works or what it's actually measuring. Can you explain?

Craig Bennett: We're not directly measuring activity in the brain. You'd need electrodes implanted in the brain itself for that. We're actually measuring the amount of magnetic disruption in the brain. We use a trick of how brain and body work. Oxygenated and deoxygenated blood have different magnetic properties.

Abigail Baird: If a brain region is doing a lot of work it's probably going to be bringing in a lot of oxygen through increased blood flow. The premise is that if an area is working harder it will need more nutrients and oxygen and that will be delivered through the blood.

Using blood flow as a measure of brain activity is reliable, but it's a very slow response. True brain activity happens when cells are communicating using neurotransmitters and electricity. Real, actual brain activity is measured with electrodes in the brain or something like EEG that records electrical activity. The problem with doing that is that when you use EEG, you don't know exactly where the signal is coming from or what the signal means. fMRI presupposes that brain activity relies on oxygen but there's a 4-6 second delay because that's how long it takes for the call for more blood to go out. It's a slow response and in a way it's a sloppy response. We're assuming that there are more leftovers here in spot A than spot B, so there must be brain activity here and not there.

CB: The best description I've heard is that it's like coming up on the scene of a car accident and being able to tell what happened based on the skid marks. We have to try to interpret by the changes what was going on when the activity happened. It's a proxy.

MKB: So when we see those images with areas of the brain popping out in bright colors, that's not necessarily telling us that one part of the brain is active and the rest isn't.

AB: I'm so tired of hearing about "the brain lighting up". It makes it sound like you see lights in the head or something. That's not how the brain works. It suggests a fundamental misunderstanding of what fMRI results mean. Those beautiful colorful maps ... they're probability maps. They show the likelihood of activity happening in a given area, not proof of activity. According to our analysis, there's a higher likelihood of this region using more blood because we found more deoxygenated blood in this area. It's also correlational. Here's a time frame and the changes we'd expect, so we see which bits of brain correlate with that.
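To make the "correlational" point concrete, here is a minimal sketch of the idea Baird describes: take an expected on/off time course and check how strongly a voxel's signal correlates with it. Everything in it is an illustrative assumption rather than the researchers' actual pipeline: the block design, the signal values, and the names `design`, `responsive_voxel`, and `noise_voxel` are invented, and a real analysis would also model the 4-6 second blood-flow delay, which this sketch leaves out.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical block design: 5 scans "off", 5 scans "on", repeated 4 times.
design = ([0.0] * 5 + [1.0] * 5) * 4

# One made-up voxel whose signal rises when the task is "on", plus noise.
responsive_voxel = [100.0 + 3.0 * d + rng.gauss(0.0, 1.0) for d in design]

# One made-up voxel that is pure noise and ignores the task entirely.
noise_voxel = [100.0 + rng.gauss(0.0, 1.0) for _ in design]

r_responsive = pearson(design, responsive_voxel)
r_noise = pearson(design, noise_voxel)

print(f"task-following voxel: r = {r_responsive:.2f}")
print(f"pure-noise voxel:     r = {r_noise:.2f}")
```

The colored spots on a brain map are, roughly, the voxels whose statistic of this kind crossed a chosen cutoff. Note that the noise voxel's correlation is small but almost never exactly zero.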

CB: We've had methods to look inside the brain of a living human for decades, and we've gotten quality science out of that method. What does fMRI add? The big thing is spatial location; you can say where in the brain activity is happening to a much greater degree. It's really mostly about that. But what that buys you is the ability to produce really pretty maps of the brain. You get a greyscale image with the colored spots that indicate what's significant. But that's not showing brain activity, it's showing a statistic. I drew a line in the sand and said these dots are the ones that crossed the line. It makes for a dramatic and pretty presentation of data. If you have a page of jargon people will believe it at a certain level. But if you put a picture of the brain with active voxels [a three-dimensional pixel] people will believe it even more because a picture of the brain is next to it. We have a powerful tool and ability to create dramatic persuasive figures. And we can use it in improper ways.

MKB: So how do we know that the data we get from fMRIs is useful, at all? If it's just correlational, and doesn't really show you where activity is happening?

CB: This is why we have to do tightly controlled experiments. To do it right, you'll take two conditions, almost exactly matched except for one critical thing. Some of the studies I really like are visual studies. I could show you the same stimulus, say a flashing circle of light, but I'd change the position of it. Whether it's in the top third or the bottom third of your field of vision. Just by changing the position and comparing each position to each other you can see which parts of the brain are sensitive to each spot. That's a narrow study and a really good control.

AB: More than a couple papers have been sensationalistic. There have been comparisons of Republican and Democratic brains. That's ridiculous and it's a misuse of fMRI. It's not a specific enough question.

MKB: Can you explain what you mean by a specific question here?

AB: In an fMRI study you have to stimulate the brain in some way. So what are you showing the brain in order to make distinction between Republicans and Democrats? Say it's pictures of people on welfare, and Democrats showed more activation in one area and Republicans in another. It doesn't actually tell you anything about Democrats and Republicans. Those results might tell you something about compassion. Or how we process compassion. But to say there are fundamental differences as a whole group between two groups of people, when there's so much variation within the group, it's just silly. I could get the same result ... find big differences ... with two groups of Democrats.

Remember, the brain doesn't just light up and those images are showing statistics, not all activity. If you see the same thing in several different studies, you can trust it more. But you should be suspicious of one study of a handful of people, especially if the question wasn't specific enough and the researchers just went fishing to see what would happen. Also, what you're seeing is an average of the group, not each individual. You could have a group of 40 people and 39 out of the 40 show activity in one area, but that area might still get dropped from the final images because everybody didn't have it. So you need to consider the individuals, not just the group.

MKB: Let's get back to that dead salmon you worked with. If fMRI is measuring changes in blood flow — or changes in oxygenation which indicate a change in blood flow — why would you see any signal at all in the brain of a dead salmon?

CB: In almost any experiment, but especially with MRI and fMRI, it's a noisy measure. There's all kinds of noise that gets entered into the signal. It'll pick up your own heart beating. We once had a lightbulb going bad in the scanner suite and it was introducing specific signal in our data set. You have to get enough data ... run the experiment enough times ... to separate signal from noise.

We're looking for variation in the magnetic field. With the salmon, fat will do that. Fatty tissue has a magnetic signal, but some areas of fatty tissue are more dense, and some less, so you'll see a differential. The salmon's brain was more fatty and that created more inherent variability. But it was just noise. It wasn't due to any actual activity but just happened to match our study design. Now, that's unlikely. But it just happened to happen. It's possible to find a false positive like that.

AB: We also saw activity outside the body of the salmon. The magnet itself has noise. It will always have noise. And if the threshold is low enough you're going to get that noise pattern matching up with your hypothesis.
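The point about noise lining up with a hypothesis can be tried out in a small simulation. The sketch below is an illustration under invented assumptions, not a reconstruction of the salmon analysis: it generates 2,000 hypothetical "voxels" of pure random noise, correlates each one with a made-up on/off task design, and counts how many cross a lenient cutoff versus a strict one. The voxel count, the cutoffs (0.3 and 0.7), and the function name `count_noise_hits` are all chosen for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_noise_hits(n_voxels, design, cutoff, seed):
    """Count pure-noise voxels whose |correlation| with the design exceeds cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_voxels):
        # Each simulated voxel is nothing but random noise: no brain, no task.
        series = [rng.gauss(0.0, 1.0) for _ in design]
        if abs(pearson(design, series)) > cutoff:
            hits += 1
    return hits


# Hypothetical block design: 5 scans "off", 5 scans "on", repeated 4 times.
design = ([0.0] * 5 + [1.0] * 5) * 4

# Same seed for both calls, so both cutoffs are applied to the same noise.
lenient_hits = count_noise_hits(2000, design, cutoff=0.3, seed=1)
strict_hits = count_noise_hits(2000, design, cutoff=0.7, seed=1)

print(f"noise voxels passing lenient cutoff: {lenient_hits}")
print(f"noise voxels passing strict cutoff:  {strict_hits}")
```

With the lenient cutoff, a noticeable number of noise-only voxels should appear "active"; tightening the cutoff removes most or all of them. That is the trade-off Baird is describing.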

MKB: So, basically, the salmon is about statistics, right? Why do statistics matter so much? I think most people imagine scientists just taking down data and reporting what they observe. But it's more complicated than that.

AB: In most behavioral sciences and natural science, there's a certain cutoff level where we consider the things we've found significant or not. The gold standard is .01, less than a 1% chance that you're seeing something just by accident. Or a 99% chance that it's an actual difference. But, still, 1 out of 100 times you'd get that exact same result just by chance. We're also interested in data at the .05 level. Anything up to 10% we tend to call a trend: something might be happening. That has held throughout the history of psychology and neuroscience and it's pretty good. But we'd never had any tools that produced the magnitude of data that fMRI has. Instead of making comparisons between two groups of 40 people, you're making comparisons between 100,000 points in the brain and that .01 no longer says as much because you have so much more information to work with.

CB: Here's my analogy: if I give you a dart and say, "Try to hit the bullseye", you have some chance of hitting it. Your chance is not 0. But, depending on skill, you might hit more or less often. So you try the throw with one dart and hit on first throw, that's impressive. That's like finding a result. But if you only hit it once out of 100 tries, it's less impressive. In fMRI it's like having 60,000 darts you can throw. Some will hit the bullseye by chance and we need to try to correct for that. We tend to set a threshold and say anything over is legitimate and anything under is not. But what our team found is that in a survey of the literature, between 25-40% of published papers were using an improper correction. You have a lot more chances of finding significance so you need to be more conservative of saying what is a legit result.
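The arithmetic behind the dart analogy is short enough to work through. The sketch below plugs in numbers mentioned in the interview (60,000 comparisons, a .01 cutoff) and makes one simplifying assumption that real brain data does not satisfy: that every test is independent of the others. It also shows a Bonferroni correction, which is one common and deliberately conservative way to tighten the cutoff; it is named here as an illustration, not as the correction Bennett and Baird recommend. The function names are invented for the example.

```python
def expected_false_positives(n_tests, alpha):
    """Average number of chance 'hits' when nothing real is going on."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha):
    """Chance of one or more false positives, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha):
    """Per-test cutoff that holds the overall false-positive chance near alpha."""
    return alpha / n_tests


n_darts = 60_000  # number of comparisons, from the dart analogy
alpha = 0.01      # the per-test cutoff discussed in the interview

print("expected chance hits:", expected_false_positives(n_darts, alpha))
print("chance of at least one:", prob_at_least_one(n_darts, alpha))
print("Bonferroni per-test cutoff:", bonferroni_threshold(n_darts, alpha))
```

Under these assumptions, an uncorrected .01 cutoff across 60,000 tests would be expected to produce hundreds of chance hits, which is why a single dart in the bullseye stops being impressive.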

AB: So if you have a really specific hypothesis you can stick to the traditional numbers. But if you don't know what you're looking for and you just want to see "what lights up", then you're getting lots more chances to see things that could be just random. That's when you need to be more strict about what you consider real. And people aren't always as careful about that as they could be.

MKB: So you're saying that, right now, there's a pretty good chance that a lot of the research papers that use fMRI are showing results that are every bit as wrong as the results you got while studying a dead salmon?

CB: Up to 40% of papers published in 2008 didn't do proper correction, so are there incorrect results in literature? Absolutely. Even if we correct perfectly you'll probably have 5% incorrect. There will always be false positives. But as a field we need to do as good a job as possible to release the best results we can. What we're saying is that it's not good for you, your study, or the field as a whole to not correct hard enough.

You can read Craig Bennett and Abigail Baird's full paper online at the Journal of Serendipitous and Unexpected Results

Read a story Alexis Madrigal wrote for Wired about this study in 2009

Read blogger and neuroscientist Scicurious' article on the dead salmon study, published after the IgNobel announcement.

Image: Christmas Salmon, a Creative Commons Attribution (2.0) image from toolmantim's photostream