The work of one of the world's leading nutrition researchers appears to be riddled with statistical errors

Brian Wansink is one of the most-cited nutrition researchers in the world; 30,000 US schools use his advice to design their lunch programs, drawing on his studies showing that kids eat more carrots when they're called "X-ray vision carrots," that putting out fruit bowls improves eating habits, and that smaller plates reduce portion sizes.

But now virtually all of Wansink's research conclusions have been called into question, thanks to a blog-post (since deleted) in which he seemed to reveal that he routinely engaged in statistical misconduct, without realizing that this was problematic in any way.

In the post (archive.org cache), Wansink tells the story of how he ran an expensive study that didn't prove what he was hoping to prove, so he tasked a grad student to go fishing in the data-set for other hypotheses that might be provable given the data in it.

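This is called "p-hacking" and it is a cardinal sin in statistics, one that often crops up in nutrition studies. P-hackers start with data and look for hypotheses. Test enough hypotheses against the same data and some will clear the usual significance threshold by chance alone, so p-hackers often end up turning noise in the data into "proof" of some esoteric phenomenon.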
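To see why fishing for hypotheses is so dangerous, here is a minimal simulation sketch. It is not Wansink's data or anyone's actual analysis: every "study" is pure random noise, the function names are made up for illustration, and the p-value uses a simple normal approximation rather than a proper t-test. The point is only that if each test has a 5% chance of a false positive, the chance that at least one of 20 independent tests comes up "significant" is about 1 - 0.95^20, or roughly 64%.

```python
import random
from statistics import NormalDist, mean, variance


def two_sample_p(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def any_hit(rng, n_outcomes, n_per_group=50, alpha=0.05):
    """Simulate one study of pure noise; True if any outcome looks 'significant'."""
    for _ in range(n_outcomes):
        # Both groups come from the same distribution, so there is no real effect.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sample_p(a, b) < alpha:
            return True
    return False


def false_positive_rate(n_outcomes, n_studies=300, seed=0):
    """Fraction of noise-only studies that yield at least one 'finding'."""
    rng = random.Random(seed)
    hits = sum(any_hit(rng, n_outcomes) for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    print("one pre-registered hypothesis:", false_positive_rate(1))
    print("fishing through 20 hypotheses:", false_positive_rate(20))
```

Committing to a single hypothesis in advance keeps the false-positive rate near the nominal 5%; trawling through 20 candidate outcomes makes a spurious "discovery" more likely than not.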

If all of Wansink's statistical analysis is as bad as this post implied it might be, it would call into question Wansink's research going back years — and worse, all the research that built on it, which is an appreciable slice of the body of nutrition studies.

Wansink won't release his data for an independent audit because he claims it would potentially reveal the identities of subjects who participated on the promise of anonymity (this sounds pretty plausible, given the ease with which subjects can be re-identified from anonymized data-sets).

So researchers have been combing through his conclusions, and trying to see if they are consistent with any plausible data-set, given the analysis that Wansink claims to have performed. They've found some pretty sketchy irregularities, too: for example, in three different studies, Wansink received exactly 770 postal responses to a survey, even though the studies had vastly different numbers of participants (1002, 1600, 2000) and different incentives for participation ($6, $5, $3).
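How can you check a paper without seeing its data? One well-known approach is a granularity check, often called the GRIM test: if responses are whole numbers (say, ratings on a 1-to-7 scale), then the reported mean times the sample size has to land on a whole number, give or take rounding. The sketch below illustrates that idea only. The post doesn't say exactly which checks were run on which papers, the function name is hypothetical, and the numbers in the usage example are invented, not taken from Wansink's work.

```python
def grim_consistent(reported_mean, n, decimals=2):
    """True if some whole-number total over n integer responses
    could round to the reported mean at the given precision."""
    half_step = 0.5 * 10 ** -decimals  # largest error that rounding can hide
    nearest_total = round(reported_mean * n)
    # Checking the neighbours as well guards against float rounding at the edges.
    candidates = (nearest_total - 1, nearest_total, nearest_total + 1)
    return any(abs(t / n - reported_mean) <= half_step + 1e-9 for t in candidates)


if __name__ == "__main__":
    # Invented examples: (reported mean, sample size)
    for m, n in [(3.48, 25), (3.47, 25), (5.19, 28)]:
        print(m, n, grim_consistent(m, n))
```

With 25 integer responses, the mean can only move in steps of 0.04, so 3.48 is possible (a total of 87) while 3.47 is not. A failed check doesn't prove misconduct (it could be a typo or an unreported exclusion), but a paper full of impossible means is a strong hint that something went wrong.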

One researcher who's been combing through the papers says, "It's just article after article where nothing makes any sense."

Wansink has retracted one paper, and has committed (along with Cornell, his host institution) to re-analyzing the others.

In an excellent Ars Technica piece, Cathleen O'Grady delves deep into the scandal, and also explains why peer review didn't catch this stuff sooner (tl;dr: peer review is about evaluating papers, not re-running experiments to see if the underlying data seems plausible).

Could the kinds of statistical tools used to scrutinize Wansink's papers improve standards in peer review? "I hope so," says van der Zee. An automated statistics-checking software package called statcheck found terrifying numbers of errors of a specific type in the published literature. But not every problem will be recognized by this software. The tools used by Brown and his colleagues pick up different kinds of errors, and they can't be automated. They require actually reading through the paper and puzzling through the methods.

"You've got to read the article pretty closely," Brown says. "It is quite suitable for peer review though," he adds, "because the peer reviewer arguably ought to be examining the article at this level of detail."

Alongside other proposals to improve the transparency of data and replicability of results, tools like statcheck could be a powerful addition to the peer review process—if reviewers can be persuaded to use them. "We need more awareness," says van der Zee, "and cases like these can help to raise awareness."

"Mindless Eating," or how to send an entire life of research into question
[Cathleen O'Grady/Ars Technica]

(Image: woodleywonderworks, CC-BY)