Psychology's reproducibility crisis: why statisticians are publicly calling out social scientists

Princeton University psych prof Susan Fiske published an open letter denouncing the practice of using social media to call out statistical errors in psychology research. She described the people who do this as "terrorists" and argued that the practice is toxic because, given the structure of social science scholarship, it has an outsized effect on careers.

Andrew Gelman, a statistician and political scientist at Columbia, has written a long and important rebuttal to Fiske. It addresses the reproducibility crisis in psychology research, which went from a decades-long simmer to a rolling boil in the past few years, as statisticians showed that many of the canonical, most-cited psych studies were founded on gross statistical errors, and other social scientists found that many of these studies could not be replicated with statistical rigor. The entire foundation of modern psychology is in flux.

Gelman's rebuttal to Fiske is really the model of how adversarial peer review can work: Gelman thoroughly and publicly disagrees with Fiske. He cites work to justify his conclusions, and provides a space where those conclusions can be debated (168 comments and counting). In short, he does everything that Fiske objects to, and in the process shows that Fiske's hypothesis is wrong.

On the way, Gelman makes a convincing case that the reproducibility crisis is very serious, and requires a top-to-bottom reexamination of psychological scholarship and the clinical practice and wider view of human nature that it informs. If you're only going to read one long article today, make it this one.

The short story is that Cuddy, Norton, and Fiske made a bunch of data errors—which is too bad, but such things happen—and then when the errors were pointed out to them, they refused to reconsider anything. Their substantive theory is so open-ended that it can explain just about any result, any interaction in any direction.

And that's why the authors' claim that fixing the errors "does not change the conclusion of the paper" is both ridiculous and all too true. It's ridiculous because one of the key claims is entirely based on a statistically significant p-value that is no longer there. But the claim is true because the real "conclusion of the paper" doesn't depend on any of its details—all that matters is that there's something, somewhere, that has p less than .05, because that's enough to make publishable, promotable claims about "the pervasiveness and persistence of the elderly stereotype" or whatever else they want to publish that day.

When the authors protest that none of the errors really matter, it makes you realize that, in these projects, the data hardly matter at all.

Why do I go into all this detail? Is it simply mudslinging? Fiske attacks science reformers, so science reformers slam Fiske? No, that's not the point. The issue is not Fiske's data processing errors or her poor judgment as journal editor; rather, what's relevant here is that she's working within a dead paradigm. A paradigm that should've been dead back in the 1960s when Meehl was writing on all this, but which in the wake of Simonsohn, Button et al., Nosek et al., is certainly dead today. It's the paradigm of the open-ended theory, of publication in top journals and promotion in the popular and business press, based on "p less than .05" results obtained using abundant researcher degrees of freedom. It's the paradigm of the theory that in the words of sociologist Jeremy Freese, is "more vampirical than empirical—unable to be killed by mere data." It's the paradigm followed by Roy Baumeister and John Bargh, two prominent social psychologists who were on the wrong end of some replication failures and just can't handle it.

I'm not saying that none of Fiske's work would replicate or that most of it won't replicate or even that a third of it won't replicate. I have no idea; I've done no survey. I'm saying that the approach to research demonstrated by Fiske in her response to criticism of that work of hers is a style that, ten years ago, was standard in psychology but is not so much anymore. So again, her discomfort with the modern world is understandable.

What has happened down here is the winds have changed
[Andrew Gelman/Statistical Modeling, Causal Inference and Social Science]
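
To get a feel for why "something, somewhere, that has p less than .05" is such a low bar, here is a rough back-of-the-envelope sketch (mine, not Gelman's). It assumes a study where no real effect exists and the researcher gets to run several independent tests, which is a simplification: real "researcher degrees of freedom" are usually correlated choices, not independent tests. The function names and the test counts are made up for illustration.

```python
import random


def chance_of_false_positive(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent tests of a
    true null hypothesis comes out with p < alpha."""
    return 1 - (1 - alpha) ** num_tests


def simulate(num_tests, alpha=0.05, num_studies=20000, seed=0):
    """Monte Carlo check of the formula above.

    Assumption: under a true null hypothesis a p-value is uniformly
    distributed on [0, 1], so each test is modeled as one uniform draw.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_studies):
        p_values = [rng.random() for _ in range(num_tests)]
        # The "study" counts as publishable if any single test clears alpha.
        if min(p_values) < alpha:
            hits += 1
    return hits / num_studies


if __name__ == "__main__":
    for num_tests in (1, 5, 20):
        exact = chance_of_false_positive(num_tests)
        approx = simulate(num_tests)
        print(f"{num_tests:2d} tests: formula {exact:.3f}, simulated {approx:.3f}")
```

Under those assumptions, one test gives a spurious "finding" 5% of the time, five tests about 23% of the time, and twenty tests about 64% of the time, with no real effect anywhere in the data.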

(via Four Short Links)

(Image: Keep Out Experiment In Progress — LIGO Gravitational Waves, Steve Jurvetson, CC-BY)