For 15 years, I've been a faculty member in the Ed.D. program at Nova Southeastern University, where the majority of my doctoral students use quantitative methods in their education research. Although they take a mandatory methods class and receive templates providing a rough statistical framework, they are often deeply confused when it comes to designing their methodology and analyzing their data.
It's not just the students: despite my own background in mathematics (I teach linear and abstract algebra), I sometimes find myself uncertain when advising my students on their data analysis, and at odds with some colleagues about what counts as statistically valid. Typically, I turn to statistical textbooks and other colleagues for advice.
An article in the April 16, 2015 edition of Scientific American boldly claimed that research psychologists are wringing their hands over the inadequacy of the statistical tools they have been using. The use of p values as the gold-standard test of significance has fallen into disrepute, a consequence of over-reliance on them and of their inadequacy for judging the quality of results. This is where Alex Reinhart comes in.
Reinhart is a physicist turned statistician who set out to write a book that would improve the statistical education and understanding that researchers need. Statistics Done Wrong is not a textbook. It is a highly informed discussion of the frequent inadequacy of published statistical results, and it confronts the sacred cow: the p value. Here is what he has to say on page 2.
Since the 1980s, researchers have described numerous statistical fallacies and misconceptions in the popular peer-reviewed scientific literature and have found that many scientific papers — perhaps more than half — fall prey to these errors. Inadequate statistical power renders many studies incapable of finding what they're looking for, multiple comparisons and misinterpreted p values cause numerous false positives, flexible data analysis makes it easy to find a correlation where none exists, and inappropriate model choices bias important results. Most errors go undetected by peer reviewers and editors, who often have no specific statistical training, because few journals employ statisticians to review submissions and few papers give sufficient statistical detail to be accurately evaluated.
Astonishing to my eyes was his conclusion that
The methodological complexity of modern research means that scientists without extensive statistical training may not be able to understand most published research in their fields.
Reinhart advises users of statistics not to stop at bare p values but to report confidence intervals, which estimate the uncertainty around a result. He discusses statistical power (the probability that a test will detect a real effect of a given size) and illustrates, with clear and uncomplicated examples, such things as the effects of sample size and of reasonable estimates of bias (suggestive of the Bayesian approach).
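To make the contrast concrete, here is a small sketch of my own (not taken from the book), using simulated data and Python's scipy library, showing the difference between reporting a bare p value and reporting an effect estimate with a 95% confidence interval:

    # Illustrative sketch: a two-group comparison on simulated test scores.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    control = rng.normal(loc=50.0, scale=10.0, size=30)   # simulated control scores
    treated = rng.normal(loc=55.0, scale=10.0, size=30)   # true effect of +5 points

    # The usual stopping point: a single p value.
    t_stat, p_value = stats.ttest_ind(treated, control)

    # What Reinhart recommends reporting as well: the estimated effect size
    # with a confidence interval, which conveys both magnitude and uncertainty.
    diff = treated.mean() - control.mean()
    se = np.sqrt(treated.var(ddof=1) / len(treated) + control.var(ddof=1) / len(control))
    dof = len(treated) + len(control) - 2                 # simple pooled-dof approximation
    margin = stats.t.ppf(0.975, dof) * se
    print(f"p value: {p_value:.3f}")
    print(f"difference: {diff:.1f} points, 95% CI [{diff - margin:.1f}, {diff + margin:.1f}]")

The p value alone says little more than "probably not chance"; the interval says how large the effect plausibly is, which is usually the question the research actually asks.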
He also covers the pitfalls associated with underpowered and overpowered statistical testing, and what contextual cues you can use to detect them.
The book is full of mind-arresting statistics, such as the finding that "fewer than 3% of articles" in the prestigious journals Science and Nature "calculate statistical power before starting their study." Imagine what this means for my doctoral students.
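For readers wondering what "calculating statistical power before starting a study" involves in practice, here is a hedged illustration of my own (it assumes the statsmodels Python library and is not a procedure from the book): an a priori power analysis for a two-group comparison, which also shows how badly underpowered a small study can be.

    # Illustrative a priori power analysis for a two-sample t test.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Sample size needed per group to detect a "medium" effect (Cohen's d = 0.5)
    # with a 5% false-positive rate and an 80% chance of finding the effect.
    n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
    print(f"participants needed per group: {n_per_group:.0f}")   # about 64

    # The reverse question: with only 20 participants per group, what are the
    # odds of detecting that same effect at all?
    achieved_power = analysis.power(effect_size=0.5, nobs1=20, alpha=0.05)
    print(f"power with 20 per group: {achieved_power:.2f}")      # about 0.34

A study run with 20 participants per group would miss a genuine medium-sized effect roughly two times out of three, which is exactly the kind of underpowered design Reinhart warns about.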
This slim book, only 129 pages long, is very forceful in outlining and documenting the deficiencies of quantitative research in the social and medical sciences. Toward the end, Reinhart addresses research scientists directly and advises them on how to go about ameliorating the situation, which is no simple task.
This is a useful, reasonably detailed guide for those who consume research, produce research, or simply want to understand the limits of current research. Crucially, it also offers precise and unequivocal guidance to instructors of statistics.
They say that in an argument you're entitled to your own opinions, but not your own facts. Statistical validity is a difficult nexus of opinion and fact, and this book is a vital contribution to the argument.
Statistics Done Wrong: The Woefully Complete Guide [Alex Reinhart/No Starch Press]
Gord Doctorow
(Image: Spurious Correlations)