Statistics Done Wrong: a guide to spotting and avoiding stats errors

Alex Reinhart's Statistics Done Wrong: The woefully complete guide is an important reference guide, right up there with classics like How to Lie With Statistics. The author has kindly published the whole text free online under a CC-BY license, with an index. It's intended for people with no stats background and is extremely readable and well-presented. The author says he's working on a new edition with new material on statistical modelling.

Surveys of statistically significant results reported in medical and psychological trials suggest that many p values are wrong, and some statistically insignificant results are actually significant when computed correctly.25, 2 Other reviews find examples of misclassified data, erroneous duplication of data, inclusion of the wrong dataset entirely, and other mixups, all concealed by papers which did not describe their analysis in enough detail for the errors to be easily noticed.1, 26

Sunshine is the best disinfectant, and many scientists have called for experimental data to be made available through the Internet. In some fields, this is now commonplace: there exist gene sequencing databases, protein structure databanks, astronomical observation databases, and earth observation collections containing the contributions of thousands of scientists. Many other fields, however, can’t share their data due to impracticality (particle physics data can include many terabytes of information), privacy issues (in medical trials), a lack of funding or technological support, or just a desire to keep proprietary control of the data and all the discoveries which result from it. And even if the data were all available, would anyone analyze it all to spot errors?

Similarly, scientists in some fields have pushed towards making their statistical analyses available through clever technological tools. A tool called Sweave, for instance, makes it easy to embed statistical analyses performed using the popular R programming language inside papers written in LaTeX, the standard for scientific and mathematical publications. The result looks just like any scientific paper, but another scientist reading the paper and curious about its methods can download the source code, which shows exactly how all the numbers were calculated. But would scientists avail themselves of the opportunity? Nobody gets scientific glory by checking code for typos.

Statistics Done Wrong: The woefully complete guide

(Image: XKCD)