Short animations about how clicky health-claim headlines are often misleading

In these two excellent short animations, data science professor Jeffrey Leek of the Simply Statistics blog and Johns Hopkins Bloomberg School of Public Health, and his university colleague, postdoctoral researcher Lucy McGowan, explain how "in medicine, there’s often a disconnect between news headlines and the scientific research they cover."

Big Data's "theory-free" analysis is a statistical malpractice

One of the premises of Big Data is that it can be "theory free": rather than starting with a hypothesis ("men at buffets eat more when women are present," "more people will click this button if I move it here," etc.) and then gathering data to validate your guess, you just gather a ton of data and look for patterns in it.

Math against crimes against humanity: Using rigorous statistics to prove genocide when the dead cannot speak for themselves

Patrick Ball and the Human Rights Data Analysis Group (HRDAG) use careful, rigorous statistical models to fill in the large blank spots left behind by acts of genocide, bringing their analysis to war crimes tribunals, truth and reconciliation proceedings, and other reckonings with gross human rights abuses.

Using statistics to estimate the true scope of the secret killings at the end of the Sri Lankan civil war

In the last three days of the Sri Lankan civil war, as thousands of people surrendered to government authorities, hundreds of people were put on buses driven by Army officers. Many were never seen again.

High school class's electoral predictions model is a model for electoral predictions

The students in David Stein's Political Statistics class at Montgomery Blair High School in Silver Spring, Maryland have built a statistical model for predicting the outcomes of the upcoming midterm elections: the model makes assumptions about voter turnout and the way that polling data will translate into votes in 2018.

English and Welsh local governments use "terrorism" as the excuse to block publication of commercial vacancies

Gavin Chait is an "economist, engineer, data scientist and author" who created a website called Pikhaya where UK entrepreneurs can get lists of vacant commercial properties, their advertised rents, and the history of the businesses that had previously been located in those spaces -- whether they thrived, grew and moved on, or went bust (maybe because they had a terrible location).

A/B testing tools have created a golden age of shitty statistical practices in business

A team of researchers examined 2,101 commercial experiments facilitated by A/B splitting tools like Google Optimize, Mixpanel, Monetate and Optimizely and used regression analysis to detect whether p-hacking, a statistical cheating technique that makes it look like you've found a valid cause-and-effect relationship when you haven't, had taken place.

How to assess The Federalist Papers' authorship with statistics

The Federalist Papers comprise 85 articles written in the 1780s by founding fathers Alexander Hamilton, James Madison and John Jay. They wrote under a collective pseudonym, Publius, to keep their involvement secret. But who wrote what? There is much dispute. Let's try K-Means Clustering.

K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. For our example we’ll partition 68 observations (papers) into 2 clusters (2 authors, Madison and Hamilton).

Once the data has been converted into workable features, we can fit a 2-cluster model to them. This is unsupervised: we are effectively just pouring in our (ideally) significant data, telling the model that there are two distinct sets within it, and asking it to try to separate them.
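To make the description above concrete, here is a minimal sketch of 2-cluster K-means (Lloyd's algorithm) in plain Python. It is not the original post's code: the six "papers" and their two features are made-up numbers standing in for whatever word-frequency features one might extract, and the function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of feature vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def k_means(points, k=2, iterations=100, seed=0):
    """Assign each point to the nearest centroid, then move each centroid
    to the mean of its members, until the assignment stops changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct observations
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no observation changed cluster
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = mean_point(members)
    return labels, centroids


# Hypothetical toy data: each row is one "paper" described by two
# invented features. These are NOT real Federalist Papers measurements.
papers = [
    [3.1, 0.1], [2.8, 0.0], [3.4, 0.2],
    [0.2, 0.9], [0.1, 1.1], [0.3, 0.8],
]

labels, centroids = k_means(papers, k=2)
for features, label in zip(papers, labels):
    print(features, "-> cluster", label)
```

Note that the cluster numbers are arbitrary: the algorithm only says which papers belong together, not which author each group corresponds to. Attaching names would require papers of known authorship.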

Spoiler: most of them were by Hamilton.

Puerto Rico to dismantle its statistics agency in the midst of radical shock doctrine project

The Puerto Rican senate has approved Governor Ricardo Rosselló's plan to dismantle the Puerto Rico Institute of Statistics (PRIS), handing its functions to private contractors paid by the Department of Economic Development and Commerce.

A critical statistics education that fits on a postcard

Economist and maths communicator Tim Harford presents a riff on Harold Pollack's aphorism that "The best financial advice for most people would fit on an index card," and comes up with a complete set of rules for statistical literacy that fits on a postcard.

An interesting way in which money is totally broken

Calculating inflation, earning power, social progress, equality and inequality -- they all depend on being able to compare what used to be happening in our economy to what's happening now, and the way we do that is with money.

Statistical proof that voter ID laws are racially discriminatory

In ADGN: An Algorithm for Record Linkage Using Address, Date of Birth, Gender, and Name, newly published in Statistics and Public Policy, a pair of researchers from Harvard and Tufts build a statistical model to analyze the impact of the voter ID laws passed in Republican-controlled states as part of a wider voter suppression project that was explicitly aimed at suppressing the votes of racialised people, historically likely to vote Democrat.

Snakes and Ladders can be analyzed by converting it to a Markov Chain

University of Washington data scientist Jake Vanderplas found himself trapped in an interminable series of Snakes and Ladders (AKA Chutes and Ladders) games with his four-year-old and started thinking about how he could write a Python program to simulate and solve the game.
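For a feel of what treating the game as a Markov chain looks like, here is a small sketch in plain Python. It is not Vanderplas's program: the 20-square board, the positions of its snakes and ladders, and the rule that overshooting the last square still counts as finishing are all simplifying assumptions made for illustration.

```python
def build_transitions(n_squares, jumps, die_faces=6):
    """Row-stochastic transition matrix over squares 0..n_squares.

    `jumps` maps the square a snake or ladder starts on to the square
    it ends on. Assumption: a roll past the final square still finishes.
    """
    size = n_squares + 1
    matrix = [[0.0] * size for _ in range(size)]
    for square in range(n_squares):
        for roll in range(1, die_faces + 1):
            target = min(square + roll, n_squares)
            target = jumps.get(target, target)  # follow a snake or ladder
            matrix[square][target] += 1.0 / die_faces
    matrix[n_squares][n_squares] = 1.0  # the final square is absorbing
    return matrix


def expected_turns(n_squares, jumps, die_faces=6, tol=1e-12, max_turns=100000):
    """Expected number of rolls to finish, as the sum over t of
    P(game not yet finished after t rolls)."""
    matrix = build_transitions(n_squares, jumps, die_faces)
    size = n_squares + 1
    dist = [0.0] * size
    dist[0] = 1.0  # the player starts off the board, on square 0
    expected = 0.0
    for _ in range(max_turns):
        unfinished = 1.0 - dist[n_squares]
        if unfinished < tol:
            break
        expected += unfinished
        # One step of the chain: new_dist = dist * matrix
        dist = [
            sum(dist[i] * matrix[i][j] for i in range(size))
            for j in range(size)
        ]
    return expected


# Hypothetical toy board: two ladders (3 -> 11, 6 -> 17) and
# two snakes (15 -> 4, 19 -> 7) on a 20-square track.
toy_jumps = {3: 11, 6: 17, 15: 4, 19: 7}
print("Expected rolls on the toy board:", expected_turns(20, toy_jumps))
```

Because each roll depends only on the current square, the whole game is captured by one transition matrix, and repeatedly multiplying the position distribution by that matrix gives the probability of having finished after any number of rolls.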

A quantitative analysis of doxing: who gets doxed, and how can we detect doxing automatically?

A group of NYU and University of Illinois at Chicago computer scientists presented a paper at the 2017 ACM Internet Measurement Conference in London on their large-scale study of online doxings, with statistics on who gets doxed (the largest cohort being American, male, gamers, and in their early 20s), why they get doxed ("revenge" and "justice") and whether software can detect doxing automatically, so that human moderators can take down doxing posts quickly.

Latest Federal Reserve figures show widening wealth inequality, and it's much worse if you're not white

The Federal Reserve's just-published 2016 Survey of Consumer Finances reveals that income inequality is rising in the USA, with the top decile now controlling 77.1% of the nation's wealth; wealth that is increasingly retained through intergenerational bonds, meaning that wealth is apportioned by accident of birth rather than merit; and (unsurprisingly, given the foregoing), the browner you are, the less you have.

A (flawed) troll-detection tool maps America's most and least toxic places

The Perspective API is a tool from Google spinoff Jigsaw that automatically rates comments for their "toxicity" -- a fraught business that catches a lot of dolphins in its tuna net.

Insights from statistical analysis of great (and not-great) literature

Ben Blatt's Nabokov's Favorite Word Is Mauve: What the Numbers Reveal About the Classics, Bestsellers, and Our Own Writing takes advantage of the fact that so much literature has been digitized, allowing him to run statistical analyses on writers, old and new, and make both fun and meaningful inferences about the empirical nature of writing.