English and Welsh local governments use "terrorism" as the excuse to block publication of commercial vacancies

Gavin Chait is an "economist, engineer, data scientist and author" who created a website called Pikhaya where UK entrepreneurs can get lists of vacant commercial properties, their advertised rents, and the history of the businesses that had previously been located in those spaces -- whether they thrived, grew and moved on, or went bust (maybe because they had a terrible location). Read the rest

A/B testing tools have created a golden age of shitty statistical practices in business

A team of researchers examined 2,101 commercial experiments facilitated by A/B splitting tools like Google Optimize, Mixpanel, Monetate and Optimizely and used regression analysis to detect whether p-hacking (previously), a statistical cheating technique that makes it look like you've found a valid cause-and-effect relationship when you haven't, had taken place. Read the rest

How to assess The Federalist Papers' authorship with statistics

The Federalist Papers comprise 85 articles written in the 1780s by founding fathers Alexander Hamilton, James Madison and John Jay. They wrote under a collective pseudonym, Publius, to keep their involvement secret. But who wrote what? There is much dispute. Let's try K-Means Clustering.

K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. For our example we’ll partition 68 observations (papers) into 2 clusters (2 authors, Madison and Hamilton).

Once the data has been converted into workable features, we can fit a 2-cluster model to them. This is unsupervised: we are effectively just pouring in our (ideally) significant data, telling the model that there are two distinct sets within it, and asking it to separate them.
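To make the fitting step concrete, here is a minimal sketch of 2-cluster K-means in plain Python. It is not the code from the original analysis: the `kmeans` helper, the choice of features (rates of two function words per 1,000 words) and the six toy data points are all assumptions made up for illustration.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the observations
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids will too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical features: (rate of word A, rate of word B) per 1,000 words.
# These numbers are invented for illustration, not measured from the papers.
papers = [(3.2, 0.0), (2.9, 0.1), (3.5, 0.0), (0.2, 0.5), (0.1, 0.4), (0.3, 0.6)]
labels, centroids = kmeans(papers, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary. The algorithm only says which papers group together, so attributing a cluster to Madison or Hamilton still requires comparing it against papers whose authorship is undisputed.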

Spoiler: most of them were by Hamilton. Read the rest

Puerto Rico to dismantle its statistics agency in the midst of radical shock doctrine project

The Puerto Rican senate has approved Governor Ricardo Rosselló's plan to dismantle the Puerto Rico Institute of Statistics (PRIS), handing its functions to private contractors paid by the Department of Economic Development and Commerce to manage. Read the rest

A critical statistics education that fits on a postcard

Economist and maths communicator Tim Harford (previously) presents a riff on Harold Pollack's aphorism that "The best financial advice for most people would fit on an index card," and comes up with a complete set of rules for statistical literacy that fits on a postcard. Read the rest

An interesting way in which money is totally broken

Calculating inflation, earning power, social progress, equality and inequality -- they all depend on being able to compare what used to be happening in our economy to what's happening now, and the way we do that is with money. Read the rest

Statistical proof that voter ID laws are racially discriminatory

In ADGN: An Algorithm for Record Linkage Using Address, Date of Birth, Gender, and Name, newly published in Statistics and Public Policy, a pair of researchers from Harvard and Tufts build a statistical model to analyze the impact of the voter ID laws passed in Republican-controlled states as part of a wider voter suppression project that was explicitly aimed at suppressing the votes of racialised people, historically likely to vote Democrat. Read the rest

Snakes and Ladders can be analyzed by converting it to a Markov Chain

University of Washington data scientist Jake Vanderplas found himself trapped in an interminable series of Snakes and Ladders (AKA Chutes and Ladders) games with his four-year-old and started thinking about how he could write a Python program to simulate and solve the game. Read the rest
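As a rough illustration of the Markov chain idea in the headline (not Vanderplas's own code), the sketch below builds a transition matrix for a tiny hypothetical board and uses it to estimate the expected number of turns to finish. The board size, the snake and ladder positions, and the rule that overshooting the last square still wins are all assumptions.

```python
def transition_matrix(size, jumps, die_faces=6):
    """Row i holds the probabilities of landing on each square after one roll from i."""
    matrix = [[0.0] * (size + 1) for _ in range(size + 1)]
    for square in range(size):
        for roll in range(1, die_faces + 1):
            landing = min(square + roll, size)     # assumption: overshooting still wins
            landing = jumps.get(landing, landing)  # follow a snake or ladder if present
            matrix[square][landing] += 1.0 / die_faces
    matrix[size][size] = 1.0  # the final square is absorbing
    return matrix


def step(distribution, matrix):
    """Advance the probability distribution over squares by one turn."""
    n = len(matrix)
    return [sum(distribution[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def expected_turns(matrix, tolerance=1e-12, max_turns=10_000):
    """E[turns] = sum over t >= 0 of P(game not yet finished after t turns)."""
    distribution = [1.0] + [0.0] * (len(matrix) - 1)  # start on square 0
    total = 0.0
    for _ in range(max_turns):
        unfinished = 1.0 - distribution[-1]
        if unfinished < tolerance:
            break
        total += unfinished
        distribution = step(distribution, matrix)
    return total


# Hypothetical 9-square board: a ladder from 2 to 6 and a snake from 7 to 3.
board = transition_matrix(9, {2: 6, 7: 3})
print(expected_turns(board))
```

Because the next square depends only on the current square and the die roll, the game has no memory, which is what makes the Markov chain formulation fit.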

A quantitative analysis of doxing: who gets doxed, and how can we detect doxing automatically?

A group of NYU and University of Illinois at Chicago computer scientists have presented a paper at the 2017 ACM Internet Measurement Conference in London reporting their findings from a large-scale study of online doxings, with statistics on who gets doxed (the largest cohort being American, male, gamers, and in their early 20s), why they get doxed ("revenge" and "justice") and whether software can detect doxing automatically, so that human moderators can take down doxing posts quickly. Read the rest

Latest Federal Reserve figures show widening wealth inequality, and it's much worse if you're not white

The Federal Reserve's just-published 2016 Survey of Consumer Finances reveals that income inequality is rising in the USA, with the top decile now controlling 77.1% of the nation's wealth; wealth that is increasingly retained through intergenerational bonds, meaning that wealth is apportioned by accident of birth rather than merit; and (unsurprisingly, given the foregoing), the browner you are, the less you have. Read the rest

A (flawed) troll-detection tool maps America's most and least toxic places

The Perspective API (previously) is a tool from Google spinoff Jigsaw (previously) that automatically rates comments for their "toxicity" -- a fraught business that catches a lot of dolphins in its tuna net. Read the rest

Insights from statistical analysis of great (and not-great) literature

Ben Blatt's Nabokov's Favorite Word Is Mauve: What the Numbers Reveal About the Classics, Bestsellers, and Our Own Writing takes advantage of the fact that so much literature has been digitized, allowing him to run statistical analyses on writers, old and new, and make both fun and meaningful inferences about the empirical nature of writing. Read the rest

The UK unemployment rate is at least three times the official rate

The UK -- like most countries -- excludes "inactive workers" (students, new parents, people who don't want a job) from its unemployment figures, but "inactive" is such a slippery concept that it can paper over huge cracks in the labor market. Read the rest

Boing Boing readers among web's most educated

Quantcast just released statistics that confirm what we've known all along: Boing Boing readers are super-smart, ranking among the top 25 "highest percentage of web traffic with higher education." Read the rest

Republicans are the primary beneficiaries of gerrymandering

As the Supreme Court makes ready to rule on the blatant gerrymandering in Wisconsin, the AP has conducted a study using "a new statistical method of calculating partisan advantage" to analyze "the outcomes of all 435 U.S. House races and about 4,700 state House and Assembly seats up for election last year" and report "four times as many states with Republican-skewed state House or Assembly districts than Democratic ones." Read the rest

A non-scientist's guide to reading scientific papers

Jennifer Raff -- a bioanthropologist and geneticist who researches and teaches at U Kansas and U Texas -- provides some excellent advice and context on how to read a scientific paper, from figuring out which papers and journals are worthy of your attention to understanding the paper in its wider context in the relevant field. Read the rest

Programmer pay and indent-style: tab-using coders earn less than space-using coders

David Robinson used the data from the 28,657 people who self-selected to take the Stack Overflow survey to investigate the relationship between programmer pay and the conventions of using either tabs or spaces to mark indents, and found a persistent, significant correlation between using spaces and bringing home higher pay. Read the rest
