White House plan to use data to shrink prison populations could be a racist dumpster fire

The US imprisons more people than any other country in history, both as a total number and as a proportion of its population; a White House data-mining effort proposes to set free prisoners who are "low risk," which is something we can all get behind.

There's a problem with this kind of data-driven solution, though. It starts from the presumption that an appreciable fraction of the prisoners are a risk to the public and need to stay in prison.

But consider an alternate hypothesis: most of the people in prison are not a risk to the public and have been unjustly imprisoned. This isn't a crazy possibility. All the other rich, developed nations imprison far fewer people without having terrible crime in the streets, and the competing explanations, like "those countries are lying about their crime statistics" or "something special about America turns people into criminals," are less plausible.

If most of the people in US prisons are unjustly imprisoned and present no risk to the public, then any "data-driven" guess about whom to release would bear fruit. You could literally just pick 40% of the prison population at random and turn them out without seeing any increase in crime.

The validation for the White House's proposal is a pilot program in Mecklenburg County, North Carolina that used data-mining to release 40% of its prisoners without any crime spikes.

Unless we know how the predictive model worked (and in particular, how it was trained), we can't know if the outcome was because most prisoners shouldn't be behind bars, or because the model was picking low-risk offenders correctly.
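To make the ambiguity concrete, here is a toy simulation. It is a sketch, not a model of Mecklenburg County: the 2% share of genuinely high-risk prisoners, the population of 10,000 and the function name `released_reoffenders` are all assumptions chosen for illustration. It compares releasing 40% of prisoners at random with releasing 40% using an idealized model that never makes a mistake.

```python
import random

def released_reoffenders(n_prisoners, base_rate, release_share, use_model, seed=0):
    """Count how many released prisoners are genuinely high-risk."""
    rng = random.Random(seed)
    # True means genuinely high-risk. base_rate is an assumed figure, not a measured one.
    risky = [rng.random() < base_rate for _ in range(n_prisoners)]
    n_release = int(n_prisoners * release_share)
    if use_model:
        # Idealized, error-free model: low-risk prisoners (False) sort first.
        order = sorted(range(n_prisoners), key=lambda i: risky[i])
    else:
        # No model at all: release prisoners in random order.
        order = list(range(n_prisoners))
        rng.shuffle(order)
    return sum(risky[i] for i in order[:n_release])

random_pick = released_reoffenders(10_000, 0.02, 0.40, use_model=False)
model_pick = released_reoffenders(10_000, 0.02, 0.40, use_model=True)
print("high-risk people released at random:", random_pick)
print("high-risk people released by perfect model:", model_pick)
```

Under these assumed numbers, the random draw puts only a small number of high-risk people among the 4,000 released, and the perfect model puts none. Against the ordinary year-to-year noise in a county's crime figures, that gap could be very hard to see, which is why "no crime spike" alone doesn't tell you the model was doing anything.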

Imagine that a large majority of prisoners are "low-risk," but that some of them come from populations that are overpoliced: subjected to stop-and-frisk searches, rousted, raided, and more likely to receive custodial sentences than people from other groups accused of the same crimes. Imagine, too, that they were pressured into entering guilty pleas by a system that lets prosecutors ask for incredibly harsh sentences for anyone who doesn't voluntarily plead guilty.

Black people in America fit that description of being overpoliced and oversentenced. Being overpoliced and oversentenced means that you are more likely to be arrested than the average person, even if you are no more criminal than the average person. If you train a machine learning algorithm on a data-set that includes people who are overpoliced, and ask it who is most likely to reoffend, it will tell you that black people reoffend more than white people. But since the sampling is biased, all you're learning is that black people are more likely to be arrested, charged and sentenced than white people (possibly because they've been blackmailed into false guilty pleas by US prosecutors, who convict more than 97% of the people they indict).
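The sampling problem can be sketched in a few lines. In this toy example both groups reoffend at exactly the same rate by construction; the only difference is how likely each group is to be arrested, whether or not they did anything. Every number here (the 10% reoffense rate, the arrest probabilities) and the function name `observed_rearrest_rate` are made-up assumptions, not real statistics.

```python
import random

def observed_rearrest_rate(n, true_reoffend_rate, p_arrest_if_offend, p_arrest_if_not, rng):
    """Share of n released people who are re-arrested, which is all the data records."""
    rearrested = 0
    for _ in range(n):
        reoffends = rng.random() < true_reoffend_rate
        # Arrest probability depends on policing intensity, not only on behavior.
        p_arrest = p_arrest_if_offend if reoffends else p_arrest_if_not
        rearrested += rng.random() < p_arrest
    return rearrested / n

rng = random.Random(1)
TRUE_RATE = 0.10  # identical for both groups (assumed value)

# Hypothetical arrest probabilities: heavier policing means more arrests either way.
overpoliced = observed_rearrest_rate(50_000, TRUE_RATE, 0.60, 0.05, rng)
other = observed_rearrest_rate(50_000, TRUE_RATE, 0.30, 0.01, rng)

print(f"observed re-arrest rate, overpoliced group: {overpoliced:.3f}")
print(f"observed re-arrest rate, other group:       {other:.3f}")
```

An algorithm trained on the re-arrest column would see one group "reoffending" at a much higher rate than the other, even though the underlying behavior in this sketch is identical. The label measures policing, not criminality.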

Without knowing the base rate of actual risky offenders imprisoned in the US, and given the international evidence that the US has a lot more prisoners than it should, there's a real concern that you could have a racially biased algorithm that preferentially let white people go and appeared, by all external indicators, to be working perfectly.
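Here is a rough sketch of how that could look from the outside. The scoring rule is invented for illustration: it mostly penalizes membership in the overpoliced group (the hypothetical `GROUP_PENALTY`) and only weakly tracks true risk. The 2% base rate, the even group split and the 40% release share are likewise assumptions.

```python
import random

rng = random.Random(2)
BASE_RATE = 0.02      # assumed share of genuinely high-risk prisoners, same in both groups
GROUP_PENALTY = 0.2   # hypothetical bias the score applies to the overpoliced group

prisoners = []
for i in range(10_000):
    group = "overpoliced" if i % 2 == 0 else "other"
    risky = rng.random() < BASE_RATE
    # Biased score: group membership counts for more than actual risk.
    score = (GROUP_PENALTY if group == "overpoliced" else 0.0) + 0.1 * risky + 0.4 * rng.random()
    prisoners.append((score, group, risky))

# Release the 40% with the lowest scores.
prisoners.sort(key=lambda p: p[0])
released = prisoners[:4000]

share_other = sum(g == "other" for _, g, _ in released) / len(released)
risky_released = sum(r for _, _, r in released)

print(f"share of released from the non-overpoliced group: {share_other:.2f}")
print(f"high-risk people among the released: {risky_released} of {len(released)}")
```

In this sketch the released population is drawn overwhelmingly from one group even though both groups are half of the prison population and equally risky, while the number of high-risk people released stays tiny. Someone watching only the crime figures would conclude the algorithm was a success.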

There is precedent for this: machine learning for sentencing guidelines is a racist dumpster fire that locks black people up while letting white offenders go free.

Data-driven, empirical policing, sentencing and incarceration are a good idea, but without a sound statistical basis, they're just racist facewash, a way to apply a veneer of empiricism to systematic bias.

Some tech companies are donating their existing tools to the member cities and states. For instance, RapidSOS, a company that allows people to submit their exact location data to emergency personnel, is offering its product to five cities for free for the next 10 years. Several research institutions like New York University and the University of Chicago are also partnering with cities and states to research their data strategies.

In a time when Republicans and Democrats can't seem to agree on anything, prison reform has become an unlikely unifier. Recently, House speaker Paul Ryan has become an outspoken advocate for sentencing reform. That type of across-the-aisle support could help these data efforts spread more quickly.

Already, among the seven states that signed on to the Data-Driven Justice Initiative, three have Republican governors. As part of the commitment, they promise to merge criminal justice and health system data to identify people who are most at risk, create new protocols for first responders dealing with mental health issues, and inform pre-trial release decisions.

The White House Is on a Mission to Shrink US Prisons With Data [Issie Lapowsky/Wired]