Math against crimes against humanity: Using rigorous statistics to prove genocide when the dead cannot speak for themselves

Patrick Ball and the Human Rights Data Analysis Group (HRDAG) use careful, rigorous statistical models to fill in the large blank spots left behind by acts of genocide, bringing their analysis to war crimes tribunals, truth and reconciliation proceedings, and other reckonings with gross human rights abuses.

Katia Savchuk's Pacific Standard profile of HRDAG delves into both the work and its effects, showing how a small group of smart mathematicians can do what governments won't do and the press generally can't do (at least, not well): count the dead and tell their stories, in the face of concerted efforts to wipe away both the dead and the facts of their killings.

One of the most moving stories from HRDAG's work is their 2018 project to uncover hidden mass graves using machine learning, giving local activists the ammunition they needed to demand investigations from their governments.

At least 37,000 people have vanished in Mexico since 2007, according to a Mexican government database, after officials there declared a "war on drugs." Officials have discovered the remains of some 2,000 people in more than 1,000 clandestine graves. But where are the missing bodies, the unmapped graves?

"It's hard to remember that the blank spots in the map are because you just don't have any data," Ball says. "Why don't we have any reports from those places? The answer is the people who do the reporting risk getting killed."

Last year, Ball decided to fill in the blanks. With colleagues from Data Cívica, a non-profit, and the Human Rights Program at Ibero-American University, both in Mexico City, he developed a machine-learning model to predict the location of hidden graves. They started with data on existing graves from press reports and local prosecutors, plus a host of seemingly unrelated variables about the country's municipalities, such as miles of paved highway and proportion of teenagers attending school. Applying an algorithm called "random forest," they determined the places most likely to contain secret burial sites. Based on data from 2016, they identified 13 municipalities that had at least a 65 percent chance of harboring graves, but where neither the press nor local officials had reported any. These included Apatzingán in the state of Michoacán, González and Altamira in Tamaulipas, and Pueblo Nuevo in Durango.
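For readers curious what "random forest" means in practice, here is a minimal sketch in standard-library Python. It is not HRDAG's model or data. The training rows, the two features (miles of paved highway and share of teenagers in school, borrowed from the description above), the municipality names, and every function name are made up for illustration. The idea: grow many small decision trees, each on a random resample of the municipalities and a random subset of the variables, then average their votes into a probability. A real analysis would more likely use an established library such as scikit-learn.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels, features):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, rng, max_depth, n_features):
    """Grow one tree; every node looks at a random subset of the features."""
    leaf = sum(labels) / len(labels)  # share of training rows with a grave
    if max_depth == 0 or len(set(labels)) <= 1:
        return leaf
    features = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, features)
    if split is None:
        return leaf
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left],
                       rng, max_depth - 1, n_features),
            build_tree([r for r, _ in right], [y for _, y in right],
                       rng, max_depth - 1, n_features))

def tree_predict(tree, row):
    """Walk from the root down to a leaf and return its stored share."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def fit_forest(rows, labels, n_trees=50, max_depth=3, n_features=1, seed=0):
    """Train each tree on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 rng, max_depth, n_features))
    return forest

def forest_probability(forest, row):
    """Average the trees' outputs into one probability."""
    return sum(tree_predict(tree, row) for tree in forest) / len(forest)

# Invented training data: (miles of paved highway, share of teens in school).
# Label 1 = a grave has been reported there, 0 = none reported.
train_rows = [(120, 0.40), (150, 0.35), (130, 0.45), (160, 0.30), (140, 0.38),
              (20, 0.85), (30, 0.90), (25, 0.80), (40, 0.88), (35, 0.92)]
train_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

forest = fit_forest(train_rows, train_labels)

# Hypothetical municipalities with no reports from press or prosecutors.
unreported = {"Municipality A": (145, 0.25), "Municipality B": (10, 0.95)}
flagged = [name for name, row in unreported.items()
           if forest_probability(forest, row) >= 0.65]
```

The 0.65 cutoff mirrors the 65 percent figure quoted above. With this toy data, only the first hypothetical municipality should clear it, because it resembles the invented places where graves were reported.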

The locations weren't shocking—they were known centers of violence—but Ball says singling them out with technology created a powerful advocacy tool. "People know where the violence is, but nobody can say it openly," he says. "There's no witness you can kill to silence a machine-learning model."

Relatives of disappeared people in one border state have used the data to convince local prosecutors to investigate. His partners have presented the model to authorities leading Mexico's newly created National Search Commission and the team plans to publish updated findings early next year. Eventually, they hope to refine the model with more granular data, such as the type of terrain where bodies have been discovered or their distance from roads and rivers, so it can lead people closer to specific gravesites.

All the Dead We Cannot See [Katia Savchuk/Pacific Standard]