Death and the Mainframe: How data analysis can help document human rights atrocities

By Maggie Koerth-Baker at 6:00 am Tue, Jun 25, 2013

Between 1980 and 2000, a complicated war raged in Peru, pitting the country’s government against at least two political guerilla organizations, and forcing average people to band together into armed self-defense committees. The aftermath was a mess of death and confusion, where nobody knew exactly how many people had been murdered, how many had simply vanished, or who was to blame.

“The numbers had floated around between 20,000 and 30,000 people killed and disappeared,” says Daniel Manrique-Vallier. “But nobody knew what the composition was. Non-governmental organizations were estimating that 90% of the deaths were the responsibility of state agents.”

Manrique-Vallier, a post-doc in the Duke University department of statistical science, was part of a team that researched the deaths for Peru’s Truth and Reconciliation Commission. Their results were completely different from those early estimates. Published in 2003, the final report presented evidence for nearly 70,000 deaths, 30% of which could be attributed to the Peruvian government.

How do you find 40,000 extra dead bodies? How do you even start to determine which groups killed which people at a time when everybody with a gun seemed to be shooting civilians? The answers lie in statistics, data analysis, and an ongoing effort to use math to cut through the fog of war.

Violent conflicts don’t usually leave behind the kind of neat, detailed records that make it easy to seek justice for victims and prosecute killers. Because of that, witness testimony has always been an important part of understanding what happened, pretty much since we started caring about how wars affected human rights.

But there aren’t always surviving witnesses. Also, in the aftermath of war, stories conflict with one another, because different groups of people have a vested interest in seeing events in different ways. Wars often happen in places that hadn't even had accurate census data before people started killing one another. And some stories might not even get widely told, to begin with, because the people telling them are seen to be less important.

Case in point: if you only looked at newspaper reports of the Peruvian conflict, you might think that the 1990s were the most violent years. In reality, though, most of the killings and violence happened in the 1980s. The discrepancy comes from the fact that the war was being fought in rural areas at that time. When it shifted to cities in the 1990s, more journalists (who also lived in the cities) started to pay more attention.

That's really where statistics comes in handy, says Jay Aronson, director of Carnegie Mellon’s Center for Human Rights Science. The inherent confusion of war can be used as an excuse — a reason for governments to walk away from history without seeking justice.

Guatemala is a great example of this. One of BoingBoing's other editors, Xeni Jardin, extensively covered the trial of former Guatemalan leader Gen. Jose Efrain Rios Montt earlier this year. He was charged with authorizing the genocide of Ixil Mayans. His defenders claimed the Mayans were just casualties in a legitimate war. But Patrick Ball, a statistician who works for the Human Rights Data Analysis Group, was able to use statistical analysis to show otherwise. As Xeni and Miles O'Brien reported in a long piece on the science of the Rios Montt trial for PBS Newshour, 5.5% of all the indigenous people of Guatemala were killed in just 16 months, from April 1982 to July 1983. During the same time period, only 0.7% of the non-indigenous population was killed.

Data analysis allows us to get rid of the excuse that war is just messy, and sometimes people die. It’s also part of a larger trend. Over the last 25-30 years, Aronson says, civilian casualties have become more important to how we think about the aftermath of war. In order to really understand what happens to civilians, you need both the detail provided by witnesses and a sense of the big picture that comes from statistics.

A Two-Party System

So how do you turn conflicting narratives and swiss-cheese information into estimates that you can feel remotely confident about? The primary method used by scientists like Patrick Ball and Daniel Manrique-Vallier is a system called “capture-recapture” or multiple systems estimation.

The best way to wrap your head around this methodology is to start by imagining something completely unrelated to war and genocide — think of a house party, instead, says Kristian Lum, assistant research professor at the Virginia Biosystems Research Institute.

“Say there’s no guest list, but you and I want to know how many people are there,” she says. “To figure it out, we’ll start by both going into the party and collecting the names of everyone we talk to. Then we’ll meet back outside and compare the lists.”

If my list has 10 names and Lum’s list has the exact same 10 names, we can start to assume that the party is pretty small. If there were a lot more than 10 people in the house, it’s unlikely that she and I would have come up with the exact same list. But if my list of 10 people and Lum’s list of 10 people have only a single person in common, then it’s likely that the party is more on the scale of a kegger than an intimate gathering of friends. The less overlap between the lists, the bigger the total number is likely to be.
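To put rough numbers on the party example, here is a minimal Python sketch of the classic two-list estimate, often called the Lincoln-Petersen estimator: multiply the sizes of the two lists and divide by the overlap. This is my own illustration, not the method the researchers actually used, which handles more than two lists and is considerably more involved. The function name and the guest names are hypothetical.

```python
def estimate_party_size(list_a, list_b):
    """Two-list (Lincoln-Petersen) estimate of the total population.

    Assumes each list is a random sample of the same crowd and that
    names can be matched exactly between lists.
    """
    a, b = set(list_a), set(list_b)
    overlap = len(a & b)
    if overlap == 0:
        # With no names in common, the estimate is undefined (unbounded).
        raise ValueError("no overlap: the two-list estimate is undefined")
    return len(a) * len(b) / overlap


# Hypothetical guest lists for the two scenarios described above.
mine = [f"guest{i}" for i in range(10)]
same_as_mine = list(mine)                                       # all 10 names match
mostly_new = ["guest0"] + [f"guest{i}" for i in range(10, 19)]  # only 1 name matches

small_party = estimate_party_size(mine, same_as_mine)  # 10 * 10 / 10
big_party = estimate_party_size(mine, mostly_new)      # 10 * 10 / 1

print(small_party, big_party)
```

With identical lists the estimate is 10 people; with a single shared name it jumps to 100. The same two list sizes give very different totals depending only on the overlap.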

In the real (and significantly less festive) world, researchers gather up all the different lists of war dead they can get their hands on. Sometimes, they’ll even do interviews with witnesses and put together a list of their own. Then, they start comparing the names — figuring out how many people are counted on multiple lists and how many are unique to a particular list.

The analysis becomes a lot more complicated than our simple party-based example, but it gives you the basic gist. It should also help you understand that the numbers produced by these kinds of analyses aren’t meant to be perfect tallies. In Peru, for example, the Truth and Reconciliation Commission Report didn’t say that 70,000 people had died. Instead, researchers could say with 95% confidence that the number of dead fell into a range between 61,007 and 77,552. Earlier this month, when I wrote about the Human Rights Data Analysis Group’s recent report on deaths in the Syrian Civil War, I made sure to clarify that their number — 92,901 — should be thought of as the minimum number of people that have died. Not an exact count.

Pattern Recognition

That brings us to a key idea of this kind of analysis — patterns matter more than numbers. Patrick Ball of the HRDAG originally brought this idea up with me, and when I mentioned it in my piece on the Syrian deaths, I had taken it to mean that the total count of dead maybe mattered a little less than who was dying, in which parts of the region, and who they were being killed by.

In some ways, that’s true. Total numbers matter, but it’s the who, what, when, where, and why details that really help you understand if you’re dealing with, say, a standard conflict, or a more one-sided genocide. Those patterns affect the way we seek justice and how we frame the conflict for future generations. It's why witness testimony is so valuable.

But that difference between numbers and patterns refers to something else, as well — something much more fundamental to the whole concept of applying math to human rights abuses.

It’s an admission and a caveat: Any individual list of war dead is probably incorrect. Any attempt to take the numbers in an individual list and use them to understand what’s going on is likely to be misleading. In Peru, the newspapers focused on the people who died in cities. Studying those reports produced a pattern — but it was the wrong one. Likewise, when Daniel Manrique-Vallier and his colleagues interviewed 17,000 witnesses, they ended up collecting a list of 20,000 names of people who had been killed or disappeared. But it wasn’t until they put that list into context with lists from the NGOs, the Red Cross, and other organizations that a real pattern emerged in the data — a pattern which told them that far, far more people had died than anybody’s back-of-the-envelope estimates had supposed. It’s not the numbers from one list that mattered. It was the patterns that emerged when you started to compare lots of lists.

In fact, that was the point of a blog post Ball wrote last week, explaining why he couldn’t help reporters put together nifty “How the Conflict In Syria Has Changed Over Time” infographics by just handing them the raw data, i.e., the multiple lists of war dead that the HRDAG’s total count was based on.

If you were to use the raw data for the analysis, you would be assuming that the count for each day represents an equal, constant proportion of the total deaths on that day.

For example, imagine that you observe 100 killings on Thursday, 120 on Friday, and 80 on Saturday. Is there a peak on Friday? Maybe, maybe not. The real question is: how many killings really happened on Thursday? Let’s say there were 150, so you observed 100/150 = 0.66 on Thursday. Did you also observe 0.66 of the total killings on Friday? On Saturday? Again, maybe. Or maybe not. Maybe on Friday your team worked really hard and observed 0.8 of the total killings: you observed 120 and there were really 150 (the same as Thursday). On Saturday, however, some of your team stayed home with their families, so you really only observed 0.5 of the total killings: you observed 80, but there were really 160. The true pattern of killings is therefore that the numbers were equal on Thursday and Friday -- and Saturday was worse. The true pattern could be very different from the observed pattern.
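Ball's arithmetic is easy to check. The short Python sketch below uses the numbers from his example; the "true" counts are the hypothetical values he supposes, which in a real conflict are exactly what nobody knows. The variable names are mine.

```python
# Observed killings per day, from Ball's example.
observed = {"Thursday": 100, "Friday": 120, "Saturday": 80}

# Hypothetical true totals from the same example (unknowable in practice).
true_counts = {"Thursday": 150, "Friday": 150, "Saturday": 160}

# Fraction of the real killings that the team actually recorded each day.
coverage = {day: observed[day] / true_counts[day] for day in observed}

# The raw data points to one peak; the true counts point to another.
observed_peak = max(observed, key=observed.get)
true_peak = max(true_counts, key=true_counts.get)

# Scaling every day by Thursday's coverage is the "equal, constant
# proportion" assumption. It keeps the misleading Friday peak.
naive_estimate = {day: observed[day] / coverage["Thursday"] for day in observed}
naive_peak = max(naive_estimate, key=naive_estimate.get)

for day in observed:
    print(f"{day}: observed {observed[day]}, true {true_counts[day]}, "
          f"coverage {coverage[day]:.2f}")
print("Peak in raw data:", observed_peak)
print("Peak in true counts:", true_peak)
```

The raw data peaks on Friday, while the supposed true counts peak on Saturday. Assuming a constant share of deaths was observed each day does not fix this, because the coverage itself changed from day to day.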

This is what I mean when I say the patterns are the thing that really matters. They’re what allows us to see how many people actually died and who the dead were. They’re the things that help us distinguish between the messy aftermath of a conflict and a closer-to-accurate view of what actually happened. “Our capacity to monitor is so much smaller than the universe we are trying to monitor,” Ball told me in an interview. Patterns are the things that help us see the bigger picture.

Read more:
• Patrick Ball’s full post on why patterns matter
• BoingBoing's coverage of the trial of Gen. Jose Efrain Rios Montt
• The piece on the science of genocide in Guatemala that Xeni and Miles O'Brien produced for PBS Newshour
• BoingBoing's earlier story on the death count from the Syrian Civil War
• The Peruvian Truth and Reconciliation Commission report
• A paper on the application of multiple systems estimation in human rights research
• A new book, edited by Jay Aronson, Counting Civilian Casualties: An Introduction to Recording and Estimating Nonmilitary Deaths in Conflict

Image: War, a Creative Commons Attribution No-Derivative-Works (2.0) image from Moyan Brenn

About the Author

Maggie Koerth-Baker is the science editor at BoingBoing.net. From August 2014-May 2015, she will be a Nieman-Berkman Fellow at Harvard University. You can follow Maggie's adventures in the Ivory Tower by subscribing to The Fellowship of Three Things newsletter.

6 Responses to “Death and the Mainframe: How data analysis can help document human rights atrocities”

  1. RickB says:

    It would also help when writing about it people did not fail to mention the role of the US govt in training, funding and supporting torture and death squads in these atrocities. Maybe have a talk with http://www.soaw.org Justice is never free of political bias sadly, so talking about it must reflect that.

  2. I think the original header image of Vasily Vereshchagin’s 1871 “The Apotheosis of War” was more affective. see http://en.wikipedia.org/wiki/File:Apotheosis.jpg

  3. openfly says:

    On the flip side, mass data analysis was used for one of the first times ever to provide for such an effective campaign of genocide in Nazi Germany.  IBM’s Hollerith machines were used to crunch census data on the German population and identify Jewish citizens.  Later they were even used in the infamous Auschwitz concentration camp.  The iconic serialized tattoos of prisoners there were actually tabulation numbers for the machines.

    No knowledge, no innovation is not without it’s risk of exploitation.

    • momus_98 says:

       I tire of this argument; the use of IBM’s equipment and the company’s lack of control over the war years is hugely complex. To your broader implication of war exploitation:

      Bayer, BMW, Siemens, Mercedes-Benz, among others.

      There are no innocent parties here.

      • openfly says:

        It was fairly effective at crunching census numbers…. having been designed to do just that for the US.

        Ignoring blame… it’s an interesting and early use of large scale data mining tactics for use by a genocidal regime.  Taking it as just that.  It’s interesting as hell.

  4. cybermales says:

     humans in war is difficult to determine the number of casualties, because the population of each country is never accurate.who has died of hard to see because there is always a lie in each battle, except from the beginning of human life has been fitted with a monitored server barcode world’s population ..
