In the last three days of the Sri Lankan civil war, as thousands of people surrendered to government authorities, hundreds of people were put on buses driven by Army officers. Many were never seen again.
The report begins with seven lists of people reported as disappeared. The lists were compiled by the UN, families of the disappeared, and Sri Lankan officials, among others. It then uses a method called Multiple Systems Estimation to estimate the "dark figure," that is, the people who were disappeared but have not been reported. From 443 people reported on one or more lists, we estimate a total of 503 victims, with a two-tailed 95% credible interval from 468 to 554 (it's Bayesian!); see the figure for the posterior distribution of the estimate.
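To make "posterior distribution" and "two-tailed 95% credible interval" concrete, here is a minimal sketch for the simplest possible case. It is not the seven-list model used in the report. It assumes just two independent lists, equal chances of any person appearing on each list, and a flat prior on the total N up to a cap. Conditional on the list sizes, the number of people on both lists then follows a hypergeometric distribution, which gives the likelihood. The function names and the counts in the usage line are hypothetical.

```python
import math

def log_comb(n, k):
    """Log of the binomial coefficient C(n, k), via lgamma for numerical stability."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def posterior_over_n(n1, n2, m, n_max):
    """Posterior P(N | n1, n2, m) on a grid of candidate totals.

    Assumes two independent lists, equal catchability, and a flat prior
    on N over [n1 + n2 - m, n_max].
    """
    n_min = n1 + n2 - m  # N cannot be smaller than the number of distinct people observed
    grid = list(range(n_min, n_max + 1))
    # Hypergeometric log-likelihood of m people appearing on both lists,
    # given list sizes n1 and n2 drawn from a population of size N.
    log_lik = [log_comb(n1, m) + log_comb(N - n1, n2 - m) - log_comb(N, n2) for N in grid]
    top = max(log_lik)
    weights = [math.exp(v - top) for v in log_lik]  # subtract the max before exp to avoid underflow
    total = sum(weights)
    return grid, [w / total for w in weights]

def credible_interval(grid, probs, level=0.95):
    """Equal-tailed (two-tailed) credible interval from a discrete posterior."""
    lower_tail = (1 - level) / 2
    upper_tail = 1 - lower_tail
    cumulative = 0.0
    lo = hi = None
    for n, p in zip(grid, probs):
        cumulative += p
        if lo is None and cumulative >= lower_tail:
            lo = n
        if hi is None and cumulative >= upper_tail:
            hi = n
    return lo, hi

# Hypothetical counts: 60 people on list A, 50 on list B, 30 on both.
grid, probs = posterior_over_n(n1=60, n2=50, m=30, n_max=1000)
print(credible_interval(grid, probs))
```

The interval comes from cutting 2.5% of posterior probability off each tail, which is what "two-tailed" means here. Real applications with many lists also have to model dependence between lists, which this sketch ignores.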
The report contains several references to the academic literature in mathematical statistics that explains how this method works and what its assumptions are. In short, the intuition is that with multiple, independent lists, the more often they document the same people, the fewer people remain undocumented. More collisions among the lists imply that the lists are compressed into a universe only slightly larger than the union of the lists, while fewer collisions imply that the lists are drifting apart in a space much larger than the lists themselves.
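The collision intuition can be seen in the textbook two-list case, the Lincoln-Petersen estimator, which estimates the total as the product of the two list sizes divided by their overlap. This is an illustration of the general idea only, not the model used in the report; it assumes two independent lists on which everyone is equally likely to appear, and the list sizes and overlaps below are hypothetical.

```python
def lincoln_petersen(n1, n2, m):
    """Two-list capture-recapture estimate of the total population size.

    Assumes the two lists are independent and every person is equally
    likely to appear on each list. m is the number of people on both lists.
    """
    if m == 0:
        raise ValueError("no overlap between lists: estimate is undefined")
    return n1 * n2 / m

def undocumented(n1, n2, m):
    """Estimated number of people on neither list: estimated total minus the union."""
    return lincoln_petersen(n1, n2, m) - (n1 + n2 - m)

# Same list sizes, different numbers of collisions (hypothetical counts).
for overlap in (80, 20):
    total = lincoln_petersen(100, 100, overlap)
    missing = undocumented(100, 100, overlap)
    print(f"overlap={overlap}: estimated total={total:.0f}, undocumented={missing:.0f}")
# overlap=80: estimated total=125, undocumented=5
# overlap=20: estimated total=500, undocumented=320
```

With 80 shared names, the two lists nearly cover the universe and only about 5 people are estimated to be missing; with 20 shared names, the estimated hidden population is larger than the union of the lists.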
I've used this method in a number of previous projects. Recently, with colleagues from the Colombian human rights NGO Dejusticia, I published an estimate of the number of social movement leaders being killed in Colombia (summary in English, full report in Spanish). In another application, I estimated the total mortality of indigenous and non-indigenous people to show that in three key counties in Guatemala, indigenous people were eight times more likely to be killed by the Army than their non-indigenous neighbors. I presented the finding as expert testimony in the genocide trial of the former dictator, Gen. José Efrain Rios Montt (previously covered by BB here).
In the study of war crimes, crimes against humanity, and genocide, often the most important data is missing. To understand the magnitude of the crimes, and the patterns over time, geography, perpetrators, or victims, we need tools from mathematical statistics to understand what is not observed. Treating the observed data as though it were the only data emphasizes victims whose social visibility is high, while ignoring those who are often already among the most marginalized. As scholars who want to get the answer right, and as human rights activists focused on documenting violence against all people, the most important question we can ask is: what's missing? Who is hidden? And can we use scientific tools to figure it out? I think that when the data is just right and the methods are sound, we can, and we should.