The Data & Society institute (dedicated to critical, interdisciplinary perspectives on big data) held an online seminar devoted to Cathy O'Neil's groundbreaking book Weapons of Math Destruction, which showed how badly designed algorithmic decision-making systems can create, magnify and entrench the social problems they're supposed to solve, perpetuating inequality, destabilizing the economy, and making a small number of people very, very rich.
My takeaway from the book was that O'Neil had done excellent work in providing rules of thumb to distinguish good machine-learning models from bad ones (primarily that models must have unbiased training data, and that they must be continuously refined by comparing their predictions against reality), but many of the responses from the practitioners at D&S convey palpable hurt that O'Neil didn't go far enough in stressing the benefits of well-executed computer modeling.
One response bridged the gap between these two perspectives in a most excellent fashion: Mark Ackerman's "Safety Checklists for Sociotechnical Design" proposes a checklist that data practitioners can use when designing systems to figure out if they're fashioning a tool or a weapon.
One might imagine multiple checklists with different kinds of things to watch for. Software and data engineers might profitably look at a checklist with probability and statistical issues, the sins of science:
1. You need to check that you have not trained on one population and then run the system on a different population, otherwise you may get enormous errors. Etc.
2. You need to understand that composite or complex variables are operationalized through many indicator variables. That is, if one needs to look for something like trustworthiness, there will be many potential aspects of trustworthiness. It is extremely unlikely that any indicator perfectly substitutes with a composite variable; this is a cause of error. More insidious is when an indicator variable aligns strongly most of the time, but for some situations or for some subpopulations, it aligns badly.
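To make the two "sins of science" above a little more concrete, here is a minimal Python sketch of how a practitioner might test for them. It is not from Ackerman's essay. The function names (standardized_shift, pearson, alignment_by_group), the toy data, and the choice of a mean-difference score and a Pearson correlation are all assumptions made for illustration. A real audit would likely use richer tests.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pstdev


def standardized_shift(train_values, live_values):
    """Difference in means, measured in training standard deviations.

    A large value hints that the system is being run on a population
    unlike the one it was trained on (checklist item 1).
    """
    spread = pstdev(train_values)
    if spread == 0:
        return 0.0 if mean(live_values) == mean(train_values) else float("inf")
    return abs(mean(live_values) - mean(train_values)) / spread


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def alignment_by_group(rows):
    """rows: iterable of (group, indicator, composite) tuples.

    Returns {group: correlation}, so an indicator that tracks the
    composite well overall but badly for one subpopulation stands out
    (checklist item 2).
    """
    grouped = defaultdict(lambda: ([], []))
    for group, indicator, composite in rows:
        grouped[group][0].append(indicator)
        grouped[group][1].append(composite)
    return {g: pearson(xs, ys) for g, (xs, ys) in grouped.items()}


# Hypothetical data: the model was trained on younger people than it now sees.
train_ages = [25, 30, 35, 40, 45]
live_ages = [60, 65, 70, 75, 80]
shift = standardized_shift(train_ages, live_ages)

# Hypothetical data: the indicator tracks the composite for group A only.
rows = [("A", 1, 2), ("A", 2, 4), ("A", 3, 6), ("A", 4, 8),
        ("B", 1, 8), ("B", 2, 6), ("B", 3, 4), ("B", 4, 2)]
alignment = alignment_by_group(rows)
```

In this toy example the shift score is several standard deviations, and the indicator lines up with the composite for group A while running the opposite way for group B, which is exactly the "aligns badly for some subpopulations" case the checklist warns about.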
Other problems might be considered sins of opacity. Those who will use these systems — managers of public agencies, for example — might look at a different, but overlapping, checklist. For example:
3. Systems that are not available for examination cannot be trusted to operate without error. Either the software code must be available for inspection or test data must be available. Otherwise, there is no way to know whether the sins of science are present.
4. Systems that operate over time must have feedback loops that look for error or they must be examined over time. Code that incorporates or represents social activity tends to drift off and become less and less accurate over time — unless it incorporates inspectable feedback mechanisms.
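The feedback loop in the last item can be sketched in a few lines of Python. This is an illustration only, not anything proposed in the essay. The DriftMonitor class, its parameters (baseline_error, window, tolerance) and the specific numbers are hypothetical. The idea is simply to keep comparing predictions with what actually happened, and to expose the running error rate so that someone outside the engineering team can inspect it.

```python
from collections import deque


class DriftMonitor:
    """Tracks recent prediction errors and flags drift past a baseline.

    All names and thresholds here are illustrative assumptions.
    """

    def __init__(self, baseline_error, window=100, tolerance=0.05):
        self.baseline_error = baseline_error  # error rate measured at launch
        self.tolerance = tolerance            # slack allowed before flagging
        self.recent = deque(maxlen=window)    # True marks a wrong prediction

    def record(self, predicted, actual):
        """Log one prediction once the real outcome is known."""
        self.recent.append(predicted != actual)

    def error_rate(self):
        """Share of wrong predictions in the current window."""
        if not self.recent:
            return 0.0
        return sum(self.recent) / len(self.recent)

    def needs_review(self):
        """True only when the window is full and errors exceed the limit."""
        full = len(self.recent) == self.recent.maxlen
        return full and self.error_rate() > self.baseline_error + self.tolerance


# Hypothetical usage: the system starts out accurate, then begins to drift.
monitor = DriftMonitor(baseline_error=0.10, window=10, tolerance=0.05)
for _ in range(10):
    monitor.record(predicted=1, actual=1)
early = monitor.needs_review()

for _ in range(4):
    monitor.record(predicted=1, actual=0)
late = monitor.needs_review()
```

A fixed-size window is used here so that old successes cannot mask recent failures. Whether a rolling window or some other scheme is appropriate depends on how quickly the underlying social activity changes.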
Safety Checklists for Sociotechnical Design
[Mark Ackerman/Data and Society]
(Image: The checklist for the composting, SuSanA Secretariat, CC-BY)