AT&T's guilt-by-association algorithm for finding "terrorists"

Andrew Appel has some fascinating analysis of the "guilt by association" algorithm that AT&T uses to help the FBI figure out whose life to ruin with baseless accusations of terrorist involvement. Other phone companies like Verizon refused to help out with these fishing expeditions, but AT&T jumped right in. The thing is, after three hops, your social network encompasses half the planet, including many of its terrorists (or even "terrorists").

What is the "communities of interest" technology? It's spelled out very clearly in a 2001 research paper from AT&T itself, entitled "Communities of Interest" (by C. Cortes, D. Pregibon, and C. Volinsky). They use high-tech data-mining algorithms to scan through the huge daily logs of every call made on the AT&T network; then they use sophisticated algorithms to analyze the connections between phone numbers: who is talking to whom? The paper literally uses the term "Guilt by Association" to describe what they're looking for: what phone numbers are in contact with other numbers that are in contact with the bad guys?
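To make the "hops" idea concrete, here is a minimal sketch of a hop-limited search over a call graph. It is not AT&T's code or the method from the paper: the function names, the `(caller, callee)` record format, and the plain breadth-first search are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def build_call_graph(call_records):
    """Build an undirected graph from (caller, callee) pairs (hypothetical record format)."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def numbers_within_hops(graph, seed, max_hops):
    """Return {number: hop_count} for every number reachable from seed in at most max_hops calls."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(current, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    del hops[seed]  # the seed is not its own associate
    return hops


# Toy data: a chain of calls A-B-C-D-E, with "A" as the number under suspicion.
calls = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
graph = build_call_graph(calls)
print(numbers_within_hops(graph, "A", 2))  # {'B': 1, 'C': 2}
```

On a real phone network each extra hop multiplies the number of people swept in, which is why a three-hop search reaches so far.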


Update: Wired/Threat Level's Ryan Singel sez,

Following up on Freedom to Tinker's post on AT&T's calling circle research adopted by the FBI, it seems that the data mining program was made possible through AT&T's development of a mass surveillance programming language called Hancock.

A variant of C, Hancock is used to process millions of records in streams as they get dumped into various databases. Uses include creating maps of cell phone users' locations and tracking IP and website addresses.

AT&T even snagged patents on some of the data mining methods, which may seem eerily similar to the phone record data mining the NSA used post-9/11 to find targets for their warrantless targeting of American citizens.