Late last year, a pair of economists released an interesting paper that used mobile location data to estimate the likelihood that political polarization had shortened family Thanksgiving dinners in 2016.
The conclusions were indeed interesting, but far more telling is the methodology. The researchers were able to buy location data from a marketing broker (the same kind of shadowy figure that sells ER patients' identities to ambulance-chasing lawyers, and also sometimes continuously leaks all location data for everyone in the USA and Canada to anyone in the world, for years), and by tracking how long people stayed at dinner on Thanksgiving, they were able to calculate the duration of the meal.
Then the researchers used the same location data to work out the precinct where each of their subjects lived, and they used that to infer the subjects' political alignment (precinct-level voting results are a matter of public record, and precincts tend to be very homogeneous).
It's not hard to imagine how the re-identification process could have gone further: for example, you could look up the owner of the house where the diners ate, then look for people with the same surname living at the addresses they went home to.
The tech industry is in the midst of a largely invisible re-identification crisis. Much of the promise of machine learning and other Big Data applications rests on the idea that potentially compromising data can be rendered safe for use and sharing through "de-identification" (the GDPR has a huge loophole that absolves companies of most of their responsibilities if they "de-identify" data before sharing it!). The existence of reliable de-identification is taken as an article of faith by industry and regulators, even though computer scientists are incredibly skeptical that it can be effective or even possible.
The really interesting thing about the Thanksgiving Effect is how trivial it is to work out people's political alignment, familial relations, and other personal (and confidential) information from supposedly harmless marketing data.
You might be thinking: OK, that's one kind of data, but it's creepy to think that researchers could match my phone's location with my political affiliation. Who the hell has that data? That's where the election results come in. Chen and Rohla collected precinct-level polling data—the highest-resolution available—through internet scraping and by contacting secretaries of state, boards of election, and county clerks. Merging the two datasets was as simple as inferring the precinct and census block of each smartphone user's home based on the location of pings logged between 1:00 am and 4:00 am in the weeks before Thanksgiving. People who live in the same precinct tend to vote for the same candidate. "So it turns out where your smartphone spends its time between 1:00 and 4:00 in the morning correlates pretty closely with who you voted for in the 2016 election," says Chen. He and Rohla used the same method to determine where people traveled on Thanksgiving day.
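To make the merge step concrete, here is a minimal sketch of the idea rather than the researchers' actual code. It assumes each ping has already been mapped from coordinates to a precinct ID (a step not shown here), and the device IDs, precinct IDs, and leanings are all made up for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical pings: (device_id, local timestamp, precinct_id).
# Assumes coordinates were already mapped to precincts elsewhere.
PINGS = [
    ("dev-a", datetime(2016, 11, 1, 1, 30), "P1"),
    ("dev-a", datetime(2016, 11, 2, 2, 15), "P1"),
    ("dev-a", datetime(2016, 11, 2, 14, 0), "P2"),  # daytime ping, ignored
    ("dev-a", datetime(2016, 11, 3, 3, 45), "P2"),
    ("dev-b", datetime(2016, 11, 1, 1, 5), "P2"),
    ("dev-b", datetime(2016, 11, 4, 3, 59), "P2"),
]

# Hypothetical precinct-level results, reduced to a majority-party label.
PRECINCT_LEAN = {"P1": "D", "P2": "R"}


def infer_home_precinct(pings, start_hour=1, end_hour=4):
    """Pick, per device, the precinct seen most often between 1:00 and 4:00 am."""
    counts = defaultdict(Counter)
    for device, ts, precinct in pings:
        if start_hour <= ts.hour < end_hour:
            counts[device][precinct] += 1
    return {device: c.most_common(1)[0][0] for device, c in counts.items()}


def infer_lean(pings, precinct_lean):
    """Attach the home precinct's majority label to each device."""
    homes = infer_home_precinct(pings)
    return {device: precinct_lean.get(precinct) for device, precinct in homes.items()}


if __name__ == "__main__":
    print(infer_home_precinct(PINGS))
    print(infer_lean(PINGS, PRECINCT_LEAN))
```

Nothing in this sketch needs a name, a phone number, or an account: the "anonymous" device ID plus a few overnight pings is enough to attach a home and a probable vote to a phone.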
With their merged data sets, the researchers were able to control for geographic and demographic factors. By comparing similar families to each other, Chen and Rohla showed that dinners attended by residents from opposing-party precincts in 2016 were 30 to 50 minutes shorter than same-party get-togethers. They were also able to compare behavior between years: In 2015, mismatched families still tended to spend less time together than matched ones, but the effect was more pronounced in 2016. And that was especially true when travelers and hosts hailed from media markets with lots of political television ads.
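The comparison itself can be sketched in a few lines. This is only an illustration under simplifying assumptions: visit length is taken as the gap between a guest's first and last ping at the host's home, the visits and their timestamps are invented toy values (not the study's data), and none of the geographic or demographic controls described above are applied.

```python
from datetime import datetime
from statistics import mean


def visit_minutes(timestamps):
    """Minutes between the first and last ping at the host's location."""
    return (max(timestamps) - min(timestamps)).total_seconds() / 60


# Hypothetical visits: (guest lean, host lean, pings logged at the host's home).
VISITS = [
    ("D", "D", [datetime(2016, 11, 24, 14, 0), datetime(2016, 11, 24, 18, 30)]),
    ("R", "R", [datetime(2016, 11, 24, 13, 0), datetime(2016, 11, 24, 17, 0)]),
    ("D", "R", [datetime(2016, 11, 24, 14, 0), datetime(2016, 11, 24, 17, 30)]),
    ("R", "D", [datetime(2016, 11, 24, 15, 0), datetime(2016, 11, 24, 18, 0)]),
]


def mean_gap(visits):
    """Mean matched-visit length minus mean mismatched-visit length, in minutes."""
    matched = [visit_minutes(ts) for guest, host, ts in visits if guest == host]
    mismatched = [visit_minutes(ts) for guest, host, ts in visits if guest != host]
    return mean(matched) - mean(mismatched)


if __name__ == "__main__":
    print(mean_gap(VISITS))
```

A raw difference of means like this would be confounded by travel distance, region, and so on, which is why the researchers compared similar families rather than stopping here.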
In short: It's astonishing what you can infer about personal relationships from just a few datasets.
The 'Thanksgiving Effect' and the Creepy Power of Phone Data [Robbie Gonzalez/Wired]
(Steven Depolo, CC-BY)