Thousands of Americans got sub-broadband ISP service, thanks to telecom shenanigans


Measurement Lab, an open, independent analysis organization devoted to measuring the quality of Internet connections and detecting censorship, technical faults and network neutrality violations, has released a major new report on how ISPs connect to one another, and it's not pretty.


Wouldn't it be great if a billboard could actually read your mind?

Said no one, ever. Except, apparently not: the "data scientists" of Posterscope are excited that EE -- a joint venture of T-Mobile and Orange -- will spy on all their users' mobile data to "give profound insights...that were never possible before."


Mercilessly pricking the bubbles of AI, Big Data, machine learning


Michael I. Jordan is an extremely accomplished computer scientist who is also deeply skeptical of claims made by Big Data advocates, as well as by people who believe that machine intelligence, AI and machine vision are solved, or nearly so.


Ontario police's Big Data assigns secret guilt to people looking for jobs, crossing borders


There are no effective legal limits on when and to whom police can disclose unproven charges against you, 911 calls involving mental health incidents, and similar sensitive and prejudicial information; people have been denied employment, been turned back at the US border and suffered many other harms because Ontario cops send this stuff far and wide.


Microsoft says it won't use contents of emails to target ads

Alan sez, "Microsoft is pushing out an update to its privacy policies."


Big Data should not be a faith-based initiative

Cory Doctorow summarizes the problem with the idea that sensitive personal information can be removed responsibly from big data: computer scientists are pretty sure that’s impossible.


IRS won't fix database of nonprofits, so it goes dark


Rogue archivist Carl Malamud writes, "Due to inaction by the Internal Revenue Service and the U.S. Congress, Public.Resource.Org has been forced to terminate access to 7,634,050 filings of nonprofit organizations. The problem is that we have been fixing the database, providing better access mechanisms and finding and redacting huge numbers of Social Security Numbers. Our peers such as GuideStar are also fixing their copies of the database."


Inherent biases warp Big Data


The theory of Big Data is that the numbers have an objective property that makes their revealed truth especially valuable; but as Kate Crawford points out, Big Data has inherent, lurking bias, because the datasets are the creation of fallible, biased humans. For example, the data-points on how people reacted to Hurricane Sandy mostly emanate from Manhattan, because that's where the highest concentration of people wealthy enough to own tweeting, data-emanating smartphones is. But more severely affected locations -- Breezy Point, Coney Island and Rockaway -- produced almost no data because they had fewer smartphones per capita, and the ones they had didn't work because their power and cellular networks failed first.

I wrote about this in 2012, when Google switched strategies for describing the way it arrived at its search-ranking. Prior to that, the company had described its ranking process as a mathematical one and told people who didn't like how they got ranked that the problem was their own, because the numbers didn't lie. After governments took this argument to heart and started ordering Google to change its search results -- on the grounds that there's no free speech question if you're just ordering post-processing on the outcome of an equation -- Google started commissioning law review articles explaining that the algorithms that determined search-rank were the outcome of an expressive, human, editorial process that deserved free speech protection.


Anti-Net Neutrality Congresscritters made serious bank from the cable companies


The Congressmen who sent letters to the FCC condemning Net Neutrality received 2.3 times more campaign contributions from the cable industry than average. The analysis, conducted with Maplight's Congressional transparency tools, shows that Dems are cheaper to bribe than Republicans (GOP members received 5x the Congressional average from Big Cable; Dems only 1.2x) and shows what a chairmanship of a powerful committee is worth: Rep. Greg Walden (R-Ore.), who chairs the FCC-overseeing Subcommittee on Communications and Technology, got $109,250 (the average Congresscritter got $11,651).

29 Congresscritters own stock in Comcast, and Comcast is the 25th most-held stock in Congress.


EFF on the White House's Big Data report: what about privacy and surveillance?

Last week, I wrote about danah boyd's analysis of the White House's Big Data report [PDF]. Now, the Electronic Frontier Foundation has added its analysis to the discussion. EFF finds much to like about the report, but raises two very important points:

* The report assumes that you won't be able to opt out of leaving behind personal information and implicitly dismisses the value of privacy tools like ad blockers, Do Not Track, Tor, etc.

* The report is strangely silent on the relationship between Big Data and mass surveillance, except to the extent that it equates whistleblowers like Chelsea Manning and Edward Snowden with the Fort Hood shooter, lumping them all in as "internal threats"


Big Data analysis from the White House: understanding the debate


Danah boyd, founder of the critical Big Data think/do tank Data and Society, writes about the work she did with the White House on Big Data: Seizing Opportunities, Preserving Values [PDF]. Boyd and her team convened a conference called The Social, Cultural & Ethical Dimensions of "Big Data" (read the proceedings here), and fed the conclusions from that event back to the White House for its report.

In boyd's view, the White House team did good work in teasing out the hard questions about public benefit and personal costs of Big Data initiatives, and made solid recommendations for future privacy-oriented protections. Boyd points to this Alistair Croll quote as getting at the heart of one of Big Data's least-understood problems:

Perhaps the biggest threat that a data-driven world presents is an ethical one. Our social safety net is woven on uncertainty. We have welfare, insurance, and other institutions precisely because we can’t tell what’s going to happen — so we amortize that risk across shared resources. The better we are at predicting the future, the less we’ll be willing to share our fates with others.


Can you really opt out of Big Data?


Janet Vertesi, assistant professor of sociology at Princeton University, had heard many people apologize for commercial online surveillance by saying that people who didn't want to give their data away should just not give their data away -- they should opt out. So when she got pregnant, she and her husband decided to keep the fact secret from marketing companies (but not their friends and family). She quickly discovered that this was nearly impossible, even while she used Tor, ad blockers, and cash-purchased Amazon cards that paid for baby-stuff shipped to anonymous PO boxes.


Hipsterbait1: algorithmically generated post-ironic tees


Shardcore writes, "I've built a new bot to troll/delight hipsters. It algorithmically creates post-post-ironic t-shirt designs, posts them on twitter and tumblr and offers them for sale. No human is involved in the process at all."


Big Data has big problems


Writing in the Financial Times, Tim Harford (The Undercover Economist Strikes Back, Adapt, etc) offers a nuanced, but ultimately damning critique of Big Data and its promises. Harford's point is that Big Data's premise is that sampling bias can be overcome by simply sampling everything, but the actual data-sets that make up Big Data are anything but comprehensive, and are even more prone to the statistical errors that haunt regular analytic science.

What's more, much of Big Data is "theory free" -- the correlation is observable and repeatable, so it is assumed to be real, even if you don't know why it exists -- but theory-free conclusions are brittle: "If you have no idea what is behind a correlation, you have no idea what might cause that correlation to break down." Harford builds on recent critiques of Google Flu (the poster child for Big Data) and goes further. This is your must-read for today.


Big Data Hubris: Google Flu versus reality

In The Parable of Google Flu: Traps in Big Data Analysis [PDF], published in Science, researchers try to understand why Google Flu (which uses search history to predict flu outbreaks) performed so well at first but has not done well since. One culprit: people don't know what the flu is, so their search for "flu" doesn't necessarily mean they have flu. More telling, though, is that Google can't let outsiders see their data or replicate their findings, meaning that they can't get the critical review that might help them spot problems before years of failure. (via Hacker News)