Collecting user data is a competitive disadvantage

Warren Buffett is famous for identifying the need for businesses to have "moats" and "walls" around their profit centers to keep competitors out, and data-centric companies often cite their massive collections of user data as "moats" that benefit from "network effects" to make their businesses good investments.

To do in NYC next Sat, May 11: "The Bigot in the Machine," a panel on algorithmic bias from PEN and McSweeney's

Next weekend, PEN America is throwing its World Voices Festival, including a McSweeney's-sponsored panel on algorithmic bias called The Bigot in the Machine, featuring poet/media activist Malkia Cyril and Equality Labs founder Thenmozhi Soundararajan, moderated by investigative journalist Adrianne Jeffries. It's on May 11 at 2:30 at Cooper Union's Frederick P. Rose Auditorium. Tickets are $20.

Exclusive: "More Data": Negativland's video short about data privacy and surveillance

[I've been in love with Negativland since their legendary copyright battle with U2 and they've been a part of Boing Boing since 2001; it's a pleasure beyond words to be able to debut More Data, their characteristically trenchant video about data privacy and surveillance; see below for notes from Negativland. -Cory]

Open dataset of 1.78b links from the public web, 2016-2019

GDELT, a digital news monitoring service backed by Google Jigsaw, has released a massive, open set of linking data, containing 1.78 billion links in CSV, with four fields for each link: "FromSite,ToSite,NumDays,NumLinks."
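
For readers who want to poke at a file in this four-field layout, here is a minimal sketch in Python. It is not GDELT's own tooling: the sample rows and the helper name inbound_link_totals are made up for illustration, and it assumes that NumLinks is an integer count of links from FromSite to ToSite, which the excerpt above does not spell out.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; only the header (the four field names)
# comes from the post. The sites and numbers are invented.
SAMPLE = """FromSite,ToSite,NumDays,NumLinks
example-news.com,example.org,12,40
example-news.com,example.net,3,5
another-example.com,example.org,7,9
"""


def inbound_link_totals(csv_file):
    """Sum NumLinks per ToSite from a file-like object of CSV text.

    Assumes NumLinks holds an integer link count (an assumption,
    not something the post confirms).
    """
    totals = Counter()
    for row in csv.DictReader(csv_file):
        totals[row["ToSite"]] += int(row["NumLinks"])
    return totals


totals = inbound_link_totals(io.StringIO(SAMPLE))
for site, count in totals.most_common():
    print(site, count)
```

With a real download, the io.StringIO(SAMPLE) argument would be swapped for an opened file; reading row by row keeps memory use flat even with a very large number of rows, though the per-site totals still have to fit in memory.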

Big Data's "theory-free" analysis is statistical malpractice

One of the premises of Big Data is that it can be "theory free": rather than starting with a hypothesis ("men at buffets eat more when women are present," "more people will click this button if I move it here," etc.) and then gathering data to validate your guess, you just gather a ton of data and look for patterns in it.

20,000 Dear Abby letters analyzed in study of "American" anxieties

"30 Years of American Anxieties" is a report on what 20,000 letters to Dear Abby reveal about the alarming things in life, and a great data presentation.

In U.S. prisons, women are disciplined at a higher rate than men

Even women in prison can’t escape the sexist stereotype of the “difficult woman.”

Nonprofit will coordinate 30 global investigative journalists to report leaked stories of big data abuse

The Signals Network is a nonprofit that supports independent investigative journalism; it is financially supporting a consortium of five international media groups, Die Zeit (Germany), Mediapart (France), The Daily Telegraph (UK), The Intercept (US) and WikiTribune (Global), as they investigate misuse of "big data."

The Gates Foundation spent $775m on a Big Data education project that was worse than useless

Kudos to the Gates Foundation, seriously: after spending $775m on the Intensive Partnerships for Effective Teaching, a Big Data initiative to improve education for poor and disadvantaged students, they hired outside auditors to evaluate the program's effectiveness, and published that report, even though it shows that the approach did no good on balance and arguably caused real harms to teachers and students.

The most interesting thing about the "Thanksgiving Effect" study is what it tells us about the limits of data anonymization

Late last year, a pair of economists released an interesting paper that used mobile location data to estimate the likelihood that political polarization had shortened family Thanksgiving dinners in 2016.

Syllabus for a course on Data Science Ethics

The University of Utah's Suresh Venkatasubramanian and Katie Shelef are teaching a course in "Ethics in Data Science" and they've published a comprehensive syllabus for it; it's a fantastic set of readings for anyone interested in understanding and developing ethical frameworks for computer science generally, and data science in particular.

Palantir has figured out how to make money by using algorithms to ascribe guilt to people, now they're looking for new customers

In 2009, JP Morgan Chase's "special ops" guy was an ex-Secret Service agent called Peter Cavicchia III, and he retained Palantir to spy on everyone in the company to find "insider threats," even getting the bank to invest in Palantir.

UC Berkeley offers its Foundations of Data Science course for free online

Berkeley's "Foundations of Data Science" boasts the fastest-growing enrollment of any course in UC Berkeley history, and now it's free on the university's edX distance-education platform.

Facebook insists that Cambridge Analytica didn't "breach" data, but "misused" it, and they're willing to sue anyone who says otherwise

Yesterday's bombshell article in the Guardian about the way that Cambridge Analytica was able to extract tens of millions of Facebook users' data without their consent was preceded by plenty of damage control on Facebook's part: they repeatedly threatened to sue news outlets if they reported on the story and fired the whistleblower who came forward with the story.

Vendor lock-in, DRM, and crappy EULAs are turning America's independent farmers into tenant farmers

"Precision agriculture" is to farmers as Facebook is to publishers: farmers who want to compete can't afford to boycott the precision ag platforms fielded by the likes of John Deere, but once they're locked into the platforms' walled gardens, they are prisoners, and the platforms start to squeeze them for a bigger and bigger share of their profits.

Hey, Wellington! I'm headed your way!

I've just finished a wonderful time at the Adelaide Festival and now I'm headed to the last stop on the Australia/New Zealand tour for Walkaway: Wellington!

An incredibly important paper on whether data can ever be "anonymized" and how we should handle release of large data-sets

Even the most stringent privacy rules have massive loopholes: they all allow for free distribution of "de-identified" or "anonymized" data that is deemed to be harmless because it has been subjected to some process.
