Behold! The Library of Congress's audacious plan to digitize and share the nation's treasures

The Library of Congress has published its latest digital strategy, laying out a bold plan to "exponentially grow" its collections through digital acquisitions; to "maximize the use of content" by providing machine-readable rights data and using interoperable formats and better search; to support data-driven research with giant bulk-downloadable corpuses of materials and metadata; to improve its website; to syndicate Library assets to other websites; to crowdsource the acquisition of new materials; to experiment with new tools and techniques; and to preserve digital assets with the same assiduousness that the Library has shown with its physical collection for centuries.

Comprehensive, open tutorial on using data analysis in social science research

Benjamin Mako Hill (previously) collaborated with colleagues involved in critical technology studies to write a textbook chapter analyzing the use of computational methods in social science and providing advice for social scientists who want to delve into data-based social science.

Snakes and Ladders can be analyzed by converting it to a Markov Chain

University of Washington data scientist Jake Vanderplas found himself trapped in an interminable series of Snakes and Ladders (AKA Chutes and Ladders) games with his four-year-old and started thinking about how he could write a Python program to simulate and solve the game.
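To give a flavor of the Markov chain idea, here is a minimal sketch in plain Python. It is not Vanderplas's program: the ten-square board, the three-sided die, the single ladder and snake, and the rule that an overshoot still lands on the finish are all made-up assumptions chosen to keep the example tiny. Each square is a state, a die roll defines the transition probabilities, and the finish square is absorbing.

```python
# Hypothetical toy board (NOT the real game layout): squares 0..9, 9 is the finish.
N_SQUARES = 10
JUMPS = {2: 6, 8: 3}    # assumed ladder 2 -> 6 and snake 8 -> 3
DIE_FACES = (1, 2, 3)   # assumed three-sided die, to keep the numbers small


def build_transition_matrix():
    """Return P, where P[i][j] is the chance of moving from square i to j in one turn."""
    last = N_SQUARES - 1
    P = [[0.0] * N_SQUARES for _ in range(N_SQUARES)]
    for i in range(N_SQUARES):
        if i == last:
            P[i][i] = 1.0  # the finish square is absorbing
            continue
        for roll in DIE_FACES:
            target = min(i + roll, last)        # assumption: overshooting still finishes
            target = JUMPS.get(target, target)  # follow a ladder or snake if one starts here
            P[i][target] += 1.0 / len(DIE_FACES)
    return P


def step(dist, P):
    """Advance a probability distribution over squares by one turn."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def finish_probabilities(P, max_turns):
    """Return a list whose entry t-1 is the probability of having finished within t turns."""
    dist = [0.0] * N_SQUARES
    dist[0] = 1.0  # the player starts on square 0
    cdf = []
    for _ in range(max_turns):
        dist = step(dist, P)
        cdf.append(dist[-1])
    return cdf


def expected_turns(P, max_turns=1000):
    """Approximate E[T] as the sum over t >= 0 of P(game still running after t turns)."""
    cdf = finish_probabilities(P, max_turns)
    return 1.0 + sum(1.0 - c for c in cdf)


if __name__ == "__main__":
    P = build_transition_matrix()
    print("P(finished within 2 turns):", finish_probabilities(P, 2)[-1])
    print("Expected turns (truncated sum):", expected_turns(P))
```

Because the player's choices never matter, the whole game is captured by the matrix: repeatedly multiplying the starting distribution by it gives the odds of being done after any number of turns, with no random simulation needed.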

What is bias in machine learning, when does it matter, and what can we do about it?

"The Trouble with Bias," Kate Crawford's (previously) keynote at the 2017 Neural Information Processing Systems conference, is a brilliant tour through different ways of thinking about what bias is, and when we should worry about it, specifically in the context of machine learning systems and algorithmic decision making -- the best part is at the end, where she describes what we should do about this stuff, and where to get started. (via 4 Short Links)

How to replace yourself with a very small shell script

Data scientist Hilary Mason (previously) talks through her astoundingly useful collection of small shell scripts that automate all the choresome parts of her daily communications: processes that remind people when they owe her an email; that remind her when she accidentally drops her end of an exchange; that alert her when a likely important email arrives (freeing her up from having to check and check her email to make sure that nothing urgent is going on). It's a hilarious and enlightening talk that offers a glimpse into the kinds of functionality that users can provide for themselves when they run their own infrastructure and aren't at the mercy of giant webmail companies. (via Clive Thompson)

Ethics and AI: all models are wrong, some are useful, and some of those are good

The old stats adage goes: "All models are wrong, but some models are useful." In this 35-minute presentation from the O'Reilly Open Data Science Conference, data ethicist Abe Gong from Aspire Health provides a nuanced, meaningful, accessible and eminently actionable overview of the ways that ethical considerations can be incorporated into the design of powerful algorithms.

A checklist for figuring out whether your algorithm is a "weapon of math destruction"

The Data & Society institute (dedicated to critical, interdisciplinary perspectives on big data) held an online seminar devoted to Cathy O'Neil's groundbreaking book Weapons of Math Destruction, which showed how badly designed algorithmic decision-making systems can create, magnify and entrench the social problems they're supposed to solve, perpetuating inequality, destabilizing the economy, and making a small number of people very, very rich.

Let's teach programming as a tool for analyzing data to transform the world

Data scientist Kevin H Wilson argues that computers are tools for manipulating data -- from companies' sales data to the input from game controllers -- but we teach computer programming as either a way to make cool stuff (like games) or as a gateway to "rigorous implementation details of complicated language," while we should be focusing on fusing computer and math curricula to produce a new generation of people who understand how to use computers to plumb numbers to find deep, nuanced truths we can act upon.

Big Data Ethics: racially biased training data versus machine learning

Writing in Slate, Cathy "Weapons of Math Destruction" O'Neil, a skeptical data scientist, describes the ways that Big Data intersects with ethical considerations.

Weapons of Math Destruction: how Big Data threatens democracy

I was hugely impressed with Cathy "Mathbabe" O'Neil's talk at Personal Democracy Forum 2015, "Weapons of Math Destruction," in which she laid out the way that the "opaque, unregulated, and uncontestable" conclusions of Big Data threaten fairness and democracy.