Machine learning, deep-fat fryers, and community cultivation

Maciej Cegłowski's speech at the Library of Congress, "Deep-Fried Data," describes the way that data begs to be analyzed and how machine learning is like a deep-fat fryer: a fryer makes anything you put in it "kind of" delicious, and machine learning "kind of" finds insights in your data-set.

But unless you know what your food is being fried in, you have no idea what's actually happening to it. And unless you know what data is used to train your machine-learning system, you can't know if it's finding real insight or just serving as a "money-laundry for bias."

Cegłowski does a great job on the structural limits and seductive appeal of machine learning, but then moves on to how archivists and librarians can use large data-sets, and the weird problems of providing data to strangers who use it in ways you may find disturbing, frivolous, or just inexplicable. From here, Cegłowski talks about where data-sets to analyze can come from (and whether you can work with companies addicted to the surveillance business-model and keep your ethics intact), and what good archiving practice should be in an era of dynamic documents served to rapidly obsoleted technologies.

In cultivating communities, I prefer gardening metaphors. You need the right conditions, a propitious climate, fertile soil, and a sprinkling of bullshit. But you also need patience, weeding, and tending. And while you're free to plant seeds, what you wind up with might not be what you expected.

If we take seriously the idea that digitizing collections makes them far more accessible, then we have to accept that the kinds of people and activities those collections will attract may seem odd to us. We have to give up some control.

This should make perfect sense. Human cultures are diverse. It's normal that there should be different kinds of food, music, dance, and we enjoy these differences. Unless you're a Silicon Valley mathlete, you delight in the fact that there are hundreds of different kinds of cuisine, rather than a single beige beverage that gives you all your nutrition.

But online, our horizons narrow. We expect domain experts and programmers to be able to meet everyone's needs, sight unseen. We think it's normal to build a social network for seven billion people.

Deep-Fried Data [Maciej Cegłowski/Idlewords]

(via 4 Short Links)

(Image: Pizza in deep fat fryer, Edward Betts, CC-BY-SA)