Ben Lorica, O'Reilly's chief data scientist, has posted slides and notes from his talk at last December's Strata Data Conference in Singapore, "We need to build machine learning tools to augment machine learning engineers."
Lorica describes a new job emerging in IT departments: "machine learning engineers," whose job is to adapt machine learning models for production environments. These new engineers run the risk of embedding algorithmic bias into their systems, which can unfairly discriminate, create liability, and reduce the quality of the recommendations the systems produce.
He presents a set of technical and procedural steps to minimize these risks, with links to the relevant papers and code. It's required reading for anyone implementing a machine learning system in a production environment.
Another example has to do with error: once we are satisfied with a certain error rate, aren’t we done and ready to deploy our model to production? Consider a scenario where you have a machine learning model used in health care: in the course of model building, your training data for millennials (in red) is quite large compared to the number of labeled examples from senior citizens (in blue). Since accuracy tends to be correlated with the size of your training set, chances are the error rate for senior citizens will be higher than for millennials.
For situations like this, a group of researchers introduced a concept called "equal opportunity" that can help alleviate disproportionate error rates and ensure the “true positive rate” is similar for the two groups. See their paper and accompanying interactive visualization.
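To make the "true positive rate" comparison concrete, here is a minimal Python sketch that measures the rate separately for each group and reports the gap between them. It is an illustration only, not code from Lorica's talk or from the researchers' paper: the function names, the group labels, and the toy data are all made up for this example, and it only measures the gap rather than correcting it.

```python
from collections import defaultdict


def true_positive_rates(y_true, y_pred, groups):
    """Return {group: TPR}, where TPR = true positives / actual positives."""
    true_pos = defaultdict(int)    # correctly predicted positives per group
    actual_pos = defaultdict(int)  # all actual positives per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            actual_pos[group] += 1
            if pred == 1:
                true_pos[group] += 1
    # Groups with no actual positives are left out (their TPR is undefined).
    return {g: true_pos[g] / actual_pos[g] for g in actual_pos}


def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest difference in TPR between any two groups (0 means parity)."""
    rates = true_positive_rates(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: 1 = has the condition, 0 = does not.
y_true = [1, 1, 1, 1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]
groups = ["millennial"] * 5 + ["senior"] * 3

rates = true_positive_rates(y_true, y_pred, groups)
gap = equal_opportunity_gap(y_true, y_pred, groups)
print(rates)
print(gap)
```

A large gap suggests the model misses real positive cases in one group more often than in the other, which is the kind of disparity the "equal opportunity" idea is meant to reduce.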
We need to build machine learning tools to augment machine learning engineers [Ben Lorica/O'Reilly]
(via 4 Short Links)