Ben Lorica, O'Reilly's chief data scientist, has posted slides and notes from his talk at last December's Strata Data Conference in Singapore, "We need to build machine learning tools to augment machine learning engineers."
Lorica describes a new job emerging in IT departments: "machine learning engineers," whose job is to adapt machine learning models for production environments. These new engineers run the risk of embedding algorithmic bias into their systems, which unfairly discriminate, create liability, and reduces the quality of the recommendations the systems produce.
He presents a set of technical and procedural steps to take to minimize these risks, with links to the relevant papers and code. It's really required reading for anyone implementing a machine learning system in a production environment.
Another example has to do with error: once we are satisfied with a certain error rate, aren’t we done and ready to deploy our model to production? Consider a scenario where you have a machine learning model used in health care: in the course of model building, your training data for millenials (in red) is quite large compared to the number of labeled examples from senior citizens (in blue). Since accuracy tends to be correlated with the size of your training set, chances are the error rate for senior citizens will be higher than for millenials.
For situations like this, a group of researchers introduced a concept, called "equal opportunity", that can help alleviate disproportionate error rates and ensure the “true positive rate” for the two groups are similar. See their paper and accompanying interactive visualization.
We need to build machine learning tools to augment machine learning engineers [Ben Lorica/O'Reilly]
(via 4 Short Links)
I'm on record as being a big supporter of learning regular expressions (AKA "regexp") -- handy ways to search through text with very complex criteria. It's notoriously opaque to beginners, but it's such a massively effective automation tool and drudgery reliever! Regex Crosswords help you hone your regexp skills with fiendishly clever regular expressions that […]
[I ran a review of this in June when the UK edition came out -- this review coincides with the US edition's publication] Rob Smith is an eminent computer scientist and machine learning pioneer whose work on genetic algorithms has been influential in both industry and the academy; now, in his first book for a […]
More bad news for Google's beleaguered spinoff Jigsaw, whose flagship project is "Perspective," a machine-learning system designed to catch and interdict harassment, hate-speech and other undesirable online speech.
Studies have shown cannabidiol (more popularly known as CBD) to be effective in two main areas: Pain relief and stress relief. Both of those make the non-psychoactive, cannabis-derived compound a natural for topical creams. There’s no shortage of CBD products out there, but here’s eight of our favorites, all specifically designed for dermatological use – […]
If you’re part of the maker community, you know Make:. Though Make: magazine is off the shelves as of this year, the eBooks and resources put out by Maker Media are still a fantastic resource for the new generation of tinkerers, hackers, and robotics geeks. If you’re in that tribe, listen up: they’ve released a […]
Life isn’t getting any less hectic, and pressure cookers are a quick, healthy solution for a growing number of kitchens. But if you thought your Instant Pot was versatile, there’s a major upgrade on the market: The Yedi 9-in-1 Total Package Instant Programmable Pressure Cooker. If you’ve somehow never used a pressure cooker before, try […]