Ben Lorica, O'Reilly's chief data scientist, has posted slides and notes from his talk at last December's Strata Data Conference in Singapore, "We need to build machine learning tools to augment machine learning engineers."
Lorica describes a new job emerging in IT departments: "machine learning engineers," whose job is to adapt machine learning models for production environments. These new engineers run the risk of embedding algorithmic bias into their systems, which unfairly discriminate, create liability, and reduces the quality of the recommendations the systems produce.
He presents a set of technical and procedural steps to take to minimize these risks, with links to the relevant papers and code. It's really required reading for anyone implementing a machine learning system in a production environment.
Another example has to do with error: once we are satisfied with a certain error rate, aren’t we done and ready to deploy our model to production? Consider a scenario where you have a machine learning model used in health care: in the course of model building, your training data for millenials (in red) is quite large compared to the number of labeled examples from senior citizens (in blue). Since accuracy tends to be correlated with the size of your training set, chances are the error rate for senior citizens will be higher than for millenials.
For situations like this, a group of researchers introduced a concept, called "equal opportunity", that can help alleviate disproportionate error rates and ensure the “true positive rate” for the two groups are similar. See their paper and accompanying interactive visualization.
We need to build machine learning tools to augment machine learning engineers [Ben Lorica/O'Reilly]
(via 4 Short Links)
In 2014, Quentin Tarantino sued Gawker for publishing a link to a leaked pre-release screener of his movie "The Hateful Eight." The ensuing court-case revealed that the screeners Tarantino's company had released had some forensic "traitor tracing" features to enable them to track down the identities of people who leaked copies.
Rebecca Tickle is a PhD student in the Faculty of Science at the University of Nottingham, explains what big data is. She uses a handy “Five Vees of Big Data Mnemonic” — volume, velocity, variety, value, and veracity. She mentions that other people have come up with the “Seven Vees of Big Data Mnemonic” and […]
An adversarial preturbation is a small, human-imperceptible change to a piece of data that flummoxes an otherwise well-behaved machine learning classifier: for example, there's a really accurate ML model that guesses which full-sized image corresponds to a small thumbnail, but if you change just one pixel in the thumbnail, the classifier stops working almost entirely.
Kudos to those of us who have chosen a less wasteful third option to “paper or plastic” at the supermarket or club stores. Tote bags are reusable, but they can be a pain to tote around. Here’s an upgrade to that planet-saving measure. The Club Cart Lotus Trolley Bag is that rare tote you’ll want […]
Looking for a career in IT, gaming or software development? In the ever-changing world of the internet, versatility is your biggest asset. In other words, mastering Java might not cut it in an interview if you don’t know C#. However, there’s a bundle that covers the essentials in most any language. The Legendary Learn to […]
Getting a set of cookware that will outlast you is one of those signs you’ve truly grown up. It used to be easy to find durable materials that also cook well, but these days it can be hard to tell what’s quality and what brands are coasting by on a recognizable name. Well, there’s at […]