Ben Lorica, O'Reilly's chief data scientist, has posted slides and notes from his talk at last December's Strata Data Conference in Singapore, "We need to build machine learning tools to augment machine learning engineers."
Lorica describes a new job emerging in IT departments: "machine learning engineers," whose job is to adapt machine learning models for production environments. These new engineers run the risk of embedding algorithmic bias into their systems, which unfairly discriminate, create liability, and reduces the quality of the recommendations the systems produce.
He presents a set of technical and procedural steps to take to minimize these risks, with links to the relevant papers and code. It's really required reading for anyone implementing a machine learning system in a production environment.
Another example has to do with error: once we are satisfied with a certain error rate, aren’t we done and ready to deploy our model to production? Consider a scenario where you have a machine learning model used in health care: in the course of model building, your training data for millenials (in red) is quite large compared to the number of labeled examples from senior citizens (in blue). Since accuracy tends to be correlated with the size of your training set, chances are the error rate for senior citizens will be higher than for millenials.
For situations like this, a group of researchers introduced a concept, called "equal opportunity", that can help alleviate disproportionate error rates and ensure the “true positive rate” for the two groups are similar. See their paper and accompanying interactive visualization.
We need to build machine learning tools to augment machine learning engineers [Ben Lorica/O'Reilly]
(via 4 Short Links)
Repairnator is a bot that identifies bugs in open source software integration and creates patches without human intervention, submitting them to the open source project's maintainers under an assumed human identity; it has succeeded in having five of its patches accepted so far.
Robin Sloan is a programmer and novelist whose books like Sourdough and Mr Penumbra's 24-Hour Bookstore are rich and evocative blends of self-aware nerdy playfulness and magical speculation.
Pete Warden (previously) writes persuasively that machine learning companies could make a ton of money by turning to data-compression: for example, ML systems could convert your speech to text, then back into speech using a high-fidelity facsimile of your voice at the other end, saving enormous amounts of bandwidth in between.
Speed reading isn’t just an innate skill possessed by a lucky few. Anyone can learn to speed read, and the benefits are endless. The brain can process more information than most people have time to soak up, but you can make that time now with the 2018 Award-Winning Speed Reading Bundle. The first half of […]
Sure, you could use the same old PowerPoint templates for your next business presentation. It’s not like you have bosses or investors to impress. Oh wait, you do? Time to augment that slideshow with Slideshop – the presentation tool that can individualize your pitch while saving you time. Compatible with PowerPoint, Keynote and Google Slides, […]
Multinational companies have used the no-nonsense methodologies of Six Sigma and Lean Six Sigma to oil a smooth-running operation for years. What is it? Six Sigma (and its offshoot, Lean Six Sigma) apply the principles of science to business, teaching managers to methodically target waste, maximize output and streamline the flow from producer to consumer. […]