Ben Lorica, O'Reilly's chief data scientist, has posted slides and notes from his talk at last December's Strata Data Conference in Singapore, "We need to build machine learning tools to augment machine learning engineers."
Lorica describes a new role emerging in IT departments: the "machine learning engineer," whose job is to adapt machine learning models for production environments. These engineers run the risk of embedding algorithmic bias into their systems, bias that can unfairly discriminate, create liability, and reduce the quality of the recommendations the systems produce.
He presents a set of technical and procedural steps to minimize these risks, with links to the relevant papers and code. It's required reading for anyone putting a machine learning system into production.
Another example has to do with error: once we are satisfied with a certain error rate, aren't we done and ready to deploy our model to production? Consider a scenario where you have a machine learning model used in health care: in the course of model building, your training data for millennials (in red) is quite large compared to the number of labeled examples from senior citizens (in blue). Since accuracy tends to be correlated with the size of your training set, chances are the error rate for senior citizens will be higher than for millennials.
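The point about error rates can be checked directly by computing the error rate separately for each group instead of only in aggregate. Below is a minimal Python sketch, assuming binary labels and predictions held in plain lists plus a hypothetical `groups` list naming each example's age group; the function name and all data are invented for illustration.

```python
from collections import defaultdict

def error_rate_by_group(y_true, y_pred, groups):
    """Return {group: fraction of misclassified examples in that group}."""
    errors = defaultdict(int)
    counts = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        counts[g] += 1
        errors[g] += int(t != p)  # 1 if this example was misclassified
    return {g: errors[g] / counts[g] for g in counts}

# Hypothetical toy data: eight "millennial" rows, only two "senior" rows.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
groups = ["millennial"] * 8 + ["senior"] * 2

# The aggregate error hides how errors are distributed across groups.
overall = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
print("overall error:", overall)
print("per-group error:", error_rate_by_group(y_true, y_pred, groups))
```

Here an overall error of 20% might look acceptable, but the senior group's error rate (50%) is four times the millennial group's (12.5%), and with only two senior examples the estimate itself is very noisy.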
For situations like this, a group of researchers introduced a concept called "equal opportunity" that can help alleviate disproportionate error rates and ensure that the "true positive rate" is similar for the two groups. See their paper and accompanying interactive visualization.
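The equal opportunity idea asks that, among people whose true label is positive, each group is flagged positive at a similar rate. The Python sketch below measures each group's true positive rate (TPR) at one shared threshold, then picks a group-specific threshold to close the gap. It is a simplified, deterministic form of per-group threshold adjustment, assumed here for illustration and not necessarily the exact procedure in the paper Lorica cites; the helper names, scores, and labels are all hypothetical.

```python
def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives (y_true == 1) that were predicted positive."""
    hits = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not hits:
        raise ValueError("no positive examples in this group")
    return sum(hits) / len(hits)

def split_by_group(values, groups, g):
    """Keep only the values belonging to group g."""
    return [v for v, gi in zip(values, groups) if gi == g]

def tpr_at_threshold(scores, y_true, threshold):
    """TPR when every score >= threshold is predicted positive."""
    preds = [int(s >= threshold) for s in scores]
    return true_positive_rate(y_true, preds)

def threshold_matching_tpr(scores, y_true, target_tpr):
    """Among observed scores, pick the threshold whose TPR is closest to
    target_tpr, preferring the highest such threshold (fewest false positives)."""
    candidates = sorted(set(scores))
    return min(
        candidates,
        key=lambda th: (abs(tpr_at_threshold(scores, y_true, th) - target_tpr), -th),
    )

# Hypothetical model scores and labels for two groups.
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2,  # millennials
          0.55, 0.45, 0.4, 0.1]          # seniors
y_true = [1, 1, 1, 1, 0, 0,
          1, 1, 1, 0]
groups = ["millennial"] * 6 + ["senior"] * 4

# With a single shared threshold of 0.5, the groups' TPRs differ sharply.
tpr = {g: tpr_at_threshold(split_by_group(scores, groups, g),
                           split_by_group(y_true, groups, g), 0.5)
       for g in ("millennial", "senior")}
gap = abs(tpr["millennial"] - tpr["senior"])
print("TPR by group:", tpr, "gap:", gap)

# Choose a senior-specific threshold whose TPR matches the millennial TPR.
senior_th = threshold_matching_tpr(split_by_group(scores, groups, "senior"),
                                   split_by_group(y_true, groups, "senior"),
                                   tpr["millennial"])
print("senior threshold:", senior_th)
```

With the shared 0.5 threshold, the millennial TPR is 1.0 and the senior TPR is 1/3; lowering the senior threshold to 0.4 catches all three senior positives without flagging the senior negative. On real data, equalizing TPRs this way generally trades off against other metrics such as the false positive rate, so the chosen thresholds deserve scrutiny.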
We need to build machine learning tools to augment machine learning engineers [Ben Lorica/O'Reilly]
(via 4 Short Links)