Wireheading: when machine learning systems jolt their reward centers by cheating

Machine learning systems are notorious for cheating, and there's a whole menagerie of ways that these systems achieve their notional goals while subverting their own purpose, with names like "model stealing, rewarding hacking and poisoning attacks." Read the rest

Model stealing, rewarding hacking and poisoning attacks: a taxonomy of machine learning's failure modes

A team of researchers from Microsoft and Harvard's Berkman Center have published a taxonomy of "Failure Modes in Machine Learning," broken down into "Intentionally-Motivated Failures" and "Unintended Failures." Read the rest