Machine learning models keep getting spoofed by adversarial attacks and it's not clear if this can ever be fixed

Machine learning models use statistical analysis of historical data to predict future events: whether you are a good candidate for a loan, whether you will violate parole, or whether the thing in the road ahead is a stop sign or a moose.
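
The mechanics are mundane: fit a model to labeled examples from the past, then score new cases. Here's a hedged, toy sketch of that idea using scikit-learn; the loan "features" and numbers are invented for illustration and don't reflect any real lending system:

```python
# Toy illustration: fit a model on "historical" loan outcomes, then predict
# for a new applicant. All features and numbers below are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [annual income (k$), debt-to-income ratio]; label: 1 = repaid, 0 = defaulted
X_history = np.array([[85, 0.10], [42, 0.55], [61, 0.30], [23, 0.70],
                      [95, 0.20], [38, 0.65], [72, 0.25], [30, 0.80]])
y_history = np.array([1, 0, 1, 0, 1, 0, 1, 0])

model = LogisticRegression().fit(X_history, y_history)

# "Predict the future" for an applicant the model has never seen.
new_applicant = np.array([[55, 0.40]])
print(model.predict_proba(new_applicant)[0, 1])  # estimated probability of repayment
```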


But adversarial examples keep cropping up: seemingly trivial changes to objects (even a single pixel) can stump an AI or send a self-driving car barreling through a stop sign.
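
There's nothing mystical about how these are found: for many models, an attacker just follows the model's own gradients. Here's a hedged toy sketch of the gradient-sign idea from the adversarial-examples literature, run against a made-up linear classifier standing in for a real network (the "stop sign" labels and all the numbers are invented):

```python
# Toy sketch of the gradient-sign idea behind many adversarial examples,
# applied to a linear "image classifier". Real attacks target deep networks,
# but the mechanics are similar: nudge every input dimension slightly in the
# direction that most changes the model's decision.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a "trained" model: a linear score over a flattened 28x28 image,
# where score > 0 means "stop sign" and score <= 0 means "something else".
w = rng.normal(size=28 * 28)

def score(img):
    return float(img @ w)

x = rng.uniform(0.0, 1.0, size=28 * 28)   # a benign input image
original = score(x)

# For a linear model, the gradient of the score with respect to the input is
# just w, so the most damaging bounded change moves every pixel by the same
# tiny amount in the direction that pushes the score toward the other label.
direction = -np.sign(w) if original > 0 else np.sign(w)

# Smallest uniform per-pixel nudge that flips the sign of the score
# (ignoring pixel-range clipping to keep the arithmetic exact):
eps = abs(original) / np.abs(w).sum()
x_adv = x + 1.01 * eps * direction

print(f"score before: {original:+.2f}   after: {score(x_adv):+.2f}")
print(f"per-pixel change required: {eps:.4f} (pixels range from 0 to 1)")
```

Against a deep network the gradient comes from backpropagation rather than being written down, but the attacker follows it in much the same way.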

In some ways, this is a continuation of an old security principle: the uselessness of fighting the last war. When Sergey Brin and Larry Page invented PageRank, they observed that the only reason to link one web page to another was that the author of page 1 thought there was something noteworthy on page 2; by counting those links, they could build a table of the most noteworthy pages on the internet and improve search. But links weren't hard to make, and as soon as Google gave people a reason to make them, they did. PageRank entered an arms race with the web: Google hunted for signals of page quality that hadn't yet been gamed, and as soon as it found one, people started gaming it.
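
The original scheme is simple enough to sketch. Below is a hedged toy version of the link-counting idea (invented page names, and none of Google's later refinements or anti-spam machinery); its whole premise is that every link is an honest editorial vote, which is exactly the assumption people learned to exploit:

```python
# Invented link graph: each entry is "page -> pages it links to".
links = {
    "page_a": ["page_b", "page_c"],
    "page_b": ["page_c"],
    "page_c": ["page_a"],
    "page_d": ["page_a", "page_c"],
}

damping = 0.85
rank = {page: 1.0 / len(links) for page in links}

# Power iteration: every page repeatedly shares its rank across its outbound
# links, so pages that attract links from well-linked pages rise to the top.
for _ in range(50):
    new_rank = {page: (1 - damping) / len(links) for page in links}
    for page, outbound in links.items():
        share = damping * rank[page] / len(outbound)
        for target in outbound:
            new_rank[target] += share
    rank = new_rank

for page, value in sorted(rank.items(), key=lambda item: -item[1]):
    print(f"{page}  {value:.3f}")
```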

No stop sign made until now has been designed to fool an AI, because until now, no AI was looking at stop signs. All the training data on stop signs was created by people who weren't adversaries of the researchers. But the fact that an AI can recognize a stop sign that wasn't designed to fool it tells you nothing about whether it could recognize one that *was*.

But in machine learning land, things are even weirder, because the models that ML techniques generate are largely opaque even to their own creators and users. We don't understand how the code decides that this is a stop sign, so we don't understand which attacks it might be vulnerable to.


A friend who is a prominent security researcher and a security lead at one of the world's largest ML companies told me that they believe that there may be easily discovered adversarial examples for all ML models, which has serious implications for their real-world use.


To build stronger defenses against such attacks, machine learning researchers may need to get meaner. Athalye and Biggio say the field should adopt practices from security research, which they say has a more rigorous tradition of testing new defensive techniques. "People tend to trust each other in machine learning," says Biggio. "The security mindset is exactly the opposite, you have to be always suspicious that something bad may happen."

A major report from AI and national security researchers last month made similar recommendations. It advised those working on machine learning to think more about how the technology they are creating could be misused or exploited.

Protecting against adversarial attacks will probably be easier for some AI systems than others. Biggio says that learning systems trained to detect malware should be easier to make more robust, for example, because malware must be functional, limiting how varied it can be. Protecting computer-vision systems is much more difficult, Biggio says, because the natural world is so varied, and images contain so many pixels.
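
A rough back-of-the-envelope calculation (mine, not from the article) shows how lopsided that comparison is: a malware sample has to keep working, while an image attacker gets an astronomically large space of near-invisible tweaks to search:

```python
import math

# Back-of-the-envelope arithmetic (illustrative numbers, not from the article):
# even if an attacker may only nudge each channel of a 224x224 RGB image by at
# most one brightness level (-1, 0, or +1), there are 3**values candidate tweaks.
values = 224 * 224 * 3           # pixel-channel values in the image
digits = values * math.log10(3)  # decimal digits in 3**values
print(f"{values} values -> roughly a {digits:,.0f}-digit count of candidate tweaks")
```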

AI Has a Hallucination Problem That's Proving Tough to Fix [Tom Simonite/Wired]

(Image: Veloshots)