Adversarial patches: colorful circles that convince machine-learning vision systems to ignore everything else

Machine learning systems trained for object recognition rely on a bunch of learned shortcuts to decide which parts of an image matter for classification and which ones can be safely ignored.


A group of Google researchers just published Adversarial Patch, a paper that systematizes the creation of "patches": brightly colored circles designed so that AI image classifiers fixate on them and ignore everything else in the scene.


The implication is that you could just stick a few of these in a scene (or onto your garments) and cause AI image classifiers to ignore whatever is going on and instead become fascinated with your weird, psychedelic imagery.


The researchers characterize their patches as "robust" because they keep working under a wide variety of transformations, like changes in scale and position.

There's a lot of work on adversarial perturbations that fool image classifiers: tricking Google's AI into thinking that rifles are helicopters, that turtles are rifles, turning stop signs into go-faster signs, or making AIs see faces everywhere.


But that earlier work focuses on techniques that are undetectable to humans: fooling classifiers by altering a single pixel, or by adding faint noise that causes an image to be misidentified.
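
To get a feel for that "imperceptible" style of attack, here's a minimal sketch of one well-known example, the fast gradient sign method (FGSM) from earlier adversarial-examples research; it assumes a PyTorch setup with a differentiable pretrained classifier, a batched image tensor scaled to [0, 1] and an integer label tensor, and the function name and epsilon value are illustrative rather than taken from any of the linked work.

    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model, image, label, epsilon=0.007):
        # Nudge every pixel by at most epsilon in the direction that increases the loss,
        # so the change is invisible to a human but can flip the classifier's answer.
        image = image.clone().requires_grad_(True)
        loss = F.cross_entropy(model(image), label)
        loss.backward()
        return (image + epsilon * image.grad.sign()).clamp(0, 1).detach()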


The new technique does away with the requirement for imperceptibility and shows that if you don't care about whether humans can see that something weird is going on, you can really mess with AIs.


Another key difference here: the researchers achieve their best results with a "white box" technique, designing their patches using detailed knowledge of the AI they're targeting. That's unlike some other adversarial perturbations, which achieved good results under "black box" constraints (designing attacks without any technical knowledge of the AI). The patches they created didn't transfer very well to other AI image classifiers.
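
Concretely, the white-box attack boils down to ordinary gradient-based optimization, except the thing being updated is the patch rather than the network. Below is a rough sketch of how such a patch might be trained, assuming a differentiable pretrained classifier and a tensor of background scenes larger than the scaled patch; the function name, transformation ranges and hyperparameters are illustrative guesses, not the paper's exact recipe.

    import torch
    import torch.nn.functional as F
    import torchvision.transforms.functional as TF

    def train_patch(model, images, target_class, steps=1000, size=64, lr=0.05):
        # images: (N, 3, H, W) tensor of training scenes scaled to [0, 1]
        model.eval()
        patch = torch.rand(1, 3, size, size, requires_grad=True)  # start from noise
        opt = torch.optim.Adam([patch], lr=lr)
        for _ in range(steps):
            x = images[torch.randint(len(images), (1,))].clone()
            # Random rotation and scale, so the finished patch keeps working
            # under a variety of transformations.
            angle = float(torch.empty(1).uniform_(-45, 45))
            scale = float(torch.empty(1).uniform_(0.7, 1.3))
            p = TF.rotate(patch.clamp(0, 1), angle)
            p = F.interpolate(p, scale_factor=scale, mode="bilinear",
                              align_corners=False)
            # Paste the transformed patch at a random location in the scene.
            _, _, ph, pw = p.shape
            _, _, H, W = x.shape
            top = int(torch.randint(0, H - ph + 1, (1,)))
            left = int(torch.randint(0, W - pw + 1, (1,)))
            pad = (left, W - pw - left, top, H - ph - top)
            mask = F.pad(torch.ones_like(p), pad)
            x = x * (1 - mask) + F.pad(p, pad) * mask
            # Gradient ascent on the target class: the patch, not the model, gets updated.
            loss = -F.log_softmax(model(x), dim=1)[0, target_class]
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                patch.clamp_(0, 1)  # keep the patch a valid image
        return patch.detach()

Because the loop samples a fresh scene, rotation, scale and location every step, the optimizer can't rely on any one placement; that's what makes the finished sticker work wherever it ends up.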


We show that we can generate a universal, robust, targeted patch that fools classifiers regardless of the scale or location of the patch, and does not require knowledge of the other items in the scene that it is attacking. Our attack works in the real world, and can be disguised as an innocuous sticker. These results demonstrate an attack that could be created offline, and then broadly shared.

There has been substantial work on defending against small perturbations to natural images, at least partially motivated by security concerns. Part of the motivation of this work is that potential malicious attackers may not be concerned with generating small or imperceptible perturbations to a natural image, but may instead opt for larger more effective but noticeable perturbations to the input – especially if a model has been designed to resist small perturbations.


Many ML models operate without human validation of every input and thus malicious attackers may not be concerned with the imperceptibility of their attacks. Even if humans are able to notice these patches, they may not understand the intent of the patch and instead view it as a form of art. This work shows that focusing only on defending against small perturbations is insufficient, as large, local perturbations can also break classifiers.


Adversarial Patch [Tom B. Brown, Dandelion Mané, Aurko Roy, Martín Abadi and Justin Gilmer/Neural Information Processing Systems 2017]


These psychedelic stickers blow AI minds [Devin Coldewey/TechCrunch]


(via /.)