Researchers can fool machine-learning vision systems with a single, well-placed pixel

Three researchers from Kyushu University have published a paper describing a means of reliably fooling AI-based image classifiers with a single well-placed pixel.


It's part of a wider field of "adversarial perturbation" research aimed at disrupting machine-learning models, a field that started with some modest achievements but has been gaining ground ever since.

But the Kyushu paper goes further than any of the research I've seen so far. The researchers use 1, 3 or 5 well-placed pixels to fool the classifier on a majority of images, without any access to the training data used to produce the model (a "black box" attack).

It's a good example of how impressive gains in "non-adversarial" computing (systems that work so long as no one is trying to subvert them) turn out to be extremely fragile and almost trivial to circumvent once an adversary shows up.
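The search for which pixel to change (and what color to give it) needs no gradients or visibility into the model's internals: the preprint frames it as an evolutionary optimization (differential evolution) over pixel coordinates and RGB values, scored only by the classifier's output probabilities. Here's a rough sketch of that idea, assuming a hypothetical `predict_probs` classifier and 32×32 inputs; it's an illustration of the approach, not the authors' code.

```python
# A minimal, hypothetical sketch of a black-box one-pixel attack.
# `predict_probs` stands in for any image classifier that returns a vector of
# class probabilities -- no gradients, weights or training data are needed.
import numpy as np
from scipy.optimize import differential_evolution

H, W = 32, 32  # image size assumed here (e.g. CIFAR-10-scale inputs)

def apply_pixel(image, candidate):
    """Copy `image` and overwrite one pixel.
    `candidate` is (x, y, r, g, b): coordinates in pixels, colors in [0, 1]."""
    x, y, r, g, b = candidate
    perturbed = image.copy()
    perturbed[int(y) % H, int(x) % W] = (r, g, b)
    return perturbed

def one_pixel_attack(image, true_label, predict_probs, maxiter=75, popsize=80):
    # Non-targeted attack: the fitness to minimize is the model's confidence
    # in the correct class, so the search pushes the image across a boundary.
    def fitness(candidate):
        return predict_probs(apply_pixel(image, candidate))[true_label]

    bounds = [(0, W), (0, H), (0, 1), (0, 1), (0, 1)]  # pixel position + RGB
    result = differential_evolution(fitness, bounds,
                                    maxiter=maxiter, popsize=popsize, seed=0)

    adversarial = apply_pixel(image, result.x)
    new_label = int(np.argmax(predict_probs(adversarial)))
    return adversarial, new_label != true_label  # image, did the attack work?
```

From the paper: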


According to the experimental results, the main contributions of our work include:


• The effectiveness of non-targeted few-pixel attacks. We show that with only a 1-pixel modification, 73.8% of images can be perturbed to one or more target classes, rising to 82.0% and 87.3% for 3- and 5-pixel attacks. Even with the perturbation limited to such a small scope, insensitive images are much rarer than sensitive ones; few-pixel modification is therefore an effective way of searching for adversarial images that are, in practice, hardly recognizable to the human eye.

• The number of target classes a natural image can be camouflaged as. With a 1-pixel perturbation, each natural image can be perturbed to 2.3 other classes on average; specifically, 18.4%, 17.2% and 16.6% of images can be perturbed to 1, 2 and 3 target classes respectively. With a 5-pixel perturbation, the proportions of images that can be perturbed to anywhere from 1 to 9 target classes become almost even.

• Similar perturbation directions toward a specific target class. The effectiveness of universal perturbations has shown that many images can be perturbed along similar directions, suggesting that decision boundaries may lack diversity [24]; our results show that data points belonging to the same class can consistently be perturbed more easily to specific target classes with the same amount of perturbation (i.e. 1-, 3- or 5-pixel modifications).

• A geometrical understanding of the data-point distribution in high-dimensional input space. Geometrically, the information obtained from a few-pixel attack can also be regarded as a quantitative measure of how class labels change on the cross-sections obtained by cutting the input space with simple low-dimensional slices. In particular, our results indicate that some decision domains extend to great depths in many different directions, but within these deep areas the domains are quite narrow; in other words, they can have many long, thin extensions reaching in different directions through the input space.


One pixel attack for fooling deep neural networks [Jiawei Su, Danilo Vasconcellos Vargas and Sakurai Kouichi/Arxiv]

(via 4 Short Links)