Researchers can fool machine-learning vision systems with a single, well-placed pixel

Three researchers from Kyushu University have published a paper describing a means of reliably fooling AI-based image classifiers with a single well-placed pixel.

It's part of a wider field of "adversarial perturbation," which aims to disrupt machine-learning models; the field started with some modest achievements, but has been gaining ground ever since.

But the Kyushu paper goes further than any of the research I've seen so far. The researchers change just 1, 3 or 5 well-placed pixels to make an image classifier mislabel a majority of the images they tested, without having any access to the model's internals or to the training data used to produce it (a "black box" attack).
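To make the idea of a black-box, one-pixel attack concrete, here is a minimal sketch in Python. It is not the researchers' method: the search procedure they use isn't described in this post, so the sketch swaps in a plain brute-force search over every pixel. The classifier (`toy_classifier`), the attack function (`one_pixel_attack`) and the tiny 4x4 grayscale image are all hypothetical stand-ins invented for illustration. The only thing the attacker does with the model is ask it for a label.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]  # grayscale image, pixel values 0..255


def toy_classifier(image: Image) -> str:
    """Hypothetical stand-in for a model: says which half of the image is brighter."""
    half = len(image) // 2
    top = sum(sum(row) for row in image[:half])
    bottom = sum(sum(row) for row in image[half:])
    return "top" if top > bottom else "bottom"


def one_pixel_attack(
    classify: Callable[[Image], str],
    image: Image,
    values: Tuple[int, ...] = (0, 255),
) -> Optional[Tuple[int, int, int, str]]:
    """Brute-force black-box search (an assumption, not the paper's procedure).

    Tries each pixel with each candidate value, querying the classifier only
    for its output label. Returns (x, y, new_value, new_label) for the first
    single-pixel change that flips the label, or None if none is found.
    """
    original_label = classify(image)
    for y, row in enumerate(image):
        for x, old_value in enumerate(row):
            for value in values:
                if value == old_value:
                    continue  # not actually a modification
                candidate = [r[:] for r in image]  # copy; leave the input intact
                candidate[y][x] = value
                new_label = classify(candidate)
                if new_label != original_label:
                    return x, y, value, new_label
    return None


if __name__ == "__main__":
    # Top half only slightly brighter than bottom half: close to the boundary.
    image = [[100] * 4 for _ in range(2)] + [[90] * 4 for _ in range(2)]
    print("original label:", toy_classifier(image))
    print("attack result:", one_pixel_attack(toy_classifier, image))
```

The toy image sits close to the classifier's decision boundary, which is why a single pixel is enough to flip it. An image that sits far from the boundary (say, an all-white top half over an all-black bottom half) survives every one-pixel change this search tries.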

It's a good example of how impressive gains in "non-adversarial" computing (something works if no one is trying to stop it) are actually extremely fragile and almost trivial to circumvent once they exist.

According to the experimental results, the main contributions of our work include:

• The effectiveness of conducting non-target attack using few-pixel attack. We show that with only 1 pixel modification, there are 73.8% of the images can be perturbed to one or more target classes, 82.0% and 87.3% in the cases of 3 and 5-pixel attacks. We show the non-sensitive images are even much rarer than sensitive images even if limiting the perturbation to such a small scope, therefore few-pixel modification is an effective method of searching adversarial images while can be hardly recognized by human eyes in practice.

• The number of target classes that a natural image can camouflage. In the case of 1 pixel perturbation, each natural image can be perturbed to 2.3 other classes on average. In specific, there are 18.4%, 17.2% and 16.6% of the images can be perturbed to 1, 2, 3 target classes. In the case of 5-pixel perturbation, the amounts of images that can be perturbed to from 1 to 9 target classes become almost even.

• Similar perturbation direction to a specific target class. Effectiveness of universal perturbation has shown that many images can be perturbed through similar directions such that decision boundaries might leak diversity [24] while our results show that data-points belonging to same classes can be always more easily perturbed to specific classes with same amount of perturbations (i.e. 1, 3 or 5 pixel-modifications).

• Geometrical understanding of data-point distribution in high dimensional input space. Geometrically, the information obtained by conducting few-pixel attack can be also regarded as a quantitative result on changes in class labels on the cross sections obtained by using simply low dimensional slices to cut the input space. In particular, our results indicate that some decision domains might have very great depths towards many different directions but inside these deep areas, the decision domains are quite narrow. In other words, these domains can have many long and thin extended synapses towards different directions in the input space.

One pixel attack for fooling deep neural networks [Jiawei Su, Danilo Vasconcellos Vargas and Sakurai Kouichi/Arxiv]

(via 4 Short Links)
