The "universal adversarial preturbation" undetectably alters images so AI can't recognize them

In a newly revised paper in Computer Vision and Pattern Recognition, a group of French and Swiss computer science researchers show the existence of "a very small perturbation vector that causes natural images to be misclassified with high probability" -- that is, a minor image transformation that can beat machine learning systems nearly every time.

What's more, the researchers show evidence that similar tiny distortions exist in other kinds of data-types that confound all machine-learning systems! The "universal adversarial preturbation" changes images in ways that are imperceptible to the human eye, but devastating to machine vision.

The research is typical of the early phases of computer science breakthroughs that demonstrate incredible results in classifying systems that no one is trying to game. Think of when Google figured out that the links between webpages -- only ever made as an expression of interest -- could be used to classify the importance of webpages, only to kick off an arms race as attackers of the system figured out that they could get high rankings with easy-to-maintain linkfarms. The same thing happened when stylometry begat adversarial stylometry.

All of the success in using deep learning classification to identify people, or catch cheaters, has assumed that the other side never deploys any countermeasures. This paper suggests that such countermeasures are trivial to generate and devastating in practice.

• We show the existence of universal image-agnostic perturbations for state-of-the-art deep neural networks.

• We propose an algorithm for finding such perturbations. The algorithm seeks a universal perturbation for a set of training points, and proceeds by aggregating atomic perturbation vectors that send successive datapoints to the decision boundary of the classifier.

• We show that universal perturbations have a remarkable generalization property, as perturbations computed for a rather small set of training points fool new images with high probability.

• We show that such perturbations are not only universal across images, but also generalize well across deep neural networks. Such perturbations are therefore doubly universal, both with respect to the data and the network architectures.

• We explain and analyze the high vulnerability of deep neural networks to universal perturbations by examining the geometric correlation between different parts of the decision boundary.

Universal adversarial perturbations [Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi and Pascal Frossard/Computer Vision and Pattern Recognition]
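The aggregation procedure described in the second bullet can be sketched roughly as follows. This is only an illustration under loud assumptions: a small random linear classifier stands in for a deep network, so the smallest step to the decision boundary has a closed form, and every name here (`predict`, `minimal_step`, `universal_perturbation`, `xi`, `target_rate`) is hypothetical rather than taken from the paper's code.

```python
import numpy as np


def predict(W, b, X):
    """Class labels of a toy linear classifier (assumed stand-in for a deep net)."""
    return np.argmax(X @ W.T + b, axis=-1)


def minimal_step(W, b, x, overshoot=0.02):
    """Smallest L2 step that pushes x just across the nearest decision boundary."""
    logits = W @ x + b
    k0 = int(np.argmax(logits))
    best = None
    for k in range(W.shape[0]):
        if k == k0:
            continue
        w_diff = W[k] - W[k0]
        gap = logits[k0] - logits[k]  # how far class k trails the current label
        dist = gap / (np.linalg.norm(w_diff) + 1e-12)
        if best is None or dist < best[0]:
            best = (dist, w_diff, gap)
    _, w_diff, gap = best
    return (1 + overshoot) * gap / (w_diff @ w_diff + 1e-12) * w_diff


def universal_perturbation(W, b, X, xi=3.0, target_rate=0.8, max_passes=10):
    """Aggregate per-point steps into one vector v with norm at most xi."""
    clean = predict(W, b, X)
    v = np.zeros(X.shape[1])
    rate = 0.0
    for _ in range(max_passes):
        for x, label in zip(X, clean):
            if predict(W, b, x + v) == label:  # v does not fool this point yet
                v = v + minimal_step(W, b, x + v)
                norm = np.linalg.norm(v)
                if norm > xi:  # project back onto the L2 ball of radius xi
                    v = v * (xi / norm)
        rate = float(np.mean(predict(W, b, X + v) != clean))
        if rate >= target_rate:
            break
    return v, rate


rng = np.random.default_rng(0)
W, b = rng.normal(size=(5, 20)), np.zeros(5)
X = rng.normal(size=(200, 20))
v, rate = universal_perturbation(W, b, X)
print("perturbation norm:", np.linalg.norm(v), "fooling rate:", rate)
```

The same vector `v` is added to every input, which is what makes the perturbation "universal"; the norm bound `xi` plays the role of keeping it small. A linear toy model cannot reproduce the paper's results on real images, so the printed fooling rate says nothing about deep networks.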

(via JWZ)

Notable Replies

  1. I know acting smug feels nice, but 1) This is absolutely not true, and 2) in many cases irrelevant -- for instance, if I am using a neural network to categorize my photos, there is no adversarial relationship -- I want the algorithm to work. It also doesn't sound like this would work to, e.g., fool a security camera unless I somehow had access to the digital data.

    Trivial maybe in the mathematical sense where anything that is proven becomes trivial, but it clearly took a lot of work to do this, and lots of people failed before -- one of the reasons captchas have been going away is that it was becoming harder and harder to distort images in a way that would fool image recognition but humans could still understand. So it is a problem that people have worked on, with clear financial incentive, for quite some time. A success now doesn't mean the problem was trivial, or that it is solved.

    In the long run, I wouldn't bet on anything like this being effective. There is a tremendous amount of research in identifying perceptible vs. imperceptible features of images for data compression. If you put imperceptible errors in an image that confuse a deep neural network, and nobody can find a way to train the networks to be more robust against that, someone is going to find a way to filter that out before analyzing it. It will probably cost some accuracy but that isn't necessarily a deal breaker.

    In particular, one of the key things that makes this attack interesting is that it generalized between different neural networks with different designs. This is important, because if you want to post pictures to Facebook without Facebook being able to analyze them, you don't have access to their classifier to train your distortion algorithm on. My guess is that the reason it is so effective is that everybody is using the same training data -- there are just not that many sets of millions of public images with pre-existing labels.

  2. You may have preturbed the word 'perturbation' but I still recognized it!

  3. doop says:

    I'm not sure about the details of this technique, but there are other exploits that are highly resistant to such countermeasures. From "Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images":

    We also find that, for MNIST DNNs, it is not easy to prevent the DNNs from being fooled by retraining them with fooling images labeled as such. While retrained DNNs learn to classify the negative examples as fooling images, a new batch of fooling images can be produced that fool these new networks, even after many retraining iterations.

    Because the current generation of neural nets are rigid once trained, and because they return their verdicts along with a confidence value, they are by their nature susceptible to evolutionary techniques that iteratively mutate the input to produce incremental gains in the output. Fixing the exploit would mean changing fundamentally what it is that neural networks do, and how they are used.

    Done. You want "Intriguing properties of neural networks" C. Szegedy et al. 2013, which "revealed that changing an image (e.g. of a lion) in a way imperceptible to humans can cause a DNN to label the image as something else entirely (e.g. mislabeling a lion a library)". The only thing new in this study is that state-of-the-art neural nets are still so dumb that they can be consistently fooled by a static transform applied to any input data.

    I can't even begin to imagine what heuristics a net might use that allow it to classify static as a coherent image, but I guess the space of images that we would consider "just static" is absolutely vast in comparison to the space of images that look to us like a thing. To today's neural nets though, it all looks roughly the same. They have no context, they don't know what the world is or what "things" are and it's not clear that showing them a series of representational bitmaps is going to get them there. Humans know a few basic concepts such as "things have a form", and "some aspect of that form may, at times, be visible," which is hugely helpful context for looking at pictures but it's clear on the evidence of studies such as these that neural nets really don't get that. They don't know what pictures are to start with.

    Here's an image recognition task that might help illustrate what I'm talking about.

    Spoiler here if you're in a hurry. The picture gives you almost nothing to go on but once you know what the thing is, you can't unsee it anymore. You know the form of the thing and you recognise the visible aspect, and you reconstruct in your mind an analogue of the scene that the image represents. You understand the representational nature of the image, of images, and you see past it because you live in the world and you've walked around in it a bit and maybe picked up a thing and put it down again. The outputs of your neural net go out into the world and flow back to your inputs, allowing you to explore and test and build in your mind a coherent model of the world you inhabit. By comparison it looks like image recognition algorithms are computing epicycles because they don't understand gravity.

    Neural nets are a fantastic invention that will continue to transform the world in many beneficial ways, but I think in their current state there are serious problems deploying them in any sort of adversarial situation. These findings add to the carnage because if simply exposing a neural net's confidence values is a security risk, that's likely to be used by people running shitty AI as an excuse to dodge audits.
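The evolutionary attack doop describes, iteratively mutating an input and keeping whatever raises the reported confidence, can be sketched as a simple hill climber. This is an assumption-heavy toy and not the algorithm from the quoted paper: the "network" is a random linear softmax model, and the names `confidence` and `evolve_fooling_input` are made up for illustration.

```python
import numpy as np


def confidence(W, b, x, target):
    """Softmax confidence that a toy linear model assigns to class `target`."""
    logits = W @ x + b
    logits = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return float(p[target] / p.sum())


def evolve_fooling_input(W, b, target, steps=2000, sigma=0.1, seed=0):
    """Start from random noise and keep only mutations that raise confidence."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=W.shape[1])  # "static" with pixel values in [0, 1]
    start = best = confidence(W, b, x, target)
    for _ in range(steps):
        candidate = np.clip(x + sigma * rng.normal(size=x.shape), 0.0, 1.0)
        score = confidence(W, b, candidate, target)
        if score > best:  # the exposed confidence value guides the search
            x, best = candidate, score
    return x, start, best


rng = np.random.default_rng(42)
W, b = rng.normal(size=(10, 64)), np.zeros(10)
x, start, best = evolve_fooling_input(W, b, target=3)
print("confidence before:", start, "after:", best)
```

The attacker never looks inside the model; the loop only needs the confidence score returned with each verdict, which is why exposing that score is the weak point the reply identifies.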
