A joint UT Austin/Cornell team has taught a machine learning system based on the free/open Torch library to correctly guess the content of pixellated or blurred redactions with high accuracy: for masked faces that humans correctly guess 0.19% of the time, the system can make a correct guess 83% of the time, when given five tries.
Redaction errors have plagued data-releases since the earliest days of the net; who can forget the hilarity of companies and agencies that added black boxes in an overlay to their PDFs, or left Word's document history (including all the deleted passages) intact on their sensitive releases? Or the pedophile whose twirly-faced redaction was de-twirled to catch and prosecute him?
These days, the best practice seems to be opening the images in a bitmap editor, then replacing the sensitive regions with solid black squares.
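Why black squares rather than blur or pixelation? A solid fill overwrites the original pixel values outright, so nothing derived from them is left for a model to learn from. Here is a minimal sketch of that idea, assuming a grayscale image stored as a plain list of rows; the function name `redact_region` and the tiny sample image are made up for illustration and are not taken from any particular editor or library.

```python
def redact_region(pixels, top, left, height, width, fill=0):
    """Return a copy of a grayscale image with one rectangle set to a solid fill.

    `pixels` is a list of rows, each row a list of intensity values.
    The rectangle is clipped to the image bounds.
    """
    out = [row[:] for row in pixels]  # copy, so the caller's image is untouched
    for y in range(max(top, 0), min(top + height, len(out))):
        for x in range(max(left, 0), min(left + width, len(out[y]))):
            out[y][x] = fill  # the original value is discarded, not transformed
    return out


# Hypothetical 3x4 image; redact a 2x2 block starting at row 0, column 1.
image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
]
redacted = redact_region(image, top=0, left=1, height=2, width=2)
```

The redacted copy has to be saved as a flattened bitmap. A black box kept as a separate layer or overlay is exactly the PDF mistake described above.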
“We’re using this off-the-shelf, poor man’s approach,” says Vitaly Shmatikov, co-author of the paper and professor at Cornell. “Just take a bunch of training data, throw some neural networks on it, throw standard image recognition algorithms on it, and even with this approach…we can obtain pretty good results.”
Shmatikov acknowledges that the Max Planck Institute’s work is more nuanced, taking into account contextual clues about identity. But he says that his simpler approach shows how weak these privacy methods really are. (He doesn’t mention that his method also is 18% more accurate in a comparable test.)
To build the attacks that identified faces in YouTube videos, researchers took publicly available pictures and blurred the faces with YouTube’s video tool. They then fed the algorithm both sets of images, so it could learn how to correlate blur patterns to the unobscured faces. When given different images of the same people, the algorithm could determine their identity with 57% accuracy, or 85% when given five chances.
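The shape of that attack (obfuscate labeled pictures, train on them, then score guesses on fresh obfuscated pictures of the same people) can be sketched in a few lines. This is a toy under loud assumptions, not the researchers' method: the "faces" are random synthetic 8x8 grids, the obfuscation is simple mosaic averaging rather than YouTube's blur tool, and a nearest-centroid classifier stands in for the neural networks used in the paper. Every name here (`mosaic`, `train_centroids`, `top_k_accuracy`, and so on) is hypothetical.

```python
import random


def mosaic(img, block):
    """Pixelate a square grayscale image by averaging each block x block tile."""
    n = len(img)
    out = [[0.0] * n for _ in range(n)]
    for by in range(0, n, block):
        for bx in range(0, n, block):
            ys = range(by, min(by + block, n))
            xs = range(bx, min(bx + block, n))
            mean = sum(img[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def flatten(img):
    return [v for row in img for v in row]


def train_centroids(samples):
    """Average the obfuscated images of each identity into one vector per label."""
    sums, counts = {}, {}
    for label, img in samples:
        vec = flatten(img)
        acc = sums.setdefault(label, [0.0] * len(vec))
        sums[label] = [a + b for a, b in zip(acc, vec)]
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}


def top_k_guesses(centroids, img, k):
    """Return the k identities whose centroids are closest to the obfuscated image."""
    vec = flatten(img)

    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], vec))

    return sorted(centroids, key=dist)[:k]


def top_k_accuracy(centroids, samples, k):
    """Fraction of samples whose true label is among the k best guesses."""
    hits = sum(label in top_k_guesses(centroids, img, k) for label, img in samples)
    return hits / len(samples)


# Synthetic stand-in data: 10 made-up identities, each a random 8x8 "face".
rng = random.Random(0)
SIZE, BLOCK, PEOPLE = 8, 4, 10
faces = {p: [[rng.random() for _ in range(SIZE)] for _ in range(SIZE)]
         for p in range(PEOPLE)}


def photo(person):
    """A noisy 'photo' of a synthetic identity (different on every call)."""
    return [[v + rng.gauss(0, 0.3) for v in row] for row in faces[person]]


# The model only ever sees mosaicked images, both in training and at test time.
train = [(p, mosaic(photo(p), BLOCK)) for p in range(PEOPLE) for _ in range(20)]
test = [(p, mosaic(photo(p), BLOCK)) for p in range(PEOPLE) for _ in range(10)]

model = train_centroids(train)
top1 = top_k_accuracy(model, test, 1)
top5 = top_k_accuracy(model, test, 5)
print(f"top-1 accuracy: {top1:.2f}, top-5 accuracy: {top5:.2f}")
```

The two figures correspond to the "one guess" and "five chances" numbers quoted above. Top-5 can never be lower than top-1, since the best guess is always among the best five. The actual values printed here say nothing about real faces; they only show that averaged tiles still carry information correlated with identity.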
Defeating Image Obfuscation with Deep Learning [Richard McPherson, Reza Shokri and Vitaly Shmatikov/Arxiv]
Nothing pixelated will stay safe on the internet [Dave Gershgorn/Quartz]