Pixel-counting can un-redact government docs

A Luxembourgian/Irish security research team have presented a paper on a technique for identifying words that have been blacked out of documents, as when government docs are published with big strikethroughs over the bits that are sensitive to national security. The technique doesn't work on monospace fonts like Courier, but the State Department's recent font guidelines require that all docs be published in Times New Roman, which decodes like a charm.

They found the number of pixels that had been blacked out in the sentence: "An Egyptian Islamic Jihad (EIJ) operative told an xxxxxxxx service at the same time that Bin Ladin was planning to exploit the operative's access to the U.S. to mount a terrorist strike." They then used a computer to determine the pixel length of words in the dictionary when written in the Arial font.

The program rejected all of the words that were not within three pixels of the length of the word that was probably under the blacked-out area in the document.

The software then reduced the number of possible words to just seven from 1,530 by using semantic guidelines, including the grammatical context. The researchers selected the word "Egyptian" from the seven possible words…
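For the curious, here is a minimal Python sketch of the width-filtering step described above. It is only an illustration: the per-character pixel widths are made up (real values would come from the font's actual metrics), the word list is a toy one, and the names char_width, word_width and candidates are hypothetical, not anything from the researchers' software.

```python
# Toy illustration of filtering dictionary words by rendered pixel width.
# ASSUMPTION: these per-character widths are invented for the example;
# a real attack would read the advance widths from the font itself.
NARROW = set("fijlrtI")   # thin glyphs in a proportional font
WIDE = set("mwMW")        # wide glyphs in a proportional font


def char_width(ch: str) -> int:
    """Return a made-up pixel width for one character."""
    if ch in NARROW:
        return 4
    if ch in WIDE:
        return 12
    if ch.isupper():
        return 10
    return 8


def word_width(word: str) -> int:
    """Pixel width of a word: the sum of its character widths."""
    return sum(char_width(ch) for ch in word)


def candidates(words, target_width, tolerance=3):
    """Keep only the words whose width is within `tolerance` pixels of the blacked-out span."""
    return [w for w in words if abs(word_width(w) - target_width) <= tolerance]


if __name__ == "__main__":
    words = ["Egyptian", "Ukrainian", "Ugandan", "Israeli", "American", "allied"]
    redacted_width = 58  # pretend this was measured from the black bar
    print(candidates(words, redacted_width))
    # prints ['Egyptian', 'Ugandan']
```

Note that more than one word survives the width test even in this toy case, which is why the second, semantic pass matters. It also shows why monospace fonts resist the trick: if every character had the same width, the bar would reveal only the letter count.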

Link

(Thanks, Wendy!)

Update: This page at Cryptome has more detail and illustrations (Thanks, Chris!)