The CIA writes like Lovecraft, Bureau of Prisons is like Stephen King, & NSA is like...


Michael from MuckRock writes, "When MuckRock stumbled on I Write Like - a service that lets you see which famous author a given piece of writing resembles - they immediately knew what it was destined for: Helping shed light on the literary influences of the mysterious FOIA offices they deal with on a daily basis. Fittingly, some offices echo HP Lovecraft's dark horror, while others are more Dan Brown. But you'll never guess which agency seems to take a cue from Cory Doctorow ..." Read the rest

How to send email like a non-metaphorical boss


When Enron collapsed and got hit with a lawsuit requesting discovery on its internal email, its top bosses decided that they'd skip spending money on pricey lawyers to go through the archive and remove immaterial messages -- instead, they dumped the entire corpus of internal mail, including their employees' personal messages. Read the rest

Thomas Friedman: IHOP menu copywriter

Something Awful celebrates the deathless prose of Thomas Friedman and the mountains of empty calories on offer at the International House of Pancakes -- Friedman's culinary equivalent -- by giving us notional menu copy as written by the Great Flat One. Read the rest

Scalable stylometry: can we de-anonymize the Internet by analyzing writing style?

One of the most interesting technical presentations I attended in 2011 was the talk on "adversarial stylometry" given by a Drexel University research team at the 28C3 conference in Berlin. "Stylometry" is the practice of trying to ascribe authorship to an anonymous text by analyzing its writing style; "adversarial stylometry" is the practice of resisting stylometric de-anonymization by using software to remove distinctive characteristics and voice from a text.

Stanford's Arvind Narayanan describes a paper he co-authored on stylometry that has been accepted for the IEEE Symposium on Security and Privacy 2012. In "On the Feasibility of Internet-Scale Author Identification" (PDF), Narayanan and his co-authors show that they can use stylometry to improve the reliability of de-anonymizing blog posts drawn from a large and diverse data-set, using a method that scales well. However, the experimental set was not "adversarial" -- that is, the authors of the blog posts took no countermeasures to disguise their authorship. It would be interesting to see how the approach described in the paper performs against texts that are deliberately anonymized, with and without computer assistance. The summary cites another paper by someone who found that even unaided efforts to disguise one's style make stylometric analysis much less effective.

We made several innovations that allowed us to achieve the accuracy levels that we did. First, contrary to some previous authors who hypothesized that only relatively straightforward “lazy” classifiers work for this type of problem, we were able to avoid various pitfalls and use more high-powered machinery. Second, we developed new techniques for confidence estimation, including a measure very similar to “eccentricity” used in the Netflix paper.
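For readers who want a feel for the mechanics, here is a toy sketch of stylometric attribution with an eccentricity-style confidence score. It is not the paper's method: the ten-word feature list, the cosine-similarity matching, and the names `features`, `cosine` and `attribute` are all assumptions made for illustration, and the real work uses far richer features and classifiers. Eccentricity here means the gap between the best and second-best match, divided by the standard deviation of all the match scores.

```python
import math
import re
import statistics
from collections import Counter

# Hypothetical feature set: a few common English function words.
# Real stylometric systems use hundreds or thousands of features.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but"]


def features(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(anonymous_text, known_texts):
    """Return (best_author, eccentricity) for an anonymous text.

    known_texts maps author name -> sample text and needs at least two authors.
    Eccentricity is (best score - second-best score) / std dev of all scores;
    a low value suggests the top match should not be trusted.
    """
    target = features(anonymous_text)
    scores = {name: cosine(target, features(sample))
              for name, sample in known_texts.items()}
    ranked = sorted(scores.values(), reverse=True)
    sigma = statistics.pstdev(ranked)
    eccentricity = (ranked[0] - ranked[1]) / sigma if sigma else 0.0
    best = max(scores, key=scores.get)
    return best, eccentricity


if __name__ == "__main__":
    known = {
        "alice": "the cat and the dog and the bird",
        "bob": "it is that it is but it is",
        "carol": "to a town in a valley to a river",
    }
    author, ecc = attribute("the fox and the hen and the owl", known)
    print(author, round(ecc, 2))
```

A caller would accept the top match only when the eccentricity clears some threshold; where that threshold sits is a tuning decision this sketch leaves open.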

Read the rest