Today at the Chaos Communication Congress in Berlin (28C3), Sadia Afroz and Michael Brennan presented a talk called "Deceiving Authorship Detection," about research from Drexel University on "Adversarial Stylometry," the practice of identifying the authors of texts who don't want to be identified, and the process of evading that detection. Stylometry has made great and well-publicized advances in recent years (and it made the news with scandals like "Gay Girl in Damascus"), but typically this has been against authors who have not taken active, computer-assisted countermeasures to disguise their distinctive "voice" in prose.
As part of the presentation, the Drexel team released Anonymouth, a free/open tool that partially automates the process of evading authorship detection. The tool is a rough alpha, and it requires human intervention to oversee the texts it produces, but it is an exciting step forward for adversarial stylometry tools. Accompanying the release are large corpora of test data made up of deceptive and non-deceptive texts.
Stylometry has been cited by knowledgeable critics as proof of the pointlessness of the Nym Wars: why argue for the right to be anonymous or pseudonymous on Google Plus or Facebook when stylometry will de-anonymize you anyway? I've been suspicious of these critiques because they assume that only de-anonymizers will have access to computer-assisted tools, but as Anonymouth shows, there are many opportunities to use automation tools to improve anonymity.
Stylometry matters in many ways: its state of the art changes the balance of power between trolls and moderators, between dissidents and dictators, between employers and whistleblowers, between astroturfers and commenters, and between spammers and filters.
During the Q&A, a questioner asked whether Anonymouth's methods could be used by, say, fanfic authors to make their writing style match the author whose universe they're dabbling in; the researchers thought this would be so. I instantly wondered if avid fans might make a JK-Rowlingifier that could be used by dissidents to anonymize their speech, homogenizing it to pitch-perfect Potterian English so that stylometry fails. And of course, this makes me wonder whether stylometry could be used to falsely identify a block of prose with a third party (making a terrorist rant stylometrically match an innocent's prose-style) -- the researchers doubt this, and suggest that when deception is a possibility, prose-style shouldn't be considered as identifying evidence.
As an aside, the Anonymouth team is part of a lab at Drexel that is seeking grad students and postdocs.
Privacy, Security and Automation Lab