State of Adversarial Stylometry: can you change your prose-style?

Today at the Chaos Communication Congress in Berlin (28C3), Sadia Afroz and Michael Brennan presented a talk called "Deceiving Authorship Detection," about research from Drexel University on "Adversarial Stylometry," the practice of identifying the authors of texts who don't want to be identified, and the process of evading detection. Stylometry has made great and well-publicized advances in recent years (and it made the news with scandals like "Gay Girl in Damascus"), but typically this has been against authors who have not taken active, computer-assisted countermeasures to disguise their distinctive "voice" in prose.

As part of the presentation, the Drexel team released Anonymouth, a free/open tool that partially automates the process of evading authorship detection. The tool is still a rough alpha, and it requires human intervention to oversee the texts it produces, but it is still an exciting move in adversarial stylometry tools. Accompanying the release are large corpora of test data of deceptive and non-deceptive texts.

Stylometry has been cited by knowledgeable critics as proof of the pointlessness of the Nym Wars: why argue for the right to be anonymous or pseudonymous on Google Plus or Facebook when stylometry will de-anonymize you anyway? I've been suspicious of these critiques because they assume that only de-anonymizers will have access to computer-assisted tools, but as Anonymouth shows, there are many opportunities to use automation tools to improve anonymity.

Stylometry matters in many ways: its state of the art changes the balance of power between trolls and moderators, between dissidents and dictators, between employers and whistleblowers, between astroturfers and commenters, and between spammers and filters.

During the Q&A, a questioner asked whether Anonymouth's methods could be used by, say, fanfic authors to make their writing style match the author whose universe they're dabbling in; the researchers thought this would be so. I instantly wondered if avid fans might make a JK-Rowlingifier that could be used by dissidents to anonymize their speech, homogenizing it to pitch-perfect Potterian English so that stylometry fails. And of course, this makes me wonder whether stylometry could be used to falsely identify a block of prose with a third party (making a terrorist rant stylometrically match an innocent's prose-style) -- the researchers doubt this, and suggest that when deception is a possibility, prose-style shouldn't be considered as identifying evidence.

As an aside, the Anonymouth team is part of a lab at Drexel seeking grad-students and postdocs.

Privacy, Security and Automation Lab

  1. I’m doubtful that this will be useful in the next ten to twenty years without massive human input; computers currently have enough difficulty parsing complex sentences on their own, much less rephrasing them in a less-identifiable manner. 

    Spotting identifiable passages and asking you to rewrite them, now, that’s much more feasible.  It’s going to take work to anonymize one’s own writing.

    This also means I’m similarly dubious about its ability to change your writing to mimic another person’s, again at least in the near-ish future.

  2. “Stylometry has been cited by knowledgeable critics as proof of the pointlessness of the Nym Wars: why argue for the right to be anonymous or pseudonymous on Google Plus or Facebook when stylometry will de-anonymize you anyway?”

    I look forward (?) to the world where, as well as fingerprints, authorities ask you to submit an essay to aid in future identification.  If all you ever write is your (pseudonymous) blog, they have nothing to compare your writing against.  So it’s hard to claim that said blog’s author might as well just put their name on it as they rail against the Iranian regime.

    Then again, maybe they kept all my high school essays…

  3. Maybe such a tool could also be used to translate YouTube comments into proper English with impeccably-spelled complete sentences?

  4. This has been something I’ve worried about for some time: I know from experience that my writing style is detectable by some who know me in about one line of chat, even when I’m trying to conceal that I have made a new identity, and they have no reason to suspect that I have done so.

    I also downloaded Freenet once, and anonymously posted a single paragraph answering a help request on the Frost boards — I was very careful to give no personally identifiable information. Yet I had a “Hey, Dewi, is that you on Freenet?” PM from a friend within the hour. Experiment failed.

    This is why I no longer bother concealing my real name on the net, or participating in anonymous communities: I am simply very, very bad at not being myself – and I think that is a skill that will become increasingly important for anonymity, free speech and privacy in the future.

    I get the feeling that this inability has prevented me from exploring much of the awesomeness that is the internet. I’m very glad that there are stylistic prostheses being developed.

    1. I still think it is a useful skill to have though, to be able to change writing style. Keep practicing, I think you’ll get it. Just try writing in the style of a specific author for an entire day; even if it’s not the style you intend to write in ever again, it’ll be good practice.

  5. The process is not hard.

    Analyse source text

    Identify symbols outside the range of target text

    Permute through nearby symbols in a synonymical database and replace source symbols

    Rinse repeat

    You can even build a b-tree of potential targets and trim and walk it.

  6. Aside from the more useful purposes of this, it would be fun for MMOs. Make it so people always talk in character depending on the character’s origin they decided to be.

    1. Thou mayest a change in styling of thy prose effect, good Sir, yet it may happen that thy efforts do not any certain result attain, to be sure.

  7. “It might be entertaining to run J.K.Rowling’s work through it, in the hope of coming up with some decent prose!” snorted Marktech controversially.

  8. “Stylometry matters in many ways: its state of the art changes the balance of power between trolls and moderators, between dissidents and dictators, between employers and whistleblowers, between astroturfers and commenters, and between spammers and filters.”

    I can’t help but wonder about (and be made extremely uncomfortable by) this whole “balance of power” idea.  “Power” to do what?  Mod away plain speaking when the plainness goes against the party line grain?  Silence freedom of speech when that same speech makes or wins a point that is not held by someone in power?  A recent discussion (yesterday, on the boy toys vs girl toys thread) makes it appear that the majority of people here are in favor of advertising and manipulation, when the manipulation is of the “correct” kind.  Such “end justifies the means”ers scare the crap out of me, regardless of their philosophical or political stripe.

    Sexism and prejudice are seen as repugnant, but I’ve only today (for the first time, personally) encountered the word “mansplaining” here on boingboing a number of times, and it, though quite offensive, prejudiced and sexist, seems to be perfectly okay.  Again, the increasing prevalence of the idea that “it’s okay to hate them, since they deserve to be hated”, and the smug / intellectually self-satisfied attitude it bespeaks (on the part of educated people, no less) is alarming.

    Socrates wouldn’t need to drink his hemlock in the 21st century: he’d be accused of being a “concern troll”, and told to get back under his bridge.  And the way things are going, he wouldn’t even have to be called anything, since no one would even get to see what he said in the first place: pretty soon an automod will be able to detect his post and delete it at the first mention of “erring”, “wrongdoing”, “virtue” or “knowledge” within the same paragraph.

    Fairly soon, “men (and women) of good will” will have no recourse but to take a page out of John Fletcher’s book (or, alternately, that of Megaforce): “Deeds, not words shall speak me”.

  9. Cory, you might be interested in the recent history of pro skateboarder Rodney Mullen. After receiving news of impending physical problems due to his career, he spent a couple of years re-engineering his stance on the board and tricks in order to minimize physical repercussions. I’m not sure if this stuff is in his book.

  10. 1) Isn’t style part of the content of a message? Maybe this is not important to people who really need to remain anonymous, but I would think that part of the power of political speech is the style in which it’s written.  If you divorce the content from its style sufficiently to obfuscate the author, might you not also divorce it of some of its power and persuasiveness?

    2) I’m not going to argue that some people don’t really need to remain anonymous, but I do think the idea is kind of sad that for speech to remain free it *must* be anonymous.   I’m trying to imagine Martin Luther King Jr’s speeches being as powerful, read by a synthetic reader and anonymized in such a way as to protect his identity.  Perhaps he’d still be alive, but I don’t think his message would have been as powerful.
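
The substitution process outlined in comment 5 (analyse the source, find words outside the target's range, swap in nearby synonyms, repeat) can be sketched roughly as follows. This is only a toy illustration, not how Anonymouth works: the `SYNONYMS` table, the function names, and the word-level notion of "symbol" are all assumptions made for the example, and it touches vocabulary only, whereas stylometric analysis can also draw on features such as punctuation and sentence structure.

```python
import re

# Hypothetical, hand-made synonym table (an assumption for this sketch);
# a real tool would need a far larger thesaurus.
SYNONYMS = {
    "commence": ["begin", "start"],
    "purchase": ["buy"],
    "utilize": ["use"],
    "residence": ["home", "house"],
}


def vocabulary(text):
    """Return the set of lowercase words that occur in the text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def substitute_once(source, target_vocab, synonyms):
    """Swap each word missing from the target vocabulary for the first
    listed synonym that does occur in it; leave other words unchanged."""
    def swap(match):
        word = match.group(0)
        if word.lower() in target_vocab:
            return word
        for candidate in synonyms.get(word.lower(), []):
            if candidate in target_vocab:
                return candidate  # note: original capitalisation is not preserved
        return word  # no in-vocabulary synonym known
    return re.sub(r"[A-Za-z']+", swap, source)


def restyle(source, target_text, synonyms=SYNONYMS, max_passes=5):
    """Repeat the substitution pass until nothing changes ("rinse, repeat")."""
    target_vocab = vocabulary(target_text)
    for _ in range(max_passes):
        rewritten = substitute_once(source, target_vocab, synonyms)
        if rewritten == source:
            break
        source = rewritten
    return source


if __name__ == "__main__":
    target = "we use the house to begin and buy things"
    print(restyle("we utilize the residence to commence", target))
```

Even in this toy form, the concerns raised in comments 1 and 10 are visible: blind synonym swapping ignores grammar and tone, so a human would still need to review the result.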
