stylometry

Robin "Sourdough" Sloan is using a machine-learning autocomplete system to write his next novel

Robin Sloan is a programmer and novelist whose books like Sourdough and Mr Penumbra's 24-Hour Bookstore are rich and evocative blends of self-aware nerdy playfulness and magical speculation. Read the rest

Anonymous stock-market manipulators behind $20B+ of "mispricing" can be tracked by their writing styles

In a new Columbia Law and Economics Working Paper, Columbia Law prof Joshua Mitts uses "stylometry" (previously) to track market manipulators who publish false information about companies in order to profit from options, then flush their old identities once they become notorious for misinformation and reboot under new handles. Read the rest

Stylistic analysis can de-anonymize code, even compiled code

A presentation today at Defcon from Drexel computer science prof Rachel Greenstadt and GWU computer science prof Aylin Caliskan builds on the pair's earlier work in identifying the authors of software and shows that they can, with a high degree of accuracy, identify the anonymous author of software, whether in source-code or binary form. Read the rest

The "universal adversarial perturbation" undetectably alters images so AI can't recognize them

In a newly revised paper in Computer Vision and Pattern Recognition, a group of French and Swiss computer science researchers show that "a very small perturbation vector that causes natural images to be misclassified with high probability" -- that is, a minor image transformation can beat machine learning systems nearly every time. Read the rest

It's awesome to see all these "rogue" government agency Twitter accounts, but what about hoaxes?

In the immediate aftermath of the Trump administration's gag orders on government employees disclosing taxpayer-funded research results, a series of high-profile "rogue" government agency accounts popped up on Twitter, purporting to be managed by civil servants who are unwilling to abide by the gag order. Read the rest

Hyperface: a fabric that makes computer vision systems see faces everywhere

Adam Harvey, creator of 2012's CV Dazzle project to systematically confound facial recognition software with makeup and hairstyles, presented his latest dazzle iteration, Hyperface, at the Chaos Communication Congress in Hamburg last month. Read the rest

Statcheck: a data-fakery algorithm that flagged 50,000 articles

Michèle B. Nuijten and co's statcheck program re-examines the datasets in peer-reviewed science and flags anomalies that are associated with fakery, from duplication of data to internal inconsistencies. Read the rest

Trump only writes the angry tweets, the nice ones are written by a staffer with an iPhone

On August 6, artist Todd Vaziri observed that all of Trump's angry tweets come from the Twitter client for Android, while the more presidential, less batshit ones come from an iPhone; Vaziri speculated that the latter were sent by a staffer. Read the rest

The CIA writes like Lovecraft, Bureau of Prisons is like Stephen King, & NSA is like...

Michael from MuckRock writes, "When MuckRock stumbled on I Write Like - a service that lets you see which famous author a given piece of writing resembles - they immediately knew what it was destined for: Helping shed light on the literary influences of the mysterious FOIA offices they deal with on a daily basis. Fittingly, some offices echo HP Lovecraft's dark horror, while others are more Dan Brown. But you'll never guess which agency seems to take a cue from Cory Doctorow ..." Read the rest

How to send email like a non-metaphorical boss

When Enron collapsed and got hit with a lawsuit requesting discovery on its internal email, its top bosses decided that they'd skip spending money on pricey lawyers to go through the archive and remove immaterial messages -- instead, they dumped the entire corpus of internal mail, including their employees' personal messages. Read the rest

Scalable stylometry: can we de-anonymize the Internet by analyzing writing style?

One of the most interesting technical presentations I attended in 2012 was the talk on "adversarial stylometry" given by a Drexel University research team at the 28C3 conference in Berlin. "Stylometry" is the practice of trying to ascribe authorship to an anonymous text by analyzing its writing style; "adversarial stylometry" is the practice of resisting stylometric de-anonymization by using software to remove distinctive characteristics and voice from a text.
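To make the idea of ascribing authorship from writing style concrete, here is a minimal sketch in Python. It is not the method used by any of the researchers mentioned here: it assumes a toy approach in which each text is reduced to the relative frequency of a few function words and the anonymous text is assigned to the known author with the most similar profile. The word list, the function names, and the cosine-similarity choice are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny function-word list chosen for illustration only;
# real stylometric systems use far larger and richer feature sets.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in",
                  "that", "it", "is", "but", "i", "with"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(anonymous_text, known_texts):
    """Return (best_author, scores), where scores maps author -> similarity."""
    target = profile(anonymous_text)
    scores = {author: cosine(target, profile(text))
              for author, text in known_texts.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    known = {
        "alice": "the cat and the dog and the bird",
        "bob": "it is a truth that it is in a box",
    }
    author, scores = attribute("the fox and the hound and the hare", known)
    print(author, scores)
```

An adversarial tool works against exactly this kind of profile: if a writer's function-word habits are flattened or imitated, the similarity scores stop pointing at the true author.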

Stanford's Arvind Narayanan describes a paper he co-authored on stylometry that has been accepted for the IEEE Symposium on Security and Privacy 2012. In On the Feasibility of Internet-Scale Author Identification (PDF) Narayanan and co-authors show that they can use stylometry to improve the reliability of de-anonymizing blog posts drawn from a large and diverse data-set, using a method that scales well. However, the experimental set was not "adversarial" -- that is, the authors took no countermeasures to disguise their authorship. It would be interesting to see how the approach described in the paper performs against texts that are deliberately anonymized, with and without computer assistance. The summary cites another paper by someone who found that even unaided efforts to disguise one's style make stylometric analysis much less effective.

We made several innovations that allowed us to achieve the accuracy levels that we did. First, contrary to some previous authors who hypothesized that only relatively straightforward “lazy” classifiers work for this type of problem, we were able to avoid various pitfalls and use more high-powered machinery. Second, we developed new techniques for confidence estimation, including a measure very similar to “eccentricity” used in the Netflix paper.
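The confidence-estimation idea in the passage above can be illustrated with a short Python sketch. It assumes one common formulation of "eccentricity": the gap between the best and second-best candidate scores, measured in standard deviations of all candidate scores. Whether the paper's measure matches this exactly is an assumption, and the function names and the threshold of 1.5 are hypothetical.

```python
import statistics


def eccentricity(scores):
    """Gap between the top two scores, in units of the scores' standard deviation."""
    if len(scores) < 2:
        raise ValueError("need at least two candidate scores")
    ranked = sorted(scores, reverse=True)
    spread = statistics.pstdev(scores)
    if spread == 0:
        return 0.0  # every candidate scored the same, so nothing stands out
    return (ranked[0] - ranked[1]) / spread


def confident_match(scores_by_author, threshold=1.5):
    """Return the best-scoring author, or None if the match is not distinctive.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out data to trade precision against recall.
    """
    if eccentricity(list(scores_by_author.values())) < threshold:
        return None
    return max(scores_by_author, key=scores_by_author.get)


if __name__ == "__main__":
    print(confident_match({"a": 0.9, "b": 0.2, "c": 0.1, "d": 0.15}))
    print(confident_match({"a": 0.5, "b": 0.49, "c": 0.1}))
```

The point of a measure like this is that a classifier can decline to name anyone when the top candidate barely beats the runner-up, which matters when the pool of possible authors is very large.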

Read the rest

State of Adversarial Stylometry: can you change your prose-style?

Today at the Chaos Communication Congress in Berlin (28C3), Sadia Afroz and Michael Brennan presented a talk called "Deceiving Authorship Detection," about research from Drexel University on "Adversarial Stylometry," the practice of identifying the authors of texts who don't want to be identified, and the process of evading detection. Stylometry has made great and well-publicized advances in recent years (and it made the news with scandals like "Gay Girl in Damascus"), but typically this has been against authors who have not taken active, computer-assisted countermeasures to disguise their distinctive "voice" in prose.

As part of the presentation, the Drexel team released Anonymouth, a free/open tool that partially automates the process of evading authorship detection. The tool is still a rough alpha, and it requires human intervention to oversee the texts it produces, but it is still an exciting move in adversarial stylometry tools. Accompanying the release are large corpora of test data of deceptive and non-deceptive texts.

Stylometry has been cited by knowledgeable critics as proof of the pointlessness of the Nym Wars: why argue for the right to be anonymous or pseudonymous on Google Plus or Facebook when stylometry will de-anonymize you anyway? I've been suspicious of these critiques because they assume that only de-anonymizers will have access to computer-assisted tools, but as Anonymouth shows, there are many opportunities to use automation tools to improve anonymity.

Stylometry matters in many ways: its state of the art changes the balance of power between trolls and moderators, between dissidents and dictators, between employers and whistleblowers, between astroturfers and commenters, and between spammers and filters. Read the rest

How To: Break the speed of light in your own backyard

Minute Physics serves up another nifty video.

Via Jennifer Ouellette

Video Link Read the rest

The Verge launches

The Verge, a new gadgets 'n' tech site founded by former Engadget editors and staffers, launched earlier today. Read the rest
