Robin Sloan is a programmer and novelist whose books like Sourdough and Mr Penumbra's 24-Hour Bookstore are rich and evocative blends of self-aware nerdy playfulness and magical speculation. Read the rest
A presentation today at Defcon from Drexel computer science prof Rachel Greenstadt and GWU computer sicence prof Aylin Caliskan builds on the pair's earlier work in identifying the authors of software and shows that they can, with a high degree of accuracy, identify the anonymous author of software, whether in source-code or binary form. Read the rest
The famously spare Hemingway used 80 words ending in -ly per 10,000 words of prose; JK Rowling uses 140 adverbs per 10,000 words, and EL James uses 155. Read the rest
Gizmodo's Ashley Feinberg (almost certainly) figured out that James Comey's secret Twitter handle was @projectexile7, because America's top G-man failed at some of the most basic elements of operational security. Read the rest
One of the most interesting technical presentations I attended in 2012 was the talk on "adversarial stylometry" given by a Drexel College research team at the 28C3 conference in Berlin. "Stylometry" is the practice of trying to ascribe authorship to an anonymous text by analyzing its writing style; "adversarial stylometry" is the practice of resisting stylometric de-anonymization by using software to remove distinctive characteristics and voice from a text.
Stanford's Arvind Narayanan describes a paper he co-authored on stylometry that has been accepted for the IEEE Symposium on Security and Privacy 2012. In On the Feasibility of Internet-Scale Author Identification (PDF) Narayanan and co-authors show that they can use stylometry to improve the reliability of de-anonymizing blog posts drawn from a large and diverse data-set, using a method that scales well. However, the experimental set was not "adversarial" -- that is, the authors took no countermeasures to disguise their authorship. It would be interesting to see how the approach described in the paper performs against texts that are deliberately anonymized, with and without computer assistance. The summary cites another paper by someone who found that even unaided efforts to disguise one's style makes stylometric analysis much less effective.
Read the rest
We made several innovations that allowed us to achieve the accuracy levels that we did. First, contrary to some previous authors who hypothesized that only relatively straightforward “lazy” classifiers work for this type of problem, we were able to avoid various pitfalls and use more high-powered machinery. Second, we developed new techniques for confidence estimation, including a measure very similar to “eccentricity” used in the Netflix paper.