Rolling Stone has published an outstanding profile of geek hero Randall Munroe, occasioned by the publication of What If?: Serious Scientific Answers to Absurd Hypothetical Questions (see review).

How much code do you write these days?
Surprisingly, more and more. I do a lot of coding just to try to answer questions for myself, and sometimes the result turns into a comic. Here's one example that didn't turn into a comic (not yet, anyway): I downloaded the whole Google Books Ngrams corpus and made some tools to visualize whether there were patterns in which years were mentioned in which years. In the 1930s, did people talk about 1776 more than in the 1980s? And you have to normalize for a bunch of things: everyone talks about the current year the most, and the number 2000 gets mentioned a lot, but how much of that was people talking about the year 2000?
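
The normalization described here can be sketched in a few lines of Python, the language Munroe says he used. This is a minimal sketch, not his code: it assumes the corpus has already been reduced to (token, publication year, match count) tuples, and the helper name `year_mention_shares`, its year bounds, and the toy data are hypothetical. Dividing by each publication year's total of year-like tokens makes books from the 1930s and the 1980s comparable despite very different volumes; it does not attempt to separate the number 2000 from the year 2000.

```python
from collections import defaultdict


def year_mention_shares(rows, lo=1500, hi=2008):
    """Hypothetical helper. rows: (token, pub_year, match_count) tuples, an
    assumed, simplified stand-in for Google Books 1-gram records.
    Returns {pub_year: {mentioned_year: share}}, where share is the fraction
    of all year-like tokens printed in pub_year that name mentioned_year."""
    counts = defaultdict(lambda: defaultdict(int))
    for token, pub_year, n in rows:
        # Keep only bare four-digit tokens in an (arbitrary) plausible range.
        if n > 0 and len(token) == 4 and token.isdigit() and lo <= int(token) <= hi:
            counts[pub_year][int(token)] += n
    shares = {}
    for pub_year, by_year in counts.items():
        total = sum(by_year.values())
        # Normalize so a year with many books is comparable to one with few.
        shares[pub_year] = {y: c / total for y, c in by_year.items()}
    return shares


# Toy data, invented for illustration only.
rows = [
    ("1776", 1930, 30), ("1930", 1930, 70),
    ("1776", 1985, 10), ("1985", 1985, 90),
    ("the", 1930, 5000),  # ignored: not a year
]
shares = year_mention_shares(rows)
print(shares[1930][1776], shares[1985][1776])  # 0.3 0.1
```

On this toy data, comparing `shares[1930][1776]` with `shares[1985][1776]` answers the 1776 question directly; on the real corpus the "current year" spike would still dominate each row.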

I wasn't able to extract a good enough signal from that noise, but I found patterns of mentions of future years in unexpected places. In the 1930s, there was a string of mentions of the 1980s. That seemed weird to me: why were people in the Thirties talking about 50 years in the future? And it was specifically 50 years (books from 1932 would mention the early 1980s a lot), but the pattern didn't continue in the Forties. I started seeing other ghosts like this that were kind of inexplicable, and then I realized I was seeing OCR (optical character recognition) errors, where a 3 had been misread as an 8. This turned out to be a really effective way to spot the flaws in the text recognition engine they were using. Which was cool, but it wasn't what I was looking for. I wrote quite a bit of code in the process and made a useful graphic that I could explore. That was a hundred lines of Python, at least, and a couple of afternoons. I do that kind of thing a lot, and occasionally it turns into a comic.
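
The 3-read-as-8 ghost can be checked mechanically: a mention of 1982 in a 1932 book is suspicious because turning its 8 back into a 3 yields the book's own year. The sketch below is illustrative only and assumes mention counts have already been normalized into per-publication-year shares; `unmisread`, `flag_ghosts`, the 0.01 threshold, and the toy numbers are hypothetical choices, not part of any Google tool or of Munroe's code.

```python
from itertools import product


def unmisread(year, wrong="8", right="3"):
    """All years reachable by turning any subset of `wrong` digits back into `right`."""
    options = [(ch, right) if ch == wrong else (ch,) for ch in str(year)]
    return {int("".join(p)) for p in product(*options)} - {year}


def flag_ghosts(shares, threshold=0.01):
    """Flag future-year mentions that a 3-read-as-8 error could explain.

    shares: {pub_year: {mentioned_year: share}} of normalized mention shares.
    threshold: hypothetical cutoff; rarer future mentions are ignored.
    """
    ghosts = []
    for pub_year, by_year in shares.items():
        for year, share in by_year.items():
            if year <= pub_year or share < threshold:
                continue  # only unexpectedly common future years are suspicious
            # A ghost is plausible if undoing the misread gives a non-future year.
            sources = sorted(s for s in unmisread(year) if s <= pub_year)
            if sources:
                ghosts.append((pub_year, year, sources))
    return sorted(ghosts)


# Toy shares, invented for illustration: a 1932 book "mentioning" 1982.
shares = {1932: {1932: 0.5, 1982: 0.02, 1990: 0.01}, 1945: {1945: 0.6, 1995: 0.02}}
print(flag_ghosts(shares))  # [(1932, 1982, [1932])]
```

Enumerating every subset of 8s means a year like 1988 maps back to 1938, 1983, and 1933, so a single misread digit anywhere in the number is covered.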

