Prooffreader graphed the distribution of letters towards the beginning, middle and end of English words, using a variety of corpora, finding both some obvious truths and some surprising ones. As soon as I saw this, I began to think of the ways that you could use it to design word games -- everything from improved Boggle dice to automated Hangman strategies to altogether new games.
Now then: I became curious about how letters are placed in English while doing many different, often quick, sometimes pointless, pattern analyses of letters for a wide variety of reasons. (One example: for one art project that will hopefully be posted on this blog one day, I found all the anagrams of "Hollywood", and noticed that words beginning with "w" were overrepresented.)
I've had many "oh, yeah" moments looking over the graphs. For example, words almost never begin with "x", but it's quite common as the second letter. There's a little hump near the beginning of "u" that's caused by its proximity to "q", which is most common at the beginning of a word. When you remove "q" from the dataset, the hump disappears. "F" occurs toward the extremes, especially in prepositions ("for", "from", "of", "off") but rarely just before the middle.
A final thought: the most common word in the English language is "the", which makes up about 6% of most corpuses (sorry, corpora). But according to these graphs, the most representative word is "toe".
Graphing the distribution of English letters towards the beginning, middle or end of words
(via Hacker News)
I’ve been noting humorous updatings of Ambrose Bierce’s 1906 humor classic The Devil’s Dictionary for years — there was the publishing edition, and this corker on copyright — but the Educational Technology edition, by New Storytelling author Bryan Alexander has a currency and an urgency that scores an acerbic bullseye.
Veracrypt was created to fill the vacuum left by the implosion of disk-encryption tool Truecrypt, which mysteriously vanished in 2014, along with a “suicide note” (possibly containing a hidden message) that many interpreted as a warning that an intelligence agency had inserted a backdoor into the code, or was attempting to force Truecrypt’s anonymous creators […]
Yahoo has released a machine-learning model called open_nsfw that is designed to distinguish not-safe-for-work images from worksafe ones. By tweaking the model and combining it with places-CNN, MIT’s scene-recognition model, Gabriel Goh created a bunch of machine-generated scenes that score high for both models — things that aren’t porn, but look porny.
I’ve never really felt the need to purchase a smartwatch because a lot of them aren’t very functional, but at just shy of $30, the Martian Notifier Smartwatch was worth checking out. For that low of a price, it actually does feature an impressive amount of functionality, and comes in handy when you don’t want to be carrying around your […]
Geek Fuel is a subscription delivery service that caters to those of us that love comics, gaming, and general geek culture. Every month, Geek Fuel will assemble a box of goodies with a value of $50 or over. The specific items are a mystery, but you’ll always get an exclusive t-shirt not found anywhere else, a full […]
If you like to DIY and you like helicopters, you’re going to really love the Flexbot Hexacopter Kit. This copter blows traditional models out of the water: it includes everything you need to actually build your own hexacopter, and then pilot it like a pro, too.The construction is complicated enough to give you a challenge, […]