Prooffreader graphed the distribution of letters towards the beginning, middle and end of English words, using a variety of corpora, finding both some obvious truths and some surprising ones. As soon as I saw this, I began to think of the ways that you could use it to design word games -- everything from improved Boggle dice to automated Hangman strategies to altogether new games.
Now then: I became curious about how letters are placed in English while doing many different, often quick, sometimes pointless, pattern analyses of letters for a wide variety of reasons. (One example: for one art project that will hopefully be posted on this blog one day, I found all the anagrams of "Hollywood", and noticed that words beginning with "w" were overrepresented.)
I've had many "oh, yeah" moments looking over the graphs. For example, words almost never begin with "x", but it's quite common as the second letter. There's a little hump near the beginning of "u" that's caused by its proximity to "q", which is most common at the beginning of a word. When you remove "q" from the dataset, the hump disappears. "F" occurs toward the extremes, especially in prepositions ("for", "from", "of", "off") but rarely just before the middle.
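For anyone who wants to poke at this themselves, here is a minimal Python sketch of one way to tally where letters fall within words. It is not Prooffreader's actual method: the function name `positional_histogram`, the binning rule, and the tiny sample word list are all assumptions made for illustration, and a real analysis would use a full corpus weighted by word frequency.

```python
from collections import defaultdict


def positional_histogram(words, bins=5):
    """Count how often each letter lands in each relative-position bin.

    Bin 0 is the start of the word and bin (bins - 1) is the end.
    Hypothetical helper for illustration only.
    """
    counts = defaultdict(lambda: [0] * bins)
    for word in words:
        word = word.lower()
        n = len(word)
        for i, ch in enumerate(word):
            if not ch.isalpha():
                continue
            # (i + 0.5) / n is the centre of this letter's slot on a 0..1 scale;
            # scale it to a bin index and clamp to the last bin.
            b = min(int((i + 0.5) / n * bins), bins - 1)
            counts[ch][b] += 1
    return dict(counts)


# Toy word list (an assumption, not real corpus data).
sample = ["queen", "quick", "for", "from", "of", "off", "toe", "the", "box", "exit"]

hist = positional_histogram(sample, bins=3)

# Repeat the count with every q-word removed, to see what happens to "u".
without_q = positional_histogram([w for w in sample if "q" not in w], bins=3)

for letter in "qufx":
    print(letter, hist.get(letter), without_q.get(letter))
```

Comparing `hist` with `without_q` mirrors the "remove q and the hump disappears" check, though on ten words it is only a toy. Dividing each row by its total would turn the counts into the kind of per-letter distribution the graphs show.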
A final thought: the most common word in the English language is "the", which makes up about 6% of most corpuses (sorry, corpora). But according to these graphs, the most representative word is "toe".
Graphing the distribution of English letters towards the beginning, middle or end of words
(via Hacker News)