Prooffreader graphed the distribution of letters towards the beginning, middle and end of English words, using a variety of corpora, finding both some obvious truths and some surprising ones. As soon as I saw this, I began to think of the ways that you could use it to design word games -- everything from improved Boggle dice to automated Hangman strategies to altogether new games.
Now then: I became curious about how letters are placed in English while doing many different, often quick, sometimes pointless, pattern analyses of letters for a wide variety of reasons. (One example: for one art project that will hopefully be posted on this blog one day, I found all the anagrams of "Hollywood", and noticed that words beginning with "w" were overrepresented.)
I've had many "oh, yeah" moments looking over the graphs. For example, words almost never begin with "x", but it's quite common as the second letter. The curve for "u" has a little hump near the beginning, caused by its proximity to "q", which is most common at the beginning of a word; when you remove "q" from the dataset, the hump disappears. "F" occurs toward the extremes, especially in prepositions ("for", "from", "of", "off"), but rarely just before the middle.
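If you want to poke at this kind of pattern yourself, here is a minimal Python sketch of one way to count where letters fall within words. It is not Prooffreader's method: the function name `position_histogram`, the three-bin split into beginning, middle and end, the tiny sample word list, and the choice to "remove q" by dropping every word that contains it are all assumptions made for illustration.

```python
from collections import defaultdict


def position_histogram(words, bins=3):
    """Count, for each letter, how often it lands in each relative-position bin.

    Hypothetical helper: each letter is placed by the midpoint of its slot,
    (i + 0.5) / len(word), so bin 0 is the start of a word and bin `bins - 1`
    is the end, regardless of word length.
    """
    hist = defaultdict(lambda: [0] * bins)
    for word in words:
        letters = [c for c in word.lower() if c.isalpha()]
        n = len(letters)
        for i, c in enumerate(letters):
            # Integer form of int(((i + 0.5) / n) * bins); always < bins.
            b = (2 * i + 1) * bins // (2 * n)
            hist[c][b] += 1
    return dict(hist)


# Toy word list (an assumption, not a real corpus).
sample = ["queen", "quick", "under", "fun", "for", "from", "of", "off"]

with_q = position_histogram(sample)
# One possible reading of "remove q from the dataset": drop words containing it.
without_q = position_histogram([w for w in sample if "q" not in w])

print("u with q words:   ", with_q["u"])
print("u without q words:", without_q["u"])
print("f:", with_q["f"])
```

On a real corpus you would feed in a full word list (optionally weighted by frequency) and use more bins to get smooth curves; three bins just keeps the toy output readable.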
A final thought: the most common word in the English language is "the", which makes up about 6% of most corpuses (sorry, corpora). But according to these graphs, the most representative word is "toe".
Graphing the distribution of English letters towards the beginning, middle or end of words
(via Hacker News)