Prooffreader graphed the distribution of letters towards the beginning, middle and end of English words, using a variety of corpora, finding both some obvious truths and some surprising ones. As soon as I saw this, I began to think of the ways that you could use it to design word games -- everything from improved Boggle dice to automated Hangman strategies to altogether new games.
Now then: I became curious about how letters are placed in English while doing many different, often quick, sometimes pointless, pattern analyses of letters for a wide variety of reasons. (One example: for one art project that will hopefully be posted on this blog one day, I found all the anagrams of "Hollywood", and noticed that words beginning with "w" were overrepresented.)
I've had many "oh, yeah" moments looking over the graphs. For example, words almost never begin with "x", but it's quite common as the second letter. There's a little hump near the beginning of "u" that's caused by its proximity to "q", which is most common at the beginning of a word. When you remove "q" from the dataset, the hump disappears. "F" occurs toward the extremes, especially in prepositions ("for", "from", "of", "off") but rarely just before the middle.
A final thought: the most common word in the English language is "the", which makes up about 6% of most corpuses (sorry, corpora). But according to these graphs, the most representative word is "toe".
Graphing the distribution of English letters towards the beginning, middle or end of words
(via Hacker News)
Gastric bypass surgery is remarkably effective at promoting weight-loss (it cuts the long-term risk of early death from morbid obesity by 40%), and it’s long been presumed that the major action by which it worked was that, by bypassing the parts of the gut where most food absorbtion takes place, it limited the calories that […]
Marta Chudolinska is Learning Zone Librarian at the Ontario College of Art and Design University, which hosts a huge zine collection founded in 2007 Alicia Nauta, then a student.
The famously spare Hemingway used 80 words ending in -ly per 10,000 words of prose; JK Rowling uses 140 adverbs per 10,000 words, and EL James uses 155.
If you don’t want to get stuck footing the bill for a hit and run, this dashboard-mounted camera offers up to 2K resolution to make sure you always have a reliable witness, and it’s available in the Boing Boing Store for 30% off it’s usual price.The PapaGo mounts unobtrusively to your windshield to see everything […]
While some people still maintain that everything in Apple’s walled garden “just works” and is immune to the rampant malware of the Windows world, the reality is different. The Mac’s growing market share has made it a much more viable target for malicious actors, and its built-in tools aren’t always enough to fix things. Drive […]
Boasting an IPX6 waterproof rating, the Trakk Bullet Ultra Compact Waterproof Bluetooth Speaker resists dust and heavy rainfall. It’s currently available in the Boing Boing Store.The Trakk Bullet offers the same wireless convenience as other portable speakers, but few are built as tough as this one. Its utilitarian construction is designed to be a totally low-maintenance […]