Prooffreader graphed the distribution of letters towards the beginning, middle and end of English words, using a variety of corpora, finding both some obvious truths and some surprising ones. As soon as I saw this, I began to think of the ways that you could use it to design word games -- everything from improved Boggle dice to automated Hangman strategies to altogether new games.
Now then: I became curious about how letters are placed in English while doing many different, often quick, sometimes pointless, pattern analyses of letters for a wide variety of reasons. (One example: for one art project that will hopefully be posted on this blog one day, I found all the anagrams of "Hollywood", and noticed that words beginning with "w" were overrepresented.)
I've had many "oh, yeah" moments looking over the graphs. For example, words almost never begin with "x", but it's quite common as the second letter. There's a little hump near the beginning of "u" that's caused by its proximity to "q", which is most common at the beginning of a word. When you remove "q" from the dataset, the hump disappears. "F" occurs toward the extremes, especially in prepositions ("for", "from", "of", "off") but rarely just before the middle.
A final thought: the most common word in the English language is "the", which makes up about 6% of most corpuses (sorry, corpora). But according to these graphs, the most representative word is "toe".
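For anyone who wants to poke at this themselves, here is a minimal sketch of one way to tally where each letter falls within words. It is not Prooffreader's actual method: the function name `letter_position_histogram`, the three-bin split (beginning, middle, end) and the tiny word list are all assumptions made for illustration, and a real run would use a proper corpus.

```python
from collections import defaultdict


def letter_position_histogram(words, bins=3):
    """Count how often each letter lands in each relative-position bin.

    Hypothetical helper: a letter at index i in a word of length n is placed
    by the midpoint of its slot, (i + 0.5) / n, scaled to `bins` buckets.
    Integer arithmetic is used so there are no floating-point edge cases.
    """
    hist = defaultdict(lambda: [0] * bins)
    for word in words:
        letters = [c for c in word.lower() if c.isalpha()]
        n = len(letters)
        for i, letter in enumerate(letters):
            bucket = (2 * i + 1) * bins // (2 * n)  # always in 0..bins-1
            hist[letter][bucket] += 1
    return dict(hist)


# Illustrative word list only, not a real corpus.
sample_words = ["queen", "quit", "for", "of", "off", "box"]

hist = letter_position_histogram(sample_words)
for letter in ("q", "u", "f", "x"):
    print(letter, hist[letter])

# Recount without any word containing "q", to see how "u" changes.
without_q = [w for w in sample_words if "q" not in w]
hist_no_q = letter_position_histogram(without_q)
print("u without q-words:", hist_no_q.get("u", [0, 0, 0]))
```

Dividing each letter's counts by their total would turn the tallies into the kind of per-letter distribution shown in the graphs, and raising `bins` would give a finer curve.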
Graphing the distribution of English letters towards the beginning, middle or end of words
(via Hacker News)