Prooffreader graphed the distribution of letters towards the beginning, middle and end of English words, using a variety of corpora, finding both some obvious truths and some surprising ones. As soon as I saw this, I began to think of the ways that you could use it to design word games -- everything from improved Boggle dice to automated Hangman strategies to altogether new games.
I became curious about where letters fall within English words while running many different, often quick, sometimes pointless pattern analyses of letters for a wide variety of reasons. (One example: for an art project that will hopefully be posted on this blog one day, I found all the anagrams of "Hollywood", and noticed that words beginning with "w" were overrepresented.)
I've had many "oh, yeah" moments looking over the graphs. For example, words almost never begin with "x", but it's quite common as the second letter. There's a little hump near the beginning of "u" that's caused by its proximity to "q", which is most common at the beginning of a word. When you remove "q" from the dataset, the hump disappears. "F" occurs toward the extremes, especially in prepositions ("for", "from", "of", "off") but rarely just before the middle.
A final thought: the most common word in the English language is "the", which makes up about 6% of most corpuses (sorry, corpora). But according to these graphs, the most representative word is "toe".
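For anyone who wants to poke at this kind of data themselves, here is a minimal sketch of how a letter-position tally could be computed. It is not Prooffreader's actual method: the function name `letter_position_histogram`, the `bins` parameter, and the choice to scale each letter's index to a relative position within its word are all assumptions made for illustration, and the sketch does not weight words by how often they appear in a corpus.

```python
from collections import defaultdict


def letter_position_histogram(words, bins=15):
    """Count how often each letter lands in each relative-position bin.

    Hypothetical helper: bin 0 is the start of a word and bin (bins - 1)
    is the end. Every word counts once, with no frequency weighting.
    """
    hist = defaultdict(lambda: [0] * bins)
    for word in words:
        word = word.lower()
        n = len(word)
        for i, ch in enumerate(word):
            if not ch.isalpha():
                continue  # skip apostrophes, hyphens, digits
            if n > 1:
                # Integer arithmetic avoids floating-point edge cases;
                # the last letter is clamped into the final bin.
                b = min(i * bins // (n - 1), bins - 1)
            else:
                b = 0  # assumption: a one-letter word counts as "beginning"
            hist[ch][b] += 1
    return dict(hist)
```

Calling `letter_position_histogram(["for", "from", "of", "off"], bins=3)` puts most of the "f" counts in the first and last bins, which is the same pull toward the extremes described above. Feeding it a real word list and plotting each letter's row would give a rough version of the graphs.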
Graphing the distribution of English letters towards the beginning, middle or end of words
(via Hacker News)