If you've got 10 minutes, you can learn the history of English — including some interesting background on where specific words and phrases came from. (If you don't have 10 minutes, you can also watch the whole thing one chapter at a time, in segments under two minutes each.) One interesting note: the King James Bible and early scientific publications and societies were equally important to the formation of English as we speak it today.
English — along with a whole host of languages spoken in Europe, India, and the Middle East — can be traced back to an ancient language that scholars call Proto-Indo-European. Now, for all intents and purposes, Proto-Indo-European is an imaginary language. Sort of. It's not like Klingon or anything; it is reasonable to believe it once existed. But nobody ever wrote it down, so we don't know exactly what "it" really was. What we do know is that hundreds of languages share similarities in syntax and vocabulary, suggesting that they all evolved from a common ancestor.
Of course, that very quickly leads to attempts to reconstruct what said ancestral language might have sounded like. In the track above, you can listen to University of Kentucky linguist Andrew Byrd recite a fable in reconstructed Proto-Indo-European. Archaeology magazine helpfully provides a translation:
Max Barry's new technothriller Lexicon is a gripping conspiracy novel about a cabal of "poets" who have mastered the deep language of the human brain and can use it to boss the rest of us around. It's a pitch-perfect thriller, a jetpack of a plot that rocketed me from page one to page 400 in a single afternoon, and it kept me guessing right up to the end. Imagine Dan Brown written by someone a lot smarter and better at characterization and at hand-waving the places where the science shades into science fiction, and you've got something like Lexicon.
In particular, Lexicon captures a lot of the stuff that makes the myth of Neurolinguistic Programming so compelling — the idea that smart people can figure out how to make others march in lockstep just by tricking their subconsciouses into thinking that that's what they wanted to do all along. And Barry carries through the power-fantasy to its inescapable end: a secretive, paranoid, power-maddened cabal that is its own worst enemy.
Full of surprises and grace notes, this is the kind of delightful thriller that's anything but a guilty pleasure, and just what you'd expect from the author of such great books as Jennifer Government and Machine Man.
Boston coder Darius Kazemi's interest in chance led him to create a bot that buys stuff on Amazon: a human decision made ineluctably alien by the randomness of a computer's whim.
Languages come and go and blend. It's likely been that way forever and the process only accelerates under the influence of mega-languages (like English) that represent a sort of global means of communication. But, increasingly, people who are at risk of losing their native language entirely are fighting back—trying to encourage more people to be bilingual and save the native language from extinction.
At Discover Magazine, Veronique Greenwood has a really interesting story about a mathematician who is helping to preserve Scottish Gaelic. How? The researcher, Anne Kandler, has put together some equations that can help native language supporters target their programs and plan their goals.
Some of the numbers are obvious—you must know how many people in the population you’re working with speak just Gaelic, how many speak just English, and how many are bilingual, as well as the rate of loss of Gaelic speakers. But also in the model are numbers that stand for the prestige of each language—the cultural value people place on speaking it—and numbers that describe a language’s economic value.
Put them all together into a system of equations that describe the growth of the three different groups—English speakers, Gaelic speakers, and bilinguals—and you can calculate what inputs are required for a stable bilingual population to emerge. In 2010, Kandler found that using the most current numbers, a total of 860 English speakers will have to learn Gaelic each year for the number of speakers to stay the same. To her, this sounded like a lot, but the national Gaelic Development Agency was pleased: it’s about the number of bilingual speakers they were already aiming to produce through classes and programs.
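Kandler's actual equations aren't reproduced in the article, but the shape of the model — three interacting groups whose flows depend on each language's prestige — can be sketched as a toy simulation. Everything below (the transition rules, parameter names, and numbers) is made up for illustration; it is not Kandler's model or her data.

```python
# Toy three-group language-shift simulation in the spirit of the model
# described above: monolingual Gaelic (G), monolingual English (E), and
# bilinguals (B). All rates and prestige values are invented.

def step(G, E, B, prestige_g=0.4, prestige_e=0.6,
         learn_rate=0.05, loss_rate=0.03):
    """Advance the population one year.

    - English monolinguals learn Gaelic (E -> B) in proportion to
      Gaelic's prestige and their exposure to Gaelic speakers.
    - Bilinguals stop transmitting Gaelic (B -> E) in proportion to
      English's prestige.
    - Gaelic monolinguals pick up English and become bilingual (G -> B).
    """
    total = G + E + B
    e_to_b = learn_rate * prestige_g * E * (G + B) / total
    b_to_e = loss_rate * prestige_e * B
    g_to_b = learn_rate * prestige_e * G * (E + B) / total
    return (G - g_to_b,
            E - e_to_b + b_to_e,
            B + e_to_b + g_to_b - b_to_e)

# Simulate a small community for 50 years; the three flows cancel out,
# so the total population stays constant while the mix shifts.
G, E, B = 1000.0, 50000.0, 5000.0
for year in range(50):
    G, E, B = step(G, E, B)
print(round(G), round(E), round(B))
```

Kandler's real contribution is running this kind of system backwards: fixing the desired outcome (a stable bilingual population) and solving for the required input, such as the 860 new learners per year quoted above.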
You had me at hello: How phrasing affects memorability, a clever study of "memorable phrases" from movies and advertisements by Cristian Danescu-Niculescu-Mizil, Justin Cheng, Jon Kleinberg, and Lillian Lee at Cornell, attempts to uncover why certain phrases become part of our collective history.
The results are interesting. The memorable phrases turn out to be significantly distinctive in their word choice, meaning they're made up of combinations of words that are unlikely to appear in the corpus. At the same time, memorable phrases tend to use very ordinary grammatical structures that are highly likely to turn up in the corpus.
They also found that memorable phrases tend to use pronouns (other than you), the indefinite article a rather than the definite article the, and verbs in the past rather than present tense. These are all features that tend to make phrases general rather than specific.
So memorable phrases contain generic pearls of wisdom expressed with unusual combinations of words in ordinary sentences.
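The "distinctiveness" finding boils down to scoring a phrase by how surprising its words are under a language model trained on everyday text. The Cornell paper used n-gram models over a large corpus; the sketch below is a stripped-down unigram version with a tiny invented corpus, just to show the mechanics.

```python
import math
from collections import Counter

# Sketch of "distinctiveness" scoring: average per-word surprisal (in
# bits) under a background corpus. The study used proper n-gram language
# models; this unigram version and the tiny corpus are illustrative only.

background = (
    "the cat sat on the mat and the dog sat on the rug "
    "a man walked into the room and looked at the table"
).split()

counts = Counter(background)
total = len(background)

def surprisal(phrase, alpha=1.0):
    """Average negative log-probability per word, with add-alpha
    smoothing so unseen words don't get zero probability."""
    vocab = len(counts) + 1  # +1 bucket for unseen words
    words = phrase.lower().split()
    bits = 0.0
    for w in words:
        p = (counts[w] + alpha) / (total + alpha * vocab)
        bits += -math.log2(p)
    return bits / len(words)

# Frequent word sequences score low; unusual word choices score high,
# which is the paper's signature of a memorable phrase.
print(surprisal("the cat sat"))       # low: all common words
print(surprisal("frankly scarlett"))  # high: words absent from the corpus
```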
In "Linguistic properties of multi-word passphrases" (PDF, generates an SSL error) Cambridge's Joseph Bonneau and Ekaterina Shutova demonstrate that multi-word passphrases are more secure (have more entropy) than average user passwords composed of "random" characters, but that neither is very secure. In a blog post, Joseph Bonneau sums up the paper and the research that went into it.
Some clear trends emerged—people strongly prefer phrases which are either a single modified noun (“operation room”) or a single modified verb (“send immediately”). These phrases are perhaps easier to remember than phrases which include a verb and a noun and are therefore closer to a complete sentence. Within these categories, users don’t stray too far from choosing two-word phrases the way they’re actually produced in natural language. That is, phrases like “young man” which come up often in speech are proportionately more likely to be chosen than rare phrases like “young table.”
This led us to ask, if in the worst case users chose multi-word passphrases with a distribution identical to English speech, how secure would this be? Using the large Google n-gram corpus we can answer this question for phrases of up to 5 words. The results are discouraging: by our metrics, even 5-word phrases would be highly insecure against offline attacks, with fewer than 30 bits of work compromising over half of users. The returns appear to rapidly diminish as more words are required. This has potentially serious implications for applications like PGP private keys, which are often encrypted using a passphrase. Users are clearly more random in “passphrase English” than in actual English, but unless it’s dramatically more random the underlying natural language simply isn’t random enough. Exploring this gap is an interesting avenue for future collaboration between computer security researchers and linguists. For now we can only be comfortable that randomly-generated passphrases (using tools like Diceware) will resist offline brute force.
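The "fewer than 30 bits of work compromising over half of users" metric can be made concrete: an attacker guesses passphrases from most to least popular, and the cost is the log2 of how many guesses it takes to cover half the probability mass. Here is a minimal sketch of that partial-guessing calculation on a Zipf-like toy distribution — the distribution is invented, not the Google n-gram data from the paper.

```python
import math

# Partial-guessing sketch: guess candidates from most to least popular,
# count guesses until a target fraction of users is covered, and report
# log2 of that count as "bits of work".

def bits_to_compromise(probs, fraction=0.5):
    """probs: probability of each candidate passphrase, in any order."""
    covered = 0.0
    for guesses, p in enumerate(sorted(probs, reverse=True), start=1):
        covered += p
        if covered >= fraction:
            return math.log2(guesses)
    raise ValueError("distribution covers less than the target fraction")

# Zipf-like toy distribution over a million candidate phrases: a few
# popular choices carry most of the mass, as with real user choices.
n = 1_000_000
weights = [1.0 / rank for rank in range(1, n + 1)]
z = sum(weights)
probs = [w / z for w in weights]

print(f"{bits_to_compromise(probs):.1f} bits")
```

Note how skew, not vocabulary size, drives the result: a million candidates is nominally about 20 bits, but because users cluster on popular choices, far fewer bits of work cover half of them.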
Omniglot is an intimidatingly complete site devoted to cataloging every writing system that ever existed. As JoshP says, "If you ever need to transliterate Punic... this is the place."
Yesterday's keynote at the 28th Chaos Computer Congress (28C3) by Meredith Patterson on "The Science of Insecurity" was a tour-de-force explanation of the formal linguistics and computer science that explain why software becomes insecure, and an explanation of how security can be dramatically increased. What's more, Patterson's slides were outstanding Rageface-meets-Occupy memeshopping. Both the video and the slides are online already.
Hard-to-parse protocols require complex parsers. Complex, buggy parsers become weird machines for exploits to run on. Help stop weird machines today: Make your protocol context-free or regular!
Protocols and file formats that are Turing-complete input languages are the worst offenders, because for them, recognizing valid or expected inputs is UNDECIDABLE: no amount of programming or testing will get it right.
A Turing-complete input language destroys security for generations of users. Avoid Turing-complete input languages!
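To make the "keep your input language regular" advice concrete: if every valid message of a protocol matches a single regular expression, the recognizer is a finite automaton, and malformed input can be rejected in full before any handler code runs. The toy command protocol below is invented for illustration; it is not from Patterson's talk.

```python
import re

# Toy illustration of a regular input language: every valid message of
# this made-up key-value protocol matches one anchored regex, so
# recognition is a finite automaton and happens before any processing.

MESSAGE = re.compile(
    r"\A(?:"
    r"GET [A-Za-z0-9_]{1,32}"                 # GET <key>
    r"|SET [A-Za-z0-9_]{1,32} [!-~]{1,128}"   # SET <key> <value>
    r"|PING"
    r")\r\n\Z"
)

def recognize(raw: bytes):
    """Return the decoded message if it is valid, else None.
    Recognition is total and separate from processing: anything the
    automaton doesn't accept never reaches handler code."""
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        return None
    return text if MESSAGE.match(text) else None

assert recognize(b"GET session_42\r\n")
assert recognize(b"SET colour blue\r\n")
assert recognize(b"GET ../etc/passwd\r\n") is None  # '.' '/' not in key alphabet
assert recognize(b"PING\r\npwned") is None          # trailing bytes rejected
```

The anchors (`\A` … `\Z`) are the important part: the whole input must be a valid message, so there is no partial parse for an exploit to ride on.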
Patterson's co-authors on the paper were her late husband, Len Sassaman (eulogized here) and Sergey Bratus.
On Submitterator, Musicman pointed me towards this great presentation on LOLspeak as a form of language play, and why people engage in that play. According to Lauren Gawne, who gave this speech last week at the Australian Linguistics Society conference, the choice to use LOLspeak has a lot to do with establishing identity—the playful identity of "cat", and the serious identity of "knowledgeable Internet user".
Includes an explanation of why LOLspeak is language play and not some language mashup "kitty pidgin".
You can read more about this on Lauren Gawne's blog Superlinguo.
The video, by the way, is 20 minutes long. It's also got a little bit of weird, warbly feedback in the audio, but that doesn't get in the way of hearing what Gawne is saying.