By Cory Doctorow at 10:50 am Friday, Apr 6
"You had me at hello: How phrasing affects memorability," a clever study of "memorable phrases" from movies and advertisements by Cristian Danescu-Niculescu-Mizil, Justin Cheng, Jon Kleinberg, and Lillian Lee at Cornell, attempts to uncover why certain phrases become part of our collective history.
The results are interesting. The phrases themselves turn out to be significantly distinctive, meaning they're made up of combinations of words that are unlikely to appear in the corpus. By contrast, memorable phrases tend to use very ordinary grammatical structures that are highly likely to turn up in the corpus.
They also found that memorable phrases tend to use pronouns (other than you), the indefinite article a rather than the definite article the, and verbs in the past rather than present tense. These are all features that tend to make phrases general rather than specific.
So memorable phrases contain generic pearls of wisdom expressed with unusual combinations of words in ordinary sentences.
The Secret Science of Memorable Quotes
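One way to picture "combinations of words that are unlikely to appear in the corpus" is to score a phrase with a small language model: the lower the average log-probability, the more distinctive the wording. The sketch below is an illustration only, not the study's actual pipeline. The three-sentence corpus, the function names, and the choice of a bigram model with add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter

def train_bigram_model(corpus_sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus_sentences:
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def avg_log_likelihood(phrase, unigrams, bigrams):
    """Average log2 probability per word (phrase must be non-empty)."""
    tokens = ["<s>"] + phrase.lower().split()
    vocab_size = len(unigrams)
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing so unseen word pairs get a small nonzero probability.
        prob = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        total += math.log2(prob)
    return total / (len(tokens) - 1)

# Toy stand-in for a real reference corpus (hypothetical data).
toy_corpus = [
    "the young man went home",
    "the young man sat down",
    "the old man went home",
]
unigrams, bigrams = train_bigram_model(toy_corpus)

common = avg_log_likelihood("the young man", unigrams, bigrams)
unusual = avg_log_likelihood("the young table", unigrams, bigrams)
print(common > unusual)  # the unusual word pairing scores lower
```

The same scoring could be run over part-of-speech tags instead of words to check the other half of the finding, that the grammatical structure of memorable phrases is ordinary.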
By Cory Doctorow at 3:24 pm Tuesday, Mar 13
In "Linguistic properties of multi-word passphrases" (PDF, generates an SSL error) Cambridge's Joseph Bonneau and Ekaterina Shutova demonstrate that multi-word passphrases are more secure (have more entropy) than average user passwords composed of "random" characters, but that neither is very secure. In a blog post, Joseph Bonneau sums up the paper and the research that went into it.
Some clear trends emerged—people strongly prefer phrases which are either a single modified noun (“operation room”) or a single modified verb (“send immediately”). These phrases are perhaps easier to remember than phrases which include a verb and a noun and are therefore closer to a complete sentence. Within these categories, users don’t stray too far from choosing two-word phrases the way they’re actually produced in natural language. That is, phrases like “young man” which come up often in speech are proportionately more likely to be chosen than rare phrases like “young table.”
This led us to ask, if in the worst case users chose multi-word passphrases with a distribution identical to English speech, how secure would this be? Using the large Google n-gram corpus we can answer this question for phrases of up to 5 words. The results are discouraging: by our metrics, even 5-word phrases would be highly insecure against offline attacks, with fewer than 30 bits of work compromising over half of users. The returns appear to rapidly diminish as more words are required. This has potentially serious implications for applications like PGP private keys, which are often encrypted using a passphrase. Users are clearly more random in “passphrase English” than in actual English, but unless it’s dramatically more random the underlying natural language simply isn’t random enough. Exploring this gap is an interesting avenue for future collaboration between computer security researchers and linguists. For now we can only be comfortable that randomly-generated passphrases (using tools like Diceware) will resist offline brute force.
Some evidence on multi-word passphrases
(via Schneier)
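For comparison with the "fewer than 30 bits" figure quoted above, here is a rough sketch of why a randomly generated passphrase fares better. It assumes a Diceware-style list of 7,776 words (five rolls of a six-sided die per word) with every word drawn uniformly and independently. The function names and the tiny sample list are hypothetical, and a real wordlist would have to be loaded separately.

```python
import math
import secrets

DICEWARE_LIST_SIZE = 6 ** 5  # 7776 entries: five dice rolls select one word

def passphrase_entropy_bits(num_words, list_size=DICEWARE_LIST_SIZE):
    """Entropy in bits when each word is chosen uniformly at random."""
    return num_words * math.log2(list_size)

def generate_passphrase(wordlist, num_words):
    """Pick words with a cryptographically secure random source."""
    return " ".join(secrets.choice(wordlist) for _ in range(num_words))

# Hypothetical stand-in for a full 7776-word list.
sample_words = ["anvil", "breeze", "cactus", "dune", "ember", "fjord"]

print(round(passphrase_entropy_bits(5), 1))   # about 64.6 bits for five words
print(generate_passphrase(sample_words, 4))   # four words drawn at random
```

The entropy figure only holds when the words really are chosen at random. A phrase a person makes up follows the natural-language distribution the paper measures, which is far more predictable.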
By Cory Doctorow at 6:00 am Thursday, Jan 5

Omniglot is an intimidatingly complete site devoted to cataloging every writing system that ever existed. As JoshP says, "If you ever need to transliterate Punic... this is the place."
Omniglot - the guide to languages, alphabets and other writing systems
By Cory Doctorow at 11:35 pm Wednesday, Dec 28

Yesterday's keynote at the 28th Chaos Communication Congress (28C3), Meredith Patterson's "The Science of Insecurity," was a tour-de-force account of the formal linguistics and computer science that explain why software becomes insecure, and of how security can be dramatically increased. What's more, Patterson's slides were outstanding Rageface-meets-Occupy memeshopping. Both the video and the slides are online already.
Hard-to-parse protocols require complex parsers. Complex, buggy parsers become weird machines for exploits to run on. Help stop weird machines today: Make your protocol context-free or regular!
Protocols and file formats that are Turing-complete input languages are the worst offenders, because for them, recognizing valid or expected inputs is UNDECIDABLE: no amount of programming or testing will get it right.
A Turing-complete input language destroys security for generations of users. Avoid Turing-complete input languages!
Patterson's co-authors on the paper were her late husband, Len Sassaman (eulogized here) and Sergey Bratus.
LANGSEC explained in a few slogans
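As a small illustration of the "make your protocol regular" slogan, the sketch below defines a made-up line protocol whose valid messages form a regular language, and refuses to touch any input the recognizer does not accept in full. The message format, field limits, and function name are assumptions invented for this example and are not taken from the talk.

```python
import re

# Hypothetical protocol: "<VERB> <key>=<value>", for example "SET color=blue".
# The pattern uses only alternation, character classes and bounded repetition,
# so the set of valid messages is a regular language.
MESSAGE = re.compile(
    r"(?P<verb>GET|SET) (?P<key>[a-z]{1,16})=(?P<value>[0-9a-z]{0,32})"
)

def parse_message(line):
    """Recognize the whole input first; only then hand fields to the caller."""
    match = MESSAGE.fullmatch(line)  # the entire line must match
    if match is None:
        raise ValueError("input is not in the protocol language")
    return match.group("verb"), match.group("key"), match.group("value")

print(parse_message("SET color=blue"))

try:
    parse_message("SET color=blue; rm -rf /")
except ValueError as err:
    print("rejected:", err)
```

Python's `re` module can express more than regular languages (backreferences, for instance), so the guarantee comes from restricting the pattern to regular constructs, not from the library itself.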
By Maggie Koerth-Baker at 9:34 am Friday, Dec 9
On Submitterator, Musicman pointed me towards this great presentation on LOLspeak as a form of language play, and why people engage in that play. According to Lauren Gawne, who gave this speech last week at the Australian Linguistics Society conference, the choice to use LOLspeak has a lot to do with establishing identity—the playful identity of "cat", and the serious identity of "knowledgeable Internet user".
The talk also explains why LOLspeak is language play and not some language mashup "kitty pidgin".
You can read more about this on Lauren Gawne's blog Superlinguo.
The video, by the way, is 20 minutes long. It's also got a little bit of weird, warbly feedback in the audio, but that doesn't get in the way of hearing what Gawne is saying.
By Maggie Koerth-Baker at 8:55 am Monday, Aug 15
Last year, I stumbled across some of the cool history of American Sign Language, documenting how it evolved out of both formal and informal languages—systems Deaf children used to communicate at home, and the systems they were taught as Deaf schools drew diverse groups from a wide geographical range. For American Sign Language, this process happened in the 19th century. In other parts of the world, it's still ongoing. For instance, in Nicaragua, Deaf people who are in school now are learning a much more formalized language, with a much bigger vocabulary, than those who went to school in the 1980s.
Those international differences are fascinating to me, so I'm really pleased to find this post on the Sinosplice blog, discussing the Chinese system of finger spelling. The blogger there is a linguist, so there's a lot of neat perspective in the linked post and others on the linguistic mechanics of finger spelling and sign language in China.
Finger spelling is very different from a sign language. In a sign language, you'd have one hand movement or hand position that stands for the concept "bird." In finger spelling, you'd have a separate movement or position for each letter or sound of the word "bird." You probably picked up some American finger spelling from Sesame Street, so it's likely to at least look somewhat familiar. But the really cool thing about this post is that it contrasts that system with the finger spelling alphabets used in Russia and Japan, and with several that have been used historically in China. That's the US system above. Below, the modern Chinese system that corresponds to pinyin, a way of transcribing printed Chinese words into Roman letters.
Via Kerim Friedman
By Cory Doctorow at 5:12 pm Monday, Aug 8
Writing for the OED, Stefan Dollinger (director of the Canadian English Lab, University of British Columbia at Vancouver) provides indispensable notes on talking Canadian:
We can find the linguistic expression of the Canadian east-west connection at all linguistic levels. Vowels, for instance, love to change, but when they change in Canada they have been shown rarely (and for some changes never) to cross the Canada-US border. For example, the ‘Canadian shift’, first detected in the mid 1990s, affects the ‘short front vowels’, i.e. the three vowels exemplified in black, pen or tin. In Canada these vowels move in the opposite direction to the well-established ‘Northern Cities Shift’ in parts of the United States. So in Canada, the vowel in black, for instance, is pronounced farther back in the mouth. Canadian dialects are actually diverging from the American dialects that have experienced the shift, and this despite the high levels of interaction between the two countries.
Other features include ‘Canadian raising’, the most widely known Canadian pronunciation feature. Canadian raising affects the diphthongs in words such as wife, price or life and house, about or shout. Canadian pronunciations, though far from universal, are often perceived as weef instead of wife and a boot instead of about by outsiders. There are also other, less well-known Canadian differences, such as the Canadian integration pattern of foreign sounds represented by <a>. In words like pasta, lava, plaza, and drama the foreign <a> sound acquires the vowel in father in American English and British English, but the vowel of cat in Canadian English.
Canadian English
(Image: Canada, a Creative Commons Attribution (2.0) image from alexindigo's photostream)