In "Linguistic properties of multi-word passphrases" (PDF, generates an SSL error) Cambridge's Joseph Bonneau and Ekaterina Shutova demonstrate that multi-word passphrases are more secure (have more entropy) than average user passwords composed of "random" characters, but that neither is very secure. In a blog post, Joseph Bonneau sums up the paper and the research that went into it.
Some clear trends emerged—people strongly prefer phrases which are either a single modified noun (“operation room”) or a single modified verb (“send immediately”). These phrases are perhaps easier to remember than phrases which include a verb and a noun and are therefore closer to a complete sentence. Within these categories, users don’t stray too far from choosing two-word phrases the way they’re actually produced in natural language. That is, phrases like “young man” which come up often in speech are proportionately more likely to be chosen than rare phrases like “young table.”
This led us to ask, if in the worst case users chose multi-word passphrases with a distribution identical to English speech, how secure would this be? Using the large Google n-gram corpus we can answer this question for phrases of up to 5 words. The results are discouraging: by our metrics, even 5-word phrases would be highly insecure against offline attacks, with fewer than 30 bits of work compromising over half of users. The returns appear to rapidly diminish as more words are required. This has potentially serious implications for applications like PGP private keys, which are often encrypted using a passphrase. Users are clearly more random in “passphrase English” than in actual English, but unless it’s dramatically more random the underlying natural language simply isn’t random enough. Exploring this gap is an interesting avenue for future collaboration between computer security researchers and linguists. For now we can only be comfortable that randomly-generated passphrases (using tools like Diceware) will resist offline brute force.