Uh, um, nationality, and speech recognition

"Signposting," the involuntary uttering of little interstitial syllables such as "um" and "uh" between words, varies widely from country to country and makes life difficult for speech-recognition software.

The French say something that sounds like euh, and Hebrew speakers say ehhh. Serbs and Croats say ovay, and the Turks say mmmmm. The Japanese say eto (eh-to) and ano (ah-no), the Spanish este, and Mandarin speakers neige (NEH-guh) and jiege (JEH-guh). In Dutch and German you can say uh, um, mmm. In Swedish it's eh, ah, aah, m, mm, hmm, ooh, a and oh; in Norwegian, e, eh, m and hm.

These interruptions, it turns out, plague machines more than people — speech-recognition systems in particular — so researchers have increasingly been turning their attention to uh and um (among other so-called disfluencies).

Link

(via Foe Romeo)