Self-dissimilarity in word-frequency identifies hot news

Jon Kleinberg, a computer scientist at Cornell University, has developed a technique for automatically identifying emerging trends online. By tracking how often individual words appear over time and flagging sudden "bursts" in their frequency, he can spot trendy new words as they pop out of the blogosphere.

In a simple historical test of the technique, Kleinberg analysed all the annual State of the Union addresses given by US Presidents since 1790. He found that particular word "bursts" could indeed be linked to important events at the time the speeches were delivered.

In the years immediately following the American Revolution, for example, words such as "militia", "British" and "savages" show sudden bursts in use.

From 1930 to 1937 the word "depression" spikes in use, and from 1949 to 1959 "atomic" is the word with the greatest "burstiness". Later in the 20th century, words such as "Vietnam", "Soviet", "communist" and "Afghanistan" increase sharply in usage.
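
The post does not spell out the algorithm. Kleinberg's published burst-detection work models a word's rate of appearance as coming from an automaton that switches between a baseline state and a higher-rate "burst" state, paying a penalty each time it moves up into a burst. The Python below is a minimal, simplified two-state sketch of that batched idea, assuming per-period counts (for example, one period per address). The function name `burst_states`, the rate multiplier `s=2.0` and the penalty weight `gamma=1.0` are illustrative assumptions, not details taken from the post, and the counts in the usage example are made up rather than drawn from the State of the Union texts.

```python
import math


def burst_states(relevant, totals, s=2.0, gamma=1.0):
    """Simplified two-state burst detector (a sketch, not Kleinberg's exact code).

    relevant[t]: occurrences of the target word in period t
    totals[t]:   total words in period t
    Returns a list of 0/1 labels per period, where 1 marks a burst.
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)  # baseline rate over the whole series
    if p0 == 0:
        return [0] * n  # the word never appears, so there is nothing to detect
    p1 = min(s * p0, 0.9999)  # elevated rate in the burst state, kept below 1
    rates = (p0, p1)
    up_cost = gamma * math.log(n)  # penalty for moving from baseline into a burst

    def cost(state, r, d):
        # Negative log-likelihood of r hits in d words under a binomial model.
        # The binomial coefficient is identical in both states, so it is dropped.
        p = rates[state]
        return -(r * math.log(p) + (d - r) * math.log(1 - p))

    # Viterbi over the two states; the sequence starts in the baseline state.
    best = [0.0, math.inf]
    back = []
    for r, d in zip(relevant, totals):
        new_best, ptr = [0.0, 0.0], [0, 0]
        for j in (0, 1):
            # Moving up costs up_cost; staying put or dropping back down is free.
            options = [best[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            i_best = 0 if options[0] <= options[1] else 1
            new_best[j] = options[i_best] + cost(j, r, d)
            ptr[j] = i_best
        best = new_best
        back.append(ptr)

    # Trace the cheapest path backwards to recover the state in each period.
    state = 0 if best[0] <= best[1] else 1
    path = []
    for ptr in reversed(back):
        path.append(state)
        state = ptr[state]
    return path[::-1]


if __name__ == "__main__":
    counts = [5, 5, 5, 5, 40, 45, 38, 5, 5, 5]  # made-up counts for one word
    sizes = [1000] * 10                         # made-up period lengths
    print(burst_states(counts, sizes))
```

With these invented counts, the three high-frequency periods in the middle come out labelled as a burst and the rest as baseline. Raising `gamma` makes the detector more reluctant to declare short bursts, while raising `s` requires a larger jump in frequency before a burst is worth its penalty.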
