Data mining the intellectual history of the human race with Google Book Search

Harvard's Jean-Baptiste Michel, Erez Lieberman Aiden and colleagues have been analyzing the huge corpus of literature that Google digitized in its Book Search program, and they're uncovering absolutely fascinating information about our cultural lives, the evolution of language, the secret history of the world, censorship and even public health. It's all written up in a (regwalled) paper in Science, "Quantitative Analysis of Culture Using Millions of Digitized Books":


When the team looked at the frequency of individual years, they found a consistent pattern. In their own words: "'1951' was rarely discussed until the years immediately preceding 1951. Its frequency soared in 1951, remained high for three years, and then underwent a rapid decay, dropping by half over the next fifteen years." But the shape of these graphs is changing. The peak gets higher with every year and we are forgetting our past with greater speed. The half-life of '1880' was 32 years, but that of '1973' was a mere 10 years.
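To make the "half-life" idea concrete, here is a minimal Python sketch under an assumed definition: the number of years from a term's peak frequency until its yearly frequency first falls to half that peak or below. The function name `half_life` and the toy series are hypothetical illustrations, not the paper's data or code; the authors' actual method (for example, any smoothing of the yearly counts) may differ.

```python
from __future__ import annotations


def half_life(freqs: dict[int, float]) -> int | None:
    """Years from the peak until frequency first drops to half the peak or below.

    Assumed definition for illustration only. `freqs` maps a publication year
    to the relative frequency of a term (e.g. the string '1951') in that year.
    Returns None if the series never decays to half its peak.
    """
    peak_year = max(freqs, key=freqs.get)  # year with the highest frequency
    half = freqs[peak_year] / 2
    # Walk forward through the years after the peak, in order.
    for year in sorted(y for y in freqs if y > peak_year):
        if freqs[year] <= half:
            return year - peak_year
    return None


# Hypothetical toy series: flat low mentions before 1973, a peak in 1973,
# then exponential decay that halves every 10 years.
toy = {y: 0.1 for y in range(1950, 1973)}
toy.update({1973 + t: 0.5 ** (t / 10) for t in range(50)})
print(half_life(toy))  # → 10
```

Measuring from the peak rather than from the year itself keeps the number comparable across terms whose rise begins at different times.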

The future, however, is becoming ever more easily ingrained. The team found that new technology permeates through our culture with growing speed. By scanning the corpus for 154 inventions created between 1800 and 1960, from microwave ovens to electroencephalographs, they found that more recent ones took far less time to become widely discussed.

The cultural genome: Google Books reveals traces of fame, censorship and changing languages

(via Beyond the Beyond)