Chaining AMZN's Statistically Improbable Phrases

O'Reilly's Gnat Torkington is noodling around with Amazon's "Statistically Improbable Phrases" (SIPs), a collection of snippets of words that don't usually appear next to one another, which Amazon publishes for many of the books in its catalog. By "chaining" SIPs (that is, joining up books with similar linguistic peccadillos), Gnat's doing some pretty sophisticated subject-clustering:

Discover the hidden transexual tie-in to Da Vinci Code. Take "Corporal Mortification" to Their Kingdom Come: Inside the Secret World of Opus Dei, take "million pesatas" to The Blind Man of Seville, take "sight lesson" to Genderqueer: Voices from Beyond the Sexual Binary.

Learn how Excel 2003 is connected to magical realism. From One Hundred Years of Solitude take "insomnia plague" to Breast Cancer, There and Back: A Woman-to-Woman Guide, take "toxic friends" to 365 Reasons to Stop Dieting, take "tie way" to Excel 2003 Formulas.
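
For the curious, here is a rough sketch of how that kind of chaining could work. It is not Gnat's actual code. It assumes you already have each book's SIPs in hand (the toy table below is typed in from the first chain quoted above, not pulled from Amazon), treats any two books that share a phrase as linked, and uses a breadth-first search to find the shortest hop-by-hop route between two titles. The names SIPS and chain are made up for illustration.

```python
from collections import deque

# Toy data, hand-copied from the first chain quoted above.
# Real SIP lists would have to be gathered from Amazon's catalog pages.
SIPS = {
    "Da Vinci Code": {"corporal mortification"},
    "Their Kingdom Come": {"corporal mortification", "million pesatas"},
    "The Blind Man of Seville": {"million pesatas", "sight lesson"},
    "Genderqueer": {"sight lesson"},
}


def chain(sips, start, goal):
    """Return the shortest list of (phrase, next book) hops, or None."""
    queue = deque([start])
    came_from = {start: None}  # book -> (previous book, shared phrase)

    while queue:
        book = queue.popleft()
        if book == goal:
            break
        for other, phrases in sips.items():
            if other in came_from:
                continue  # already reached by a route at least as short
            shared = sips[book] & phrases
            if shared:
                # Any shared phrase will do; min() keeps the pick repeatable.
                came_from[other] = (book, min(shared))
                queue.append(other)

    if goal not in came_from:
        return None  # no run of shared phrases connects the two books

    # Walk backwards from the goal to rebuild the hops in order.
    hops = []
    node = goal
    while came_from[node] is not None:
        previous, phrase = came_from[node]
        hops.append((phrase, node))
        node = previous
    hops.reverse()
    return hops


if __name__ == "__main__":
    for phrase, book in chain(SIPS, "Da Vinci Code", "Genderqueer"):
        print(f'take "{phrase}" to {book}')
```

Breadth-first search is just one reasonable choice here: it guarantees the fewest hops, which suits the "six degrees" flavor of the game, but any graph walk over shared phrases would turn up chains.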
