Data Mining 101: Finding Subversives with Amazon Wishlists

Frequent Make contributor Tom Owad just published a mind-blowing how-to on his website explaining how to mine Amazon's wish list database to uncover "subversives."

In Owad's words:

Using a pair of 5-year-old computers, two home DSL connections, 42 hours of computer time, and 5 man-hours, I now had documents describing the reading preferences of 260,000 U.S. citizens.

I downloaded all the files to an external 120 GB Firewire drive in UFS format. The raw data occupied little more than 5 GB. I initially wanted to move all the files into a single directory to facilitate searching, but as the directory contents exceeded 100,000 items, the speed became glacially slow, so I kept the data divided into chunks of 25,000 wishlists.
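For anyone curious what that chunking step might look like, here is a minimal Python sketch. It is not Owad's actual script: the function name, the `chunk_NNNN` directory naming, and the assumption that each wishlist is a single file sitting in one source directory are all made up for illustration.

```python
import shutil
from pathlib import Path


def split_into_chunks(src_dir, dest_dir, chunk_size=25_000):
    """Move files from src_dir into numbered subdirectories of dest_dir.

    Each subdirectory receives at most chunk_size files, which keeps any
    single directory small enough to list and search quickly.
    The chunk_NNNN naming is a hypothetical convention, not from the article.
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    # Sort for a deterministic assignment of files to chunks.
    files = sorted(p for p in src.iterdir() if p.is_file())
    chunk_dirs = []
    for index, path in enumerate(files):
        chunk_dir = dest / f"chunk_{index // chunk_size:04d}"
        if index % chunk_size == 0:
            # First file of a new chunk: create its directory.
            chunk_dir.mkdir(parents=True, exist_ok=True)
            chunk_dirs.append(chunk_dir)
        shutil.move(str(path), str(chunk_dir / path.name))
    return chunk_dirs
```

Calling `split_into_chunks("wishlists", "wishlists_chunked")` would leave the default 25,000 files per directory, matching the chunk size quoted above.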

Next comes the fun part – what books are most dangerous? So many to choose from. Here's a sample of the list I made. Feel free to make up your own list if you decide to try some data mining. Send it to the FBI. I'm sure they'll appreciate your help in fighting terrorism.
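The matching step itself can be very simple, which is rather the point. A rough Python sketch, assuming the wishlists are saved as text or HTML files under the chunk directories and that a plain case-insensitive substring match on the title is good enough; the function name and the placeholder titles in the usage note are hypothetical, and none of this is taken from Owad's article.

```python
from pathlib import Path


def find_matching_wishlists(root_dir, titles):
    """Return {file path: [matched titles]} for files mentioning any title.

    Walks root_dir recursively, so it works across chunk subdirectories.
    Matching is a naive case-insensitive substring test (an assumption;
    a real run would likely need to parse the saved pages properly).
    """
    lowered = [(title, title.lower()) for title in titles]
    matches = {}
    for path in sorted(Path(root_dir).rglob("*")):
        if not path.is_file():
            continue
        # Ignore undecodable bytes rather than failing on odd encodings.
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        hits = [title for title, needle in lowered if needle in text]
        if hits:
            matches[path] = hits
    return matches
```

Usage would look like `find_matching_wishlists("wishlists_chunked", ["Example Title A", "Example Title B"])`, with your own list substituted for the placeholders.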

Reader comment: Anonymous says: "[This] method for grabbing the wishlists is overly complicated. Amazon's web services API allows programmatic access to wishlist information, making it even easier for a savvy programmer to quickly compile a list of customers interested in certain books."