Source code for MySpace pedophile-hunter bot

Wired News has released the source code for a program written by its editor Kevin Poulsen to catch pedophiles on MySpace by comparing MySpace profiles against sex-offender registries. Poulsen, a notorious reformed hacker, wrote the code to produce empirical data on how sexual predators use MySpace. He acknowledges that it only catches predators who register under their real names, and that some sex offenders use the site for innocent purposes, such as staying in touch with friends and family. The code is released under a BSD free software license:

Finding sex offenders on MySpace is a three-step process. First, you need the list of offenders. I put together the first script, scraperps.pl, in late April. From a list of ZIP codes, the program simply fills out the query form on the DOJ's registry, maxing out the query by running five ZIPs at a time. Then it stores the results — name, ZIP, city, county, state — in a database, within a table called `perps`.

My first run quickly got me temporarily blocked from the site. It turns out the DOJ server doesn't like you running a lot of queries back-to-back. When the ban was lifted (never let it be said that the Justice Department is unforgiving), I incorporated a 30-second pause between queries, which seemed to satisfy the server. That raised the run time to over 71 hours.

While that was under way, I went to work on screen-scraping MySpace. When you register for MySpace, you're prompted to provide your full name and your ZIP code. That information doesn't appear in your MySpace profile, which may help explain why so many offenders felt comfortable providing it. But MySpace's search engine lets you search by name, and restrict the results to within five miles of a particular ZIP code. That made it a natural match for the sex offender registry.

The MySpace scraper, myspacebot.pl, performs this search for every entry in `perps`, and loads the result into a table called `myspace`.
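Poulsen's actual scripts are Perl (scraperps.pl and myspacebot.pl), and the following is not his code. It is a minimal Python sketch of the two-stage pipeline he describes, assuming a local SQLite database. `query_registry` and `search_profiles` are hypothetical placeholders for the form-filling and search steps; no real registry or MySpace endpoint is assumed. The `perps` fields follow the post, while the `myspace` column layout is an assumption.

```python
import sqlite3
import time
from typing import Callable, Iterable, Iterator

# (name, zip, city, county, state): the fields the post says went into `perps`.
Record = tuple[str, str, str, str, str]

BATCH_SIZE = 5      # the post says each registry query covered five ZIP codes
PAUSE_SECONDS = 30  # the post's pause between back-to-back registry queries


def batches(zips: list[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield ZIP codes in groups of `size`; the last group may be shorter."""
    for i in range(0, len(zips), size):
        yield zips[i:i + size]


def scrape_registry(
    conn: sqlite3.Connection,
    zips: list[str],
    query_registry: Callable[[list[str]], Iterable[Record]],  # hypothetical
    sleep: Callable[[float], None] = time.sleep,
    pause: float = PAUSE_SECONDS,
) -> int:
    """Stage 1: query the registry one batch at a time and fill `perps`.

    Returns the number of queries issued.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS perps "
        "(name TEXT, zip TEXT, city TEXT, county TEXT, state TEXT)"
    )
    queries = 0
    for i, group in enumerate(batches(zips)):
        if i > 0:
            sleep(pause)  # rate limit: wait before every query after the first
        rows = list(query_registry(group))
        conn.executemany("INSERT INTO perps VALUES (?, ?, ?, ?, ?)", rows)
        queries += 1
    conn.commit()
    return queries


def match_profiles(
    conn: sqlite3.Connection,
    search_profiles: Callable[[str, str], Iterable[str]],  # hypothetical
) -> int:
    """Stage 2: search by name near each offender's ZIP and fill `myspace`.

    `search_profiles(name, zip)` stands in for a name search restricted to a
    radius around the ZIP and returns profile identifiers. Every row stored is
    only a candidate match; the post's update notes that sorting real matches
    from false ones took human work.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS myspace (name TEXT, zip TEXT, profile TEXT)"
    )
    # Fetch all offenders first so inserts don't disturb an open cursor.
    perps = conn.execute("SELECT name, zip FROM perps").fetchall()
    hits = 0
    for name, zip_code in perps:
        for profile in search_profiles(name, zip_code):
            conn.execute(
                "INSERT INTO myspace VALUES (?, ?, ?)", (name, zip_code, profile)
            )
            hits += 1
    conn.commit()
    return hits
```

Passing `sleep` in as a parameter lets a test substitute a no-op instead of actually waiting 30 seconds per query, while the default still enforces the pause that kept the DOJ server from blocking the scraper.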

Link

See also: Wired News editor catches MySpace pedophile

Update: EPIC's Guilherme Roschke sez, "The code 'caught' lots of people, and it took human work to sort out who was a predator and who was not."