SIMSON SAYS: An End to Spam With SpamAssassin

Simson L. Garfinkel

(Mark's note: I've known Simson for a good many years, and have always admired his fine writing. When I was an editor at Wired, it always excited me to get one of his email pitches. He's a very interesting fellow, and the author of several books. He wrote a column for the Boston Globe called "Simson Says" from 1995 to 2000; now he is self-syndicating it. Boing Boing will run his columns for as long as Simson says.)

Earlier this year my email inbox was overflowing with spam --- junk email advertising everything from bolts made in China to pornographic websites. Although it seems hard to believe now, I was actually getting more than 70 pieces of spam every day. There was so much spam, in fact, that I had given up reading messages sent to an email address that I had used since 1995. And because a few business associates didn't know that I had stopped using that old email address, the decision ended up costing me thousands of dollars in missed opportunities.

Spam is not democratic: some people get hardly any, while others get tons. If you post messages to popular mailing lists or put your email address on web pages, you dramatically increase the chances that you'll get a lot of spam. You can also get a lot of spam if you simply have an email address that's predictable, meaning an address that a spammer might reasonably guess. I get a lot of spam because my email address has been widely published on web pages and, even worse, in online directories.

All of that spam is now in my past: today my inbox is virtually spam free. Even better, I've been able to reclaim that old email account. Of course, the spammers haven't stopped sending me their missives. But now that mail is being filtered out by an ingenious piece of software called SpamAssassin.

In the past 45 days, SpamAssassin has removed 3,357 messages from my inbox and put them in a separate box called "Spam," where I'm free to either ignore them or review them at my leisure. This is a service for which I would have happily paid. As it turns out, there's no need: unlike other anti-spam systems out there today, SpamAssassin is free.

The underlying SpamAssassin technology was invented in April 2001 by Justin Mason, an Irish computer programmer living in Australia. Mason created a rule-based system that scores email messages according to a variety of rules. For example, an invalid time zone in the header gives an email message 2 points; a subject that is all capital letters gives the message another 2 points; and a link at the bottom of the message with the word "remove" in it gives the message 4.1 points. Any message with more than 5 points total is considered spam.
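The scoring scheme Mason devised can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SpamAssassin's own code (SpamAssassin itself is written in Perl): the three test functions are simplified, hypothetical stand-ins for the rules described above, the rule names are invented, and only the point values and the 5-point threshold come from the column.

```python
import re
from email import message_from_string
from email.message import Message
from typing import Callable, List, Tuple

# "Any message with more than 5 points total is considered spam."
SPAM_THRESHOLD = 5.0

# Named zones an RFC 822-style Date header may legitimately end with.
NAMED_ZONES = {"UT", "GMT", "EST", "EDT", "CST", "CDT", "MST", "MDT", "PST", "PDT"}


def invalid_time_zone(msg: Message) -> bool:
    """Hypothetical check: Date must end in a named zone or a plausible +HHMM/-HHMM offset."""
    date = re.sub(r"\([^)]*\)\s*$", "", msg.get("Date", "")).strip()  # drop a trailing "(EDT)" comment
    tokens = date.split()
    if not tokens:
        return True
    zone = tokens[-1]
    if zone in NAMED_ZONES:
        return False
    m = re.fullmatch(r"[+-](\d{2})(\d{2})", zone)
    if m is None:
        return True
    return int(m.group(1)) > 14 or int(m.group(2)) > 59


def subject_all_caps(msg: Message) -> bool:
    """Hypothetical check: the subject contains letters and none of them are lowercase."""
    subject = msg.get("Subject", "")
    return any(c.isalpha() for c in subject) and subject == subject.upper()


def remove_link_at_bottom(msg: Message) -> bool:
    """Hypothetical check: a URL containing "remove" in the last three non-blank body lines."""
    body = "" if msg.is_multipart() else str(msg.get_payload())
    tail = [line for line in body.splitlines() if line.strip()][-3:]
    return any(re.search(r"https?://\S*remove", line, re.IGNORECASE) for line in tail)


# (rule name, points, test): the names are made up; the points are those quoted in the column.
RULES: List[Tuple[str, float, Callable[[Message], bool]]] = [
    ("INVALID_DATE_TZ", 2.0, invalid_time_zone),
    ("SUBJ_ALL_CAPS", 2.0, subject_all_caps),
    ("REMOVE_LINK_AT_END", 4.1, remove_link_at_bottom),
]


def score(msg: Message) -> Tuple[float, List[str]]:
    """Add up the points of every rule that fires; return the total and the names of the hits."""
    total, hits = 0.0, []
    for name, points, test in RULES:
        if test(msg):
            total += points
            hits.append(name)
    return total, hits


def is_spam(msg: Message) -> bool:
    return score(msg)[0] > SPAM_THRESHOLD


if __name__ == "__main__":
    raw = (
        "Date: Mon, 6 May 2002 10:00:00 +9900\n"
        "Subject: FREE MONEY NOW\n"
        "\n"
        "Click here for cash.\n"
        "To be removed: http://example.com/remove\n"
    )
    msg = message_from_string(raw)
    print(score(msg), is_spam(msg))
```

No single rule in this sketch is enough to condemn a message on its own; it takes at least two of them firing together to pass the threshold, which is the point of scoring rather than hard-coding verdicts.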

Mason's spam-detection engine was incredibly accurate. Unfortunately, it was also quite slow, sometimes taking more than 10 seconds to process a single message. Fortunately, Mason published his program on the Internet for anyone to use. Six months later, a programmer in California named Craig Hughes came up with a trick for making SpamAssassin run dramatically faster.

Since then, SpamAssassin has steadily grown in popularity. According to Hughes, more than 11,000 copies of the program were downloaded this past April. "People have downloaded it from addresses at IBM, RedHat, TicketMaster, Yahoo, FedEx, Amazon, Salon, Sun, Informix, Ikea, Nortel, Cisco, AIG, Dell, Apple, and Network Solutions, among thousands of others," says Hughes, who is now one of the volunteers coordinating the project.

Today SpamAssassin has more than 300 rules and a dictionary of 10,000 phrases it uses for spam detection. SpamAssassin also hooks into several anti-spam networks, including the Mail Abuse Prevention System, better known as MAPS, and Vipul's Razor.

MAPS is a simple blacklist of companies and Internet Service Providers that have been caught sending spam in the past. The service, which carries a subscription fee, has been the target of criticism and the occasional lawsuit. That's because once an organization has been added to the MAPS blacklist, it suddenly finds that thousands of ISPs will no longer accept its email.
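Blacklists such as MAPS are typically consulted over DNS: to check a sending server at 192.0.2.1, a mail system reverses the address to 1.2.0.192, appends the list's zone name, and looks the result up; any answer means "listed." The sketch below shows that lookup. The zone dnsbl.example.org is a placeholder, not MAPS's actual zone, and the function names are hypothetical.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the list's zone: 192.0.2.1 -> 1.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip: str, zone: str = "dnsbl.example.org") -> bool:
    """True if the zone answers for the reversed address. The default zone is a placeholder."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # socket.gaierror (e.g. no such name) is a subclass of OSError; treat any failure as "not listed".
        return False
```

Treating a failed lookup as "not listed" is a deliberate fail-open choice: if the blacklist's servers are unreachable, mail keeps flowing rather than bouncing.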

Vipul's Razor applies an approach called "collaborative filtering" to the task of fighting spam. Developed by Vipul Ved Prakash, another California-based programmer, Razor relies on a technique for fingerprinting email messages and a network of volunteers around the world who report spam the instant they receive it.

Reporting spam is easier than you might imagine. Many ISPs lose between 10% and 30% of their customers every year. (One of the leading reasons for this churn, apparently, is that the customers are getting too much spam!) After an account has been turned off for six or twelve months, some ISPs turn it back on and point it at the Razor reporting network. These email addresses become, in effect, spam traps: any email message sent to them is automatically fingerprinted and reported as spam.

"Spam is email broadcast, so everyone on the recipient list gets the same spam message," says Prakash. "If the first receiver shares the information identifying the contents of spam with the rest of the intended recipients, they could refuse to accept the message before it hits their mailbox. That's the basic idea behind Vipul's Razor. Given enough identifiers, every spam attack is surmountable."

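The idea Prakash describes, in which the first recipient (or a spam trap) publishes an identifier for a message and later recipients check incoming mail against it, can be sketched with an exact hash. This is not Razor's real signature scheme, which the column doesn't describe: the SHA-1-of-normalized-text fingerprint and the in-memory SharedSpamDatabase class are hypothetical stand-ins for the network.

```python
import hashlib
import re
from typing import Set


def fingerprint(body: str) -> str:
    """Simplified fingerprint: SHA-1 of the body with case and runs of whitespace normalized."""
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class SharedSpamDatabase:
    """Hypothetical stand-in for the reporting network: a set of reported fingerprints."""

    def __init__(self) -> None:
        self._reported: Set[str] = set()

    def report(self, body: str) -> None:
        """Called by a spam trap or by the first person to receive the message."""
        self._reported.add(fingerprint(body))

    def is_known_spam(self, body: str) -> bool:
        """Later recipients ask whether this exact content has already been reported."""
        return fingerprint(body) in self._reported


db = SharedSpamDatabase()
db.report("Buy cheap bolts\nmade in China NOW!")
print(db.is_known_spam("buy   cheap bolts made in china now!"))  # True
```

Because an exact hash changes if a spammer alters a single word, a production system would want signatures that tolerate small variations; the normalization step here only absorbs differences in case and spacing.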
SpamAssassin doesn't use either MAPS or the Razor network as an all-or-nothing test; instead, the scores from these systems are simply added to those of SpamAssassin's other rules. This limits the damage when an entire ISP gets blacklisted by MAPS because of one or two bad customers, or when a message from a popular mailing list is erroneously reported to the Razor network.
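The difference between treating a blacklist or Razor hit as a verdict and treating it as one more score can be shown with a little arithmetic. The weights and rule names below are made up for illustration; they are not SpamAssassin's actual scores.

```python
from typing import Dict, List

SPAM_THRESHOLD = 5.0

# Hypothetical weights for the network tests; not SpamAssassin's real values.
NETWORK_WEIGHTS: Dict[str, float] = {"SENDER_BLACKLISTED": 1.5, "RAZOR_MATCH": 3.0}


def all_or_nothing(network_hits: List[str]) -> bool:
    """Any network hit alone condemns the message."""
    return len(network_hits) > 0


def additive(local_score: float, network_hits: List[str]) -> bool:
    """Network hits only add points; the total must still exceed the threshold."""
    total = local_score + sum(NETWORK_WEIGHTS[hit] for hit in network_hits)
    return total > SPAM_THRESHOLD


# Legitimate mail from an ISP blacklisted because of one bad customer.
print(all_or_nothing(["SENDER_BLACKLISTED"]), additive(0.3, ["SENDER_BLACKLISTED"]))  # True False
# A mailing-list message erroneously reported to Razor.
print(all_or_nothing(["RAZOR_MATCH"]), additive(0.0, ["RAZOR_MATCH"]))  # True False
# Real spam that trips local rules as well as both networks.
print(additive(2.5, ["SENDER_BLACKLISTED", "RAZOR_MATCH"]))  # True
```

Under the additive approach, a network hit only tips the balance when the message's own content already looks suspicious.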

Occasionally SpamAssassin makes mistakes. Last week, for example, I missed some messages from a mailing list that I'm on because SpamAssassin misidentified them as spam and put them into my "Spam" box. Once I realized the problem, all I had to do was add the sender of those messages to my "whitelist." Now, when SpamAssassin sees those messages, it passes them through without delay.
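One way to model a whitelist is as a rule that subtracts a very large number of points whenever the sender matches a listed address or pattern, so that no combination of other rules can push the message over the threshold. The sketch below takes that approach; the -100 value, the glob-style patterns, and the function names are assumptions for illustration.

```python
from email.utils import parseaddr
from fnmatch import fnmatch
from typing import Iterable

SPAM_THRESHOLD = 5.0
WHITELIST_POINTS = -100.0  # assumption: large enough to outweigh any realistic rule total


def sender_whitelisted(from_header: str, patterns: Iterable[str]) -> bool:
    """Compare the bare sender address, case-insensitively, against glob patterns like *@example.org."""
    _, address = parseaddr(from_header)
    address = address.lower()
    return any(fnmatch(address, pattern.lower()) for pattern in patterns)


def final_score(rule_score: float, from_header: str, patterns: Iterable[str]) -> float:
    """Apply the whitelist adjustment on top of the score from the other rules."""
    if sender_whitelisted(from_header, patterns):
        return rule_score + WHITELIST_POINTS
    return rule_score


whitelist = ["*@lists.example.org"]
print(final_score(7.3, "Example List <announce@lists.example.org>", whitelist) > SPAM_THRESHOLD)  # False
print(final_score(7.3, "offers@example.com", whitelist) > SPAM_THRESHOLD)  # True
```

Keeping the whitelist as a score, rather than skipping the checks entirely, means the other rules still run and their hits remain visible for anyone reviewing why a message was flagged.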

Despite the minor mishap, I've become a SpamAssassin evangelist. One recent convert: University of Pennsylvania professor David Farber, who runs an influential mailing list and spent a year as Chief Technologist at the Federal Communications Commission. As you can imagine, Farber gets a ton of spam, or at least he did before he turned on SpamAssassin. Today he hardly gets any. "The spam stuff works like a charm," he told me in an email message.

Unfortunately, there is one catch with SpamAssassin: it only runs on UNIX-based email systems. If you are a typical home computer user who downloads your email from an Internet Service Provider, you can't run SpamAssassin yourself; you need to have your ISP run it for you. Many ISPs have in fact started to do so. If yours has not, drop them a note. Meanwhile, Hughes and a few of his compatriots are working on a commercial version of SpamAssassin that will run on Windows and cost under $30.

"It's only recently that end-users have become concerned with spam levels; system administrators have been concerned for much longer," says Hughes, noting that Hotmail and other ISPs are now receiving between 4 and 20 pieces of spam for every genuine email message.


Simson L. Garfinkel is a journalist, computer columnist, and the author of 11 books. His book Web Security, Privacy and Commerce was published last November by O'Reilly & Associates. Garfinkel is part owner of Vineyard.NET, a small Internet Service Provider that serves the island of Martha's Vineyard.

