Bayesian spam rumination: when word-frequency histograms attack!

Ed Felten has posted an intriguing rumination on the possible failure modes of Bayesian spam filtering, which uses word-frequency statistics to classify email as spam or ham (legitimate mail). As Ed points out, Bayesian filters are in effect trained by the spammers themselves: users mark spam as spam, so a spammer who chooses the vocabulary of his messages carefully can make messages containing certain words or phrases effectively undeliverable on the Internet.
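To make "word-frequency statistics" concrete, here is a minimal sketch of one common formulation, a multinomial naive Bayes classifier with add-one smoothing. It is not Ed's code or any particular product's algorithm; real filters differ in how they tokenize, smooth, and combine per-word evidence. The class name `NaiveBayesFilter` and the sample messages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesFilter:
    """Toy multinomial naive Bayes spam filter with add-one (Laplace) smoothing."""

    def __init__(self):
        # One word-frequency histogram per class, plus per-class message counts.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record a message the user has labelled "spam" or "ham"."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def log_score(self, text, label):
        """log P(label) plus the sum of log P(word | label) over the message's words."""
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_messages = sum(self.message_counts.values())
        score = math.log(self.message_counts[label] / total_messages)
        for word in tokenize(text):
            # Smoothing keeps a word never seen in this class from zeroing the product.
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        return score

    def classify(self, text):
        spam, ham = self.log_score(text, "spam"), self.log_score(text, "ham")
        return "spam" if spam > ham else "ham"


filt = NaiveBayesFilter()
filt.train("cheap pills buy now", "spam")
filt.train("win money now click here", "spam")
filt.train("meeting notes for the project", "ham")
filt.train("lunch tomorrow with the team", "ham")
print(filt.classify("buy cheap pills"))           # spam
print(filt.classify("project meeting tomorrow"))  # ham
```

The key point for what follows: every message a user marks as spam feeds its words into the spam histogram, whoever chose those words.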

Now suppose a big spammer wanted to poison a particular word, so that messages containing that word would be (mis)classified as spam. The spammer could sprinkle the target word throughout the word salad in his outgoing spam messages. When users classified those messages as spam, the targeted word would develop a negative score in the users' Bayesian spam filters. Later, messages with the targeted word would likely be mistaken for spam.

This attack could even be carried out against a particular targeted user. By feeding that user a steady diet of spam (or pseudo-spam) containing the target word, a malicious person could build up a highly negative score for that word in the targeted user's filter.
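The poisoning attack can be simulated with a toy filter. This is a hedged sketch, assuming the same kind of multinomial naive Bayes model as above; the target word "conference", the messages, and the assumption that the user's ordinary mail keeps arriving alongside the poisoned spam are all made up for illustration. Note the sign convention: here a *positive* word score means "spammy", the reverse of the "negative score" wording in Ed's description.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesFilter:
    """Toy multinomial naive Bayes filter with add-one smoothing (illustration only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def word_score(self, word):
        """log P(word|spam) - log P(word|ham); positive means the word looks spammy."""
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))

        def likelihood(label):
            counts = self.word_counts[label]
            return (counts[word] + 1) / (sum(counts.values()) + vocab_size)

        return math.log(likelihood("spam")) - math.log(likelihood("ham"))

    def is_spam(self, text):
        # Log prior odds from message counts, plus each word's evidence.
        prior = math.log(self.message_counts["spam"]) - math.log(self.message_counts["ham"])
        return prior + sum(self.word_score(w) for w in tokenize(text)) > 0


TARGET = "conference"  # hypothetical word the attacker wants to poison
filt = NaiveBayesFilter()
filt.train("conference schedule attached", "ham")
filt.train("lunch on friday", "ham")
filt.train("cheap pills online", "spam")
filt.train("claim your prize", "spam")

legit = "conference schedule"
before = filt.is_spam(legit)
history = [filt.word_score(TARGET)]
for _ in range(20):
    # Attacker's word salad containing the target word; the user marks it as spam.
    filt.train("cheap conference pills", "spam")
    # Meanwhile the user's ordinary mail keeps arriving and is kept as ham.
    filt.train("lunch on friday", "ham")
    history.append(filt.word_score(TARGET))
after = filt.is_spam(legit)

print(before, after)  # False True
```

The target word's score rises with every poisoned message until the same legitimate message is filtered. The ordinary ham in each round matters in this toy model: without it, the spam histogram's growing total would make the message's other words look hammier and partly offset the attack.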
