Techniques for fighting submission-form spam

Ned Batchelder sums up a series of techniques for keeping spammers' automated bots from attacking submission forms (they won't work against humans, but even cheap humans are more expensive than bots). Some of these techniques look like they'll keep working even if they're widely known, while others merely exploit weaknesses in spammers' techniques, and those weaknesses will be fixed as soon as the exploits become widespread.

We get titanic amounts of spam through the anonymous Boing Boing submission form, and most of it is stopped by variations on these techniques. One interesting thing about our submission spam is how indiscriminate it is. Various scumbags have gone to some lengths to figure out how to send spam to a form whose output is emailed to four people, none of whom will ever, ever accidentally post it to this blog. Indeed, I bulk-delete whatever makes it through the filter without even opening it, and I can do that because our spammers are indiscriminate enough to use spammy subject lines. I suppose that means they think their messages will end up somewhere a human being won't see them but a search engine might.

The comment form has four key components: timestamp, spinner, field names, and honeypots.

The timestamp is simply the number of seconds since some fixed point in time. For example, the PHP function time() follows the Unix convention of returning seconds since 1/1/1970.
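For readers not working in PHP, here is a minimal Python sketch of the same value. The helper name `make_timestamp` is hypothetical. Python's `time.time()` also counts from the Unix epoch but returns a float, so the sketch truncates it to whole seconds to match what PHP's `time()` returns.

```python
import time
from datetime import datetime, timezone


def make_timestamp() -> int:
    # Whole seconds since the Unix epoch (1970-01-01 00:00:00 UTC),
    # the same convention as PHP's time()
    return int(time.time())


ts = make_timestamp()
# Print the raw value and the UTC date/time it corresponds to
print(ts, datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
```

Because the value is just an integer, it can be written into the form as-is and hashed into the spinner alongside the other inputs.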

The spinner is a hidden field used for a few things: it hashes together a number of values to prevent tampering and replays, and it is used to obscure field names. The spinner is an MD5 hash of:

The timestamp,
The client's IP address,
The entry id of the blog entry being commented on, and
A secret.
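The spinner recipe above can be sketched in Python as follows. This is an illustration under stated assumptions, not Ned's code: the `|` separator between inputs, the helper names `make_spinner` and `spinner_is_valid`, and the `SECRET` value are all hypothetical. On submission, the server recomputes the hash from the posted timestamp, the connecting IP, and the entry id, so a form copied to another IP or pointed at another entry no longer matches.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real deployment would load a long
# random value from configuration instead of hard-coding it.
SECRET = "replace-with-a-long-random-secret"


def make_spinner(timestamp: int, client_ip: str, entry_id: str,
                 secret: str = SECRET) -> str:
    # MD5 of the timestamp, client IP, entry id and secret. The "|" separator
    # is an assumption, added so that inputs like "12"+"3" and "1"+"23"
    # do not produce the same string before hashing.
    data = f"{timestamp}|{client_ip}|{entry_id}|{secret}"
    return hashlib.md5(data.encode("utf-8")).hexdigest()


def spinner_is_valid(submitted: str, timestamp: int, client_ip: str,
                     entry_id: str) -> bool:
    # Recompute from what the server sees at submission time; a changed
    # timestamp, a different IP, or a different entry yields a mismatch.
    expected = make_spinner(timestamp, client_ip, entry_id)
    # Constant-time comparison of the two hex strings
    return hmac.compare_digest(expected, submitted)


s = make_spinner(1700000000, "203.0.113.7", "entry-42")
print(spinner_is_valid(s, 1700000000, "203.0.113.7", "entry-42"))   # True
print(spinner_is_valid(s, 1700000000, "198.51.100.9", "entry-42"))  # False
```

MD5 is used here only because the source names it; a keyed construction such as HMAC-SHA256 (Python's `hmac.new(key, msg, hashlib.sha256)`) is the more conventional choice for this kind of tamper check today.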

The field names on the form are all randomized. They are hashes of the real field name, the spinner, and a secret. The spinner gets a fixed field name, but all other fields on the form, including the submission buttons, use hashed field names.
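A minimal sketch of the field-name hashing, assuming MD5 (the source only says "hashes") and a `|` separator; `hashed_name`, `decode_fields`, and `SECRET` are hypothetical names. Because the spinner changes with every rendering of the form, the same real field gets a different name each time, and the server maps posted values back by recomputing the hash for each field it expects.

```python
import hashlib

# Hypothetical server-side secret; load a long random value in practice.
SECRET = "replace-with-a-long-random-secret"


def hashed_name(real_name: str, spinner: str, secret: str = SECRET) -> str:
    # Hash of the real field name, the spinner and a secret.
    # MD5 and the "|" separator are assumptions for illustration.
    data = f"{real_name}|{spinner}|{secret}"
    return hashlib.md5(data.encode("utf-8")).hexdigest()


def decode_fields(posted: dict[str, str], spinner: str,
                  real_names: list[str]) -> dict[str, str]:
    # Recover submitted values under their real names by recomputing
    # each expected hashed name; unknown keys in `posted` are ignored.
    decoded = {}
    for name in real_names:
        key = hashed_name(name, spinner)
        if key in posted:
            decoded[name] = posted[key]
    return decoded


# Example spinner value; in a real form it arrives in the fixed-name field.
spinner = "9b74c9897bac770ffc029102a200c5de"
form = {"spinner": spinner, hashed_name("email", spinner): "a@example.com"}
print(decode_fields(form, spinner, ["name", "email", "comment"]))
# {'email': 'a@example.com'}
```

A bot that learned the field names from one copy of the form cannot reuse them, since a fresh spinner produces fresh names.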

Honeypot fields are invisible fields on the form. Invisible is different from hidden. Hidden is a type of field that is not displayed for editing. Bots understand hidden fields, because hidden fields often carry identifying information that has to be returned intact. Invisible fields are ordinary editable fields that have been made invisible in the browser.
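The distinction can be sketched as follows, assuming the honeypot is an ordinary `type="text"` input hidden with an inline `display:none` style (one of several ways to make a field invisible) and assuming the server treats any non-empty honeypot value as a sign of a bot. The helper names `honeypot_html` and `looks_like_bot` are hypothetical.

```python
import html


def honeypot_html(field_name: str) -> str:
    # An ordinary editable text input (not type="hidden"), made invisible
    # with CSS so a human never sees it but a form-filling bot does.
    return ('<div style="display:none">'
            f'<input type="text" name="{html.escape(field_name)}" value="">'
            '</div>')


def looks_like_bot(posted: dict[str, str], honeypot_names: list[str]) -> bool:
    # Assumption: humans leave invisible fields empty, so any non-blank
    # value in a honeypot suggests an automated submission.
    return any(posted.get(name, "").strip() for name in honeypot_names)


print(honeypot_html("website"))
print(looks_like_bot({"website": "http://spam.example"}, ["website"]))  # True
print(looks_like_bot({"website": ""}, ["website"]))                     # False
```

In practice the honeypot's name would itself be hashed like the other fields, so bots cannot learn to skip it by name.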


(via O'Reilly Radar)