Techniques for fighting submission-form spam

Ned Batchelder sums up a series of techniques to keep spammers from attacking submission forms with automated bots (it won't work against humans, but even cheap humans are more expensive than bots). Some of these techniques look like they'll continue to work even if they're widely known, while others depend merely on exploiting vulnerabilities in spammer techniques that will be refined as soon as the exploits are widespread.

We get titanic amounts of spam to the anonymous Boing Boing submission form, and most of it gets stopped using variations on these techniques. One interesting thing about our submission spam is how indiscriminate it is: various scumbags have gone to some lengths to figure out how to send spam to a form whose output is emailed to four people who will never, ever accidentally post their submission to this blog. Indeed, I just bulk-delete the stuff that makes it through the filter without even opening it. Our spammers are indiscriminate enough to use spammy subject lines, which means, I suppose, that they think they're going to end up somewhere a human being won't see them but a search engine might.

The comment form has four key components: timestamp, spinner, field names, and honeypots.

The timestamp is simply the number of seconds since some fixed point in time. For example, the PHP function time() follows the Unix convention of returning seconds since 1/1/1970.

The spinner is a hidden field used for a few things: it hashes together a number of values that prevent tampering and replays, and is used to obscure field names. The spinner is an MD5 hash of:

The timestamp,
The client's IP address,
The entry id of the blog entry being commented on, and
A secret.
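
A minimal sketch of how such a spinner might be computed and checked, written in Python rather than the PHP mentioned above. The separator character, the SECRET value, the MAX_AGE_SECONDS limit, and the function names are all assumptions made for illustration; the description above only says which four values go into the MD5 hash.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = "replace-with-a-private-value"  # assumption: any server-side secret
MAX_AGE_SECONDS = 3600                   # assumption: how long a form stays valid


def make_spinner(timestamp: int, client_ip: str, entry_id: str) -> str:
    """MD5 over the timestamp, client IP, entry id, and secret."""
    material = f"{timestamp}|{client_ip}|{entry_id}|{SECRET}"
    return hashlib.md5(material.encode("utf-8")).hexdigest()


def spinner_is_valid(spinner: str, timestamp: int, client_ip: str,
                     entry_id: str, now: Optional[int] = None) -> bool:
    """Recompute the spinner from the submitted values and compare."""
    now = int(time.time()) if now is None else now
    if not 0 <= now - timestamp <= MAX_AGE_SECONDS:
        return False  # stale or future-dated form, possibly a replay
    expected = make_spinner(timestamp, client_ip, entry_id)
    # Constant-time comparison of the submitted and expected digests.
    return hmac.compare_digest(spinner.encode("utf-8"), expected.encode("utf-8"))
```

Because the client's IP address and the entry id are part of the hash, a spinner harvested from one page or one machine should fail the check when replayed from another, which is roughly what "prevent tampering and replays" suggests.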

The field names on the form are all randomized. They are hashes of the real field name, the spinner, and a secret. The spinner gets a fixed field name, but all other fields on the form, including the submission buttons, use hashed field names.
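
The field-name scheme could look something like the following Python sketch. The list of real field names, the "spinner" field name, the separator, and the helper names are hypothetical; only the idea of hashing the real name together with the spinner and a secret comes from the description above.

```python
import hashlib

SECRET = "replace-with-a-private-value"  # assumption: any server-side secret
REAL_FIELDS = ["name", "email", "comment", "submit"]  # hypothetical field names


def hashed_name(real_name: str, spinner: str) -> str:
    """Hash of the real field name, the spinner, and the secret."""
    material = f"{real_name}|{spinner}|{SECRET}"
    return hashlib.md5(material.encode("utf-8")).hexdigest()


def decode_form(posted: dict, spinner_field: str = "spinner") -> dict:
    """Map the hashed names in a submitted form back to the real ones."""
    # The spinner is the one field with a fixed, known name.
    spinner = posted.get(spinner_field, "")
    return {real: posted.get(hashed_name(real, spinner), "")
            for real in REAL_FIELDS}
```

Since the spinner changes with the timestamp, IP address, and entry, the hashed names change with it, so a bot cannot simply hard-code the name of the comment field.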

Honeypot fields are invisible fields on the form. Invisible is different than hidden. Hidden is a type of field that is not displayed for editing. Bots understand hidden fields, because hidden fields often carry identifying information that has to be returned intact. Invisible fields are ordinary editable fields that have been made invisible in the browser.
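
To make the hidden/invisible distinction concrete, here is a rough Python sketch of a honeypot: an ordinary text input hidden with CSS, plus a server-side check that it came back empty. The decoy names, the inline display:none style, and the function names are assumptions for illustration, not the markup any particular site uses.

```python
import html

HONEYPOT_FIELDS = ["website", "phone"]  # hypothetical decoy field names


def honeypot_html(name: str) -> str:
    """An ordinary editable text input, made invisible with CSS."""
    safe_name = html.escape(name, quote=True)
    return ('<div style="display:none">'
            f'<input type="text" name="{safe_name}" value=""></div>')


def honeypot_tripped(posted: dict) -> bool:
    """True if any decoy field came back with text in it."""
    return any(posted.get(name, "").strip() for name in HONEYPOT_FIELDS)
```

A human using a normal browser never sees these inputs and leaves them blank, while a bot that fills in every editable field gives itself away. Form-filling extensions and screen readers can also end up putting text in a decoy, so holding a tripped submission for review may be a gentler response than blocking the sender outright.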

(via O'Reilly Radar)


  1. I recently got introduced to comment honeypots in a not-so-wonderful way – I discovered that my password-storing browser extension was filling data into the honeypot field on a popular site’s forum page. Worse, not only did the site reject the post, but it also blacklisted every visitor coming from my office’s IP address (a rather large office sitting behind a single NAT). This generated some, um, interesting calls to IT tech support.

  2. The “spinner” and the randomized field names sound like valid Cross Site Request Forgery countermeasures, but won’t necessarily stop spammers. All that’s needed is to fetch the HTML page shortly before submitting and then feed back all non-empty hidden fields along with the regular ones. You could also go ahead and actually parse the page to find the correct fields, as regular form fields usually have some markup around them, are styled and are not hidden/invisible to the user. I suppose that this would also work for circumventing honeypot fields.

    In my experience, the easiest and usually most effective way to stop this sort of spam is to implement captchas.

    1. This doesn’t work against the major comment spambots operating today – they already parse the page and handle hidden and randomized fields.

      Submitterator had recaptcha and live moderators with battleaxes, and it still averaged 70% spam.

      1. I don’t have a captcha and I’m averaging 90% spam. Motherfucking spammers.

        For WordPress I find a plugin called “Safe Signup Form” helps easily ID anything that slips through other checks, yet is spam. It appends a line to the bottom of unmoderated comments (which are not auto-published on my site) like this one:

        [WORDPRESS HASHCASH] The comment’s server IP (___________) doesn’t match the comment’s URL host IP (___________) and so is spam.
        [WORDPRESS HASHCASH] The poster sent us ’0 which is not a hashcash value.

        In any case where the above line has been added to unmoderated comments I can almost guarantee they will be spam. I can’t remember it once spitting up a false positive, but thankfully I don’t face BB levels of spam!

  3. “In my experience, the easiest and usually most effective way to stop this sort of spam is to implement captchas.”

    In my experience from the admin side, I agree.

    In my experience from the user side, I do not agree.

    This is always going to be a tough trade-off, but in the case of BoingBoing, where the results are not posted automatically, sticking with hidden countermeasures provides a straightforward user experience that is highly browser compatible.

  4. On my blog, I’m using a “double whammy” consisting of WordPress’ Akismet filter and a very simple Turing test – a checkbox with the legend: “Please check this box if you’re human”.

    To do this, I had to modify the Perl code for the Blosxom writeback plugin:

    Without any spam protection, there would be thousands of spam comments every day. Akismet takes care of most of them, but sometimes it would let through a “wave”, and that could mean hundreds in a day. Curiously, no spam comment has yet come through with the checkbox: Those that make it through Akismet never check the box.

  5. We use Mollom on the sites that I build and maintain. Mollom does a fantastic job of discerning spam from ham. One site that I work on used to have a person go through their comments daily. He ended up deleting about 95% of the comments, anywhere from 50 to 100 a day. Since we installed Mollom, it is rare to find any spam comments in the queue, and if we do, we report them and we never get repeat offenders.

  6. We have a form at where fans can write to us if they have old Firesign artifacts they’d like to share. In order to send us their comments, readers must correctly enter the last word of a quote from one of Firesign’s classic albums. In the five years the form has been live, I believe only one spam has gotten through.

  7. Hand made replica watches! Two for $9.99.

    (LOL – please don’t nuke me. In this context, a joke of this nature is akin to an Onion bomb in Congress.)

  8. I don’t know if anyone has tried this yet…
    maybe human spammers (and trolls) could be hindered by not cancelling their accounts but instead hiding their comments from anyone else but themselves? In essence creating a private comment-space for them to rant by themselves without them realizing that no-one’s listening. Maybe have some chat-bot counter-trolling them every now and then.
    (Of course they could use 2 accounts to combat this but any extra effort might be enough to deter some of them)

  9. I’ve seen a new kind of attack on Disqus comments. The first comment is something innocuous (“Great post!”). This gets through filters and moderation. Then the spammer goes back and edits the comment so it contains the spam message, which triggers no filtering or admin alerts.

    1. We had that happening pre-Disqus, as well. Disqus will commonly let one or two spam comments through and then catch the next 25.

  10. One problem with this approach is that anyone who’s visually impaired and using the site with a screen reader isn’t going to be able to tell the difference between the “honeypot” fields and the regular ones.
