
Concerning Spam

Updated February 9, 2006. Originally posted in 2004 and updated several times since.

Spammers have discovered bloggers and sooner or later if you allow comments or trackback pings on your weblog you will get spammed.

Types of Blog Spam

Blog spam appears in many flavors:

  1. Basic comment spam. The spammer leaves a short, innocuous message in a comment on one of your entries. The real payload is the URL placed in the comment's URL field. These URLs link back to every conceivable scam.
  2. Comment spam flooding. The spammer uses an automated computer bot to flood your blog with comment spam messages, up to hundreds in an hour. The spammer doesn't necessarily leave a URL, but can leave garbage messages, almost like a graffiti artist. The comment spam can put a severe load on the server hosting your blog software to the point that it crashes.
  3. TrackBack spam. Spammers have discovered how to take advantage of TrackBack. TrackBack spam is very similar to comment spam. The spammer sends TrackBack pings to your site that direct viewers to a totally unrelated URL.
  4. Referral spam. The spammer links to your site from their site, and then pings your site through their link, thus creating a reference and link to their site in the referral log of your website statistics. When you are reviewing your stats and see a reference to an odd site (e.g., Paris Hilton), clicking on the link takes you to their site. Many people list "referrals" on their site publicly, so by spamming referral logs, the spammer not only gets a link on your referral log (which is picked up by Google) but may even get a link on your main page.

How can you fight spam on your blog?

MT 3.3 offers a built-in spam protection plugin called SpamLookup. In addition to this plugin, there are several other options you can implement to help stem the tide of spam. Note that the spammers are constantly improving their methods to game the system, requiring constant vigilance on the part of the MT community to keep coming up with new ways to block them.

  • SpamLookup. SpamLookup is a Movable Type plugin, developed by Brad Choate, that uses several techniques to identify spam, and then uses user-supplied choices to either moderate or block it. SpamLookup is an integrated part of MT 3.3, so if you have installed the latest version of MT, there is nothing more you need to install. SpamLookup utilizes several blacklist services to check incoming comments and trackbacks against known spammers. It allows you to either "junk" or moderate comments and trackbacks based on different settings for links and keywords. You can even "white list" domains or IP addresses. To adjust the settings on SpamLookup, simply open up your Plugins menu from the System Overview of your Movable Type editing window. Scroll to the bottom and select "Show Settings" from any of the SpamLookup modules. See Neil Turner's suggestions on Making the Most of SpamLookup and David Phillips's SpamLookup's Keyword Filter Explained for more information on how to best use this plugin.
  • Akismet. Akismet is a distributed spam filtering service developed by the WordPress community. According to the Akismet FAQ, the way it works is "When a new comment, trackback, or pingback comes to your blog it is submitted to the Akismet web service which runs hundreds of tests on the comment and returns a thumbs up or thumbs down." MT developer Tim Appnel has created an MT plugin for Akismet (MT-Akismet) which can be downloaded from the Akismet website. Many have found Akismet to be more effective at catching spam than SpamLookup.
  • Comment Challenge. Jay Allen's Comment Challenge plugin requires a commenter to type a keyword into a separate field from the comment field in order for the comment CGI script to run. This plugin effectively halts automatic, computer-generated spam comments.
  • Use a "Captcha". A captcha is a security code that a commenter must enter in order for her comment to load. The benefit is that it screens out automated comment spam bots. The downside is that it keeps visually disabled people from easily contributing a comment. Arvind has released an SCode plugin to work with MT 3.2 - MT-SCode 1.0.
  • Require approval before a comment posts. One way to ensure that your readers never have to see a spam message is to personally approve comments before they are posted. MT3 has comment moderation built in. (See Settings > Feedback > check "Immediate publish comments from No one".)
  • Force "preview" before allowing comment submissions. Forcing site visitors to preview their comments before submitting them will not only give you more error-free comments, but will put yet another hurdle up against automatic comment spam bots. Just remove this line of code:
    <input type="submit" accesskey="s" name="post" id="comment-post" value="Post" />
    from your Individual Entry Archive Template.
  • Close old comments. One way to cut down on blog spam is to reduce the opportunities by closing the ability to comment on blog posts older than X number of days. Mark Carey's BlogJanitor plugin lets you do just that, and all automatically.
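
Two of the ideas in the list above, the challenge keyword and closing old comments, come down to very small checks on the server side. The following is only a rough Python sketch of that logic, not the code used by Comment Challenge or BlogJanitor (both are Movable Type plugins). The field name "challenge", the keyword, and the 30-day cutoff are made-up values for illustration.

```python
from datetime import date, timedelta

# Hypothetical settings for illustration; not taken from any plugin.
CHALLENGE_KEYWORD = "human"
MAX_COMMENT_AGE_DAYS = 30


def comments_open(entry_date: date, today: date,
                  max_age_days: int = MAX_COMMENT_AGE_DAYS) -> bool:
    """Return True while the entry is young enough to accept comments."""
    return today - entry_date <= timedelta(days=max_age_days)


def passes_challenge(form: dict, keyword: str = CHALLENGE_KEYWORD) -> bool:
    """Return True if the separate challenge field holds the expected keyword."""
    answer = form.get("challenge", "")
    return answer.strip().lower() == keyword.lower()


def accept_comment(form: dict, entry_date: date, today: date) -> bool:
    """Accept a comment only if the entry is still open and the challenge passes."""
    return comments_open(entry_date, today) and passes_challenge(form)
```

A bot that fills in only the standard comment fields never supplies the challenge field, so it is rejected before the comment is stored. A bot that targets a two-year-old entry is turned away by the age check.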

Fighting Comment Spam Flood Attacks

One way that spammers can cause trouble is by repeatedly pinging your server, hundreds of times an hour, trying to leave their comment spam. This can cause server CPU overloads and crashes and can even have your web host shut down your account.

Used in conjunction with a spam filter such as SpamLookup or Akismet, the MTAutoBan plugin can help automatically stop spam floods by banning or redirecting comments from IP addresses that have been identified as generating junk comments.
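
The general idea of banning an IP address once it has produced enough junk can be sketched in a few lines of Python. This is not how MTAutoBan is actually implemented. The class name, the threshold of three, and the in-memory storage are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical threshold: ban an IP after this many junk comments.
JUNK_THRESHOLD = 3


class AutoBanner:
    """Track junk comments per IP address and ban repeat offenders."""

    def __init__(self, threshold: int = JUNK_THRESHOLD):
        self.threshold = threshold
        self.junk_counts = defaultdict(int)  # IP address -> junk comments seen
        self.banned = set()                  # IP addresses currently banned

    def record_junk(self, ip: str) -> None:
        """Call this whenever the spam filter marks a comment from `ip` as junk."""
        self.junk_counts[ip] += 1
        if self.junk_counts[ip] >= self.threshold:
            self.banned.add(ip)

    def is_banned(self, ip: str) -> bool:
        """Check this before doing any expensive work on an incoming comment."""
        return ip in self.banned
```

The benefit during a flood is that a banned address can be rejected right away, before the comment script runs its filters, which keeps the load on the server down.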

Fighting TrackBack Spam

The primary measures to fight TrackBack spam are similar to those for comment spam: SpamLookup, Akismet, and/or TrackBack moderation. To moderate TrackBacks, open the settings for your weblog. Select the Feedback settings. Under Trackbacks, select "Hold all Trackbacks for approval before they are published."

Fighting Referral Spam

Fight referral spam by amending your .htaccess file. Referral spam is annoying, but it doesn't affect the public display of your site unless you are publishing your referral log. If it bothers you enough that spam companies are benefiting by creating backlinks to their sites on your referral logs, you can amend your .htaccess file (see What is .htaccess?) with the following lines of code:


# The HTTP header is spelled "Referer" (one "r"), so the rule must use that spelling to match.
SetEnvIfNoCase Referer ".*(casino|gambling|poker|porn|sex|nude|xxx|hilton|pics|video).*" BadReferrer
order deny,allow
deny from env=BadReferrer

See this Killing Referrer Spam article for more info on using the htaccess method to fight referral spam.
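
Before putting a keyword list like this on a live site, it can help to see which referrers it would block. The Python sketch below reuses the same pattern with a case-insensitive match. It only approximates what Apache does, and the example URLs are invented.

```python
import re

# Same keyword list as the .htaccess rule; IGNORECASE mirrors SetEnvIfNoCase.
BAD_REFERRER = re.compile(
    r".*(casino|gambling|poker|porn|sex|nude|xxx|hilton|pics|video).*",
    re.IGNORECASE,
)


def is_bad_referrer(referrer: str) -> bool:
    """Return True if the referrer contains any of the blocked keywords."""
    return BAD_REFERRER.search(referrer) is not None


# Invented sample referrers to try against the pattern.
samples = [
    "http://cheap-POKER.example/",
    "http://weblog.example/archives/",
    "http://news.example/video/interview.html",
    "http://essex-history.example/",
]
for url in samples:
    print(url, "blocked" if is_bad_referrer(url) else "allowed")
```

Note that the last two samples are blocked even though they could be perfectly legitimate sites: short keywords such as "video" or "sex" match anywhere in the URL. Trimming the list to terms you actually see in your logs reduces that risk.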

Links:
Six Apart Guide on Combatting Comment Spam
SpamLookup's Keyword Filter Explained by David Phillips.
Making the Most of SpamLookup
New Comment Spam Technique - Adam Kalsey notes that spammers are creating comment spam with links to legitimate sites that have been spammed to get the page rank up for those links.
ARIN WHOIS Database Search - Look up the ISP of the IP address of the person spamming you and report the spammer behavior.
Bloggers Declare War on Comment Spam, but Can They Win? - article from the USC Annenberg Online Journalism Review.
Mod Rewrite method to divert spam bots to a 403 error
Killing Referrer Spam

Comments (5)

billg:

I'm glad you didn't include the old notion of changing the names of the comment and trackback scripts. That doesn't work. Spammers just scan your code looking for the name that is actually used. In fact, my host seems to discourage renaming, presumably because it weakens some of their own internal spam measures, about which they keep pretty quiet.

Doug:

For discussion purposes, I tend to categorise all of the above, plus other approaches, into 3 groups:
- Prevention. Using CAPTCHA and other methods, one can make it more difficult for the bots to post SPAM. My experience is that virtually all SPAM comments are from bots; obvious human SPAM is relatively rare. However, there are two drawbacks. One is that the bots are getting better at working around these systems. The other is that some of them (e.g. CAPTCHAs, login requirements) have now become so complicated that real visitors can be deterred by them.

- Removal. Manual removal works but the time overhead can become unbearable. I almost closed down my blogs and forums due to this, until I found better methods. Some automated systems (such as Akismet) are currently pretty good at this.

- Incentive. Remove trackbacks, referrals and such. This is a bit extreme, but if there is no incentive to SPAM, there is a lot less of it.

I personally find the best is to combine more than one method. My criteria for success are:
- Minimal effect on valid comments (this is why I dislike sites that require login)
- Minimal effort to keep the system SPAM free (which is why the manual removal method on its own is seldom adequate)

There's also 'form' spam ..some of these bots will just fill in every form field they can find on a website.. even my MT weblog search field was getting spammed.. so Google can take care of that from now on. Trackbacks I had to remove a long time ago. The latest involved a week or so of calls to non-existent scripts, to the tune of a constant three to eight calls per second, 24/7. I DESPISE SPAMMERS! Come the day of the great revolution, they will be the first up against the wall, the bastards.

Thanks for the tips. Currently I have SpamLookup and Akismet for my blog. What a great combo!

  • For my weblog, I close comments and trackbacks for old entries.
  • Renaming comment scripts, and trackback scripts
  • For previewing before submitting (yes, it does the trick)
Skye:

I have found the Comment Challenge plugin so helpful that I have actually started letting comments be published without me approving them first. It's amazing.
