Making the Most of SpamLookup

This tutorial is written by LMT contributor Neil Turner and is cross-posted on Neil's World.

Since upgrading to Movable Type 3.2 I’ve dumped Jay Allen’s MT-Blacklist and instead made SpamLookup handle comment/trackback spam on its own. The plugin is included by default on MT 3.2, and while it can do a good job as it is, you might like to try some tune-ups to make it more effective.

Moderation and Junking

In Movable Type 2.x, comments just had one status - published. Any spam blocking system could only accept or deny comments and trackbacks. In MT 3.0x and 3.1x, comments gained an additional status - ‘moderated’. This was where comments could be held for human approval before being published, and tools like SpamLookup and MT-Blacklist could hold comments here if they thought they might be spam but couldn’t be sure.

With 3.2x, trackbacks can also be moderated, and a new third status has been added for both: junk. Now, rather than deleting spam outright, plugins send it here instead. That way, if you get a false positive (a comment that is flagged as spam but isn’t), you can retrieve it.

The junk status also comes with a rating system (the junk score), and plugins can adjust the rating for an individual comment or trackback. The rating runs from 10 to -10: comments with a negative score are junked, otherwise they are moderated or published. You’ll find that SpamLookup can reduce the rating of comments that it thinks are spam, but it can also add points if, say, the comment has no links or was posted with a URL that you have previously approved.
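To make the rating idea concrete, here is a minimal Python sketch. It is not Movable Type's own code (MT and its plugins are written in Perl), and it assumes a few things the text doesn't spell out: every comment starts at 0, each filter contributes a signed adjustment, the total is clamped to the -10 to 10 range, and a separate "force moderation" flag decides between moderating and publishing anything that isn't junked. The `classify` and `Verdict` names are hypothetical.

```python
from dataclasses import dataclass

# Assumed bounds for the junk score, as described in the text.
MIN_SCORE, MAX_SCORE = -10, 10


@dataclass
class Verdict:
    score: float
    status: str  # "junk", "moderate" or "publish"


def classify(adjustments: list[float], force_moderation: bool = False) -> Verdict:
    """Hypothetical: sum the filters' adjustments, clamp, then pick a status."""
    score = max(MIN_SCORE, min(MAX_SCORE, sum(adjustments)))
    if score < 0:
        return Verdict(score, "junk")        # negative score: junk
    if force_moderation:
        return Verdict(score, "moderate")    # a filter asked for human review
    return Verdict(score, "publish")


# A blacklisted IP (-1) and two junk keywords (-1 each),
# partly offset by link memory (+1):
print(classify([-1, -1, -1, +1]))  # Verdict(score=-2, status='junk')
```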

1. How to find the SpamLookup configuration options

SpamLookup can be configured at two levels - blog and installation. If you just have the one blog, or want any settings to apply across all the blogs on your installation of Movable Type, configure SpamLookup at the installation level, using the Plugins item on the MT main menu or System Overview screen - it should be towards the bottom. If you only want settings to apply to one blog, you can configure SpamLookup using the Plugins tab of the Settings item on the weblog menu.

2. Lookups Settings

When the plugin was first launched as MT-DBSL, lookups were all it did. Now they are just one weapon in its formidable anti-spam arsenal. There are three options here:

IP address lookups
These look up the source IP address of the comment or trackback and compare it with several centralised blacklist servers (you can add extra servers if you wish). If the IP address is found on a blacklist server, you can force moderation of the comment, adjust its junk score (the default action is to subtract 1 from its score) or do nothing. This can be quite effective, but only if you trust the blacklisting systems.
Domain name lookups
This works in the same way, but looks up the domain names of the posted links. This is similar to how MT-Blacklist worked, except the blacklist is hosted elsewhere and not on your MT installation. Again, this is effective but only if you trust the blacklisting systems.
Advanced Trackback Lookups
This option is poorly explained, which is unfortunate as it can be very effective. It compares the IP address of the trackback’s source URL with the IP address the ping was sent from. Normally the blog software sending the ping runs on the same server as the blog itself, so the two should match. A lot of spam is sent from zombie machines rather than from the web site being promoted, so this check catches that sort of spam. Much of the spam I get is caught by this rule, but some spammers have wised up and now send trackback pings from the same IP address as the site they are promoting to get around it. It can also cause false positives: a reader of mine who blogs with Blogger sends his pings through a third-party service that isn’t hosted on his site, and these pings sometimes get junked because of this option.
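The three lookups can be sketched in Python as follows. This is a hypothetical illustration, not SpamLookup's implementation: the blacklist zone names are placeholders rather than real services, and it assumes the usual DNS blacklist convention, where an IPv4 address is queried with its octets reversed and the zone appended (1.2.3.4 becomes 4.3.2.1.zone), and a listed entry resolves while an unlisted one fails to resolve. For domain lookups the sketch simply prepends the link's host name; real domain blacklists may expect only the registered domain.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Placeholder zones (hypothetical, not real blacklist services).
IP_ZONE = "ip-bl.example.org"
DOMAIN_ZONE = "uri-bl.example.org"


def ip_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the zone: 1.2.3.4 -> 4.3.2.1.zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def domain_query_name(url: str, zone: str) -> str:
    """Prepend the link's host name to the zone (simplified)."""
    host = urlparse(url).hostname or ""
    return host + "." + zone


def is_listed(query_name: str) -> bool:
    """A listed entry resolves to an address; an unlisted one fails to resolve."""
    try:
        socket.gethostbyname(query_name)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def ip_adjustment(ip: str) -> float:
    """The default action described above: subtract 1 if the IP is listed."""
    return -1.0 if is_listed(ip_query_name(ip, IP_ZONE)) else 0.0


def trackback_ip_matches(source_url: str, sender_ip: str) -> bool:
    """Advanced trackback check: was the ping sent from the source site's server?"""
    host = urlparse(source_url).hostname
    if not host:
        return False
    try:
        _, _, addresses = socket.gethostbyname_ex(host)
    except OSError:
        return False
    return sender_ip in addresses


print(ip_query_name("192.0.2.10", IP_ZONE))  # 10.2.0.192.ip-bl.example.org
```

A ping relayed through a third-party service, like the Blogger reader's pings described above, makes `trackback_ip_matches` return False even though the ping is legitimate, which is exactly the false positive this option can produce.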
3. Link Settings

This looks at the source URL (trackbacks) or comment author URL (comments), along with any URLs posted in the comment itself.

  • The first option adds to the junk score if a comment has no links, since the general aim of blog spam is to link to a dodgy site to improve its ranking in search engines - a comment with no links is unlikely to be spam.
  • The second option will forcibly moderate any comment or trackback that has more than a certain number of links. Spam comments tend to have lots of links, but some commenters may post legitimate lists of links to other handy resources, so don’t set this too low.
  • The third option is like the second but will subtract from the junk score. Set this higher than the previous option if you have it enabled - mine is at 4.
  • The ‘Link memory’ option adds to the score if you have previously approved a comment containing the same URLs. This means that regular commenters are less likely to fall foul of any other rules. Keep this enabled.
  • The ‘Email memory’ option does the same, except with email addresses. This may be undermined if you approve a lot of comments with false email addresses, though, and it obviously doesn’t work for trackbacks, which don’t carry an email address.
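Here is a hypothetical Python sketch of how the link rules above could combine into one score adjustment plus a moderation flag. The thresholds and point values are assumed example settings (only the junk threshold of 4 mirrors the setting mentioned above), the `LinkRules` and `check_links` names are made up for illustration, and "link memory" is assumed to trigger when any of the comment's URLs was previously approved. The URL regex is deliberately crude.

```python
import re
from dataclasses import dataclass, field

# Crude URL matcher, good enough for a sketch.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


@dataclass
class LinkRules:
    # Hypothetical example settings, not SpamLookup's defaults.
    moderate_over: int = 2         # force moderation above this many links
    junk_over: int = 4             # lower the junk score above this many links
    no_link_bonus: float = 1.0     # "no links" rule
    too_many_penalty: float = -1.0
    memory_bonus: float = 1.0      # link memory / email memory
    approved_urls: set[str] = field(default_factory=set)
    approved_emails: set[str] = field(default_factory=set)  # stored lowercase


def check_links(text: str, author_url: str, email: str,
                rules: LinkRules) -> tuple[float, bool]:
    """Return (junk score adjustment, force_moderation) for one comment."""
    urls = URL_RE.findall(text)
    if author_url:
        urls.append(author_url)  # the author URL counts as a link too

    adjustment = 0.0
    if not urls:
        adjustment += rules.no_link_bonus
    if len(urls) > rules.junk_over:
        adjustment += rules.too_many_penalty
    if any(url in rules.approved_urls for url in urls):
        adjustment += rules.memory_bonus
    if email and email.lower() in rules.approved_emails:
        adjustment += rules.memory_bonus

    return adjustment, len(urls) > rules.moderate_over


rules = LinkRules(approved_urls={"http://example.net/"})
print(check_links("Nice post!", "", "", rules))  # (1.0, False)
```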
4. Keyword Filter Settings

These options act upon keywords in comments, and again replicate some of the functionality of MT-Blacklist. The first box contains words that should force a comment to be moderated, one per line. Put words here that may indicate spam but are also often used legitimately. Mine includes words like ‘video’, ‘sexy’, ‘bankruptcy’ and a variety of swear words.

The second box contains words that count towards junking a comment. By default, 1 is subtracted from the junk score for each of these words found, so a spam comment that mentioned three different drug names on this list would lose 3 points. However, if there are words that you think will never appear in a legitimate comment, you can put a number after the word, and any comment containing it will have that number subtracted from its junk score instead. So entering “viagra 4” would subtract 4 from the junk score of any comment containing the word ‘viagra’.

In both cases, you can also use Perl regular expressions - one I use is “direct(v|tv)” which matches “directv” and “directtv”.

This is an incredibly powerful feature; however, by default the plugin has hardly any keywords in it. A good starting place is, ironically enough, the WordPress Wiki, which has a list that you can paste in. The entries don’t have to be words, either: I get a lot of spam from a site called xxlfind.biz, so I added “xxlfind.biz 4” to my list to block the spam from it.
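The two keyword boxes might work along these lines. This Python sketch assumes each line is a case-insensitive regular expression, optionally followed by a space and a weight (defaulting to 1), and that each matching junk entry is subtracted once; Python's `re` module stands in for Perl regexes, which behave the same for simple patterns like these. The function names are hypothetical, not SpamLookup's.

```python
import re


def _lines(text: str) -> list[str]:
    """Non-blank, stripped lines of a settings box."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def needs_moderation(comment: str, moderate_box: str) -> bool:
    """First box: any matching entry forces moderation."""
    return any(re.search(p, comment, re.IGNORECASE) for p in _lines(moderate_box))


def parse_junk_box(junk_box: str) -> list[tuple[re.Pattern, float]]:
    """Second box: 'viagra 4' -> weight 4; 'direct(v|tv)' -> default weight 1."""
    rules = []
    for line in _lines(junk_box):
        pattern, _, weight = line.rpartition(" ")
        if pattern and re.fullmatch(r"\d+(\.\d+)?", weight):
            rules.append((re.compile(pattern, re.IGNORECASE), float(weight)))
        else:
            rules.append((re.compile(line, re.IGNORECASE), 1.0))
    return rules


def junk_adjustment(comment: str, rules: list[tuple[re.Pattern, float]]) -> float:
    """Subtract each matching entry's weight once from the junk score."""
    return 0.0 - sum(weight for pattern, weight in rules if pattern.search(comment))


junk_rules = parse_junk_box("viagra 4\ndirect(v|tv)\nxxlfind.biz 4")
print(junk_adjustment("Cheap VIAGRA and DirecTV deals", junk_rules))  # -5.0
```

Because entries are treated as regular expressions in this sketch, the dot in `xxlfind.biz` matches any character; writing `xxlfind\.biz 4` would match only the literal domain.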

SpamLookup does lack a couple of handy features, namely blocking of duplicate comments and trackbacks, and better throttling like that in Real Comment Throttle. Still, after doing the above on my site, I get around 98% accuracy, with only a couple of legitimate trackbacks getting marked as junk and no false negatives (spam that gets through all the filters). Hopefully you will too.

Comments (1)

Deborah:

Hi,

I see that SpamLookup blocks IP addresses, based upon entries in blacklists.

Well, I'm wondering, does this list end up with 'good' IP addresses on it? Because, the spammers spoof the IP and aren't actually presenting their real IP. For example, when you look in the MT comments area and see an IP addresses listed there for the spammers comment, that IP address is not the real one. So, good IP addresses can be blocked and keep out good users.

Please explain this...thanks.

By the way, I know my IP is showing up in your system now with the first number of 24...however, this is not accurate, because I disguised it with a proxy. My real one begins with 216. Anyway...
