<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
   <title>Posts by Neil Turner on Learning Movable Type</title>
   <link rel="alternate" type="text/html" href="http://www.learningmovabletype.com/" />
   <link rel="self" type="application/atom+xml" href="http://www.learningmovabletype.com/contributors/nrturner/" />
   <id>tag:,2008-02-25:/5</id>
   <updated>2008-01-27T19:12:44Z</updated>
   <subtitle>Tutorials and helpful tips for the Movable Type web publishing system</subtitle>
   <generator uri="http://www.movabletype.org/">Movable Type Publishing Platform 4.01</generator>

<entry>
   <title>Making the Most of SpamLookup</title>
   <link rel="alternate" type="text/html" href="http://www.learningmovabletype.com/a/001421spamlookup_tips/" />
   <id>tag:www.elise.com,2005:/mt//10.1421</id>
   
   <published>2005-09-13T18:34:12Z</published>
   <updated>2008-01-27T19:12:44Z</updated>
   
   <summary>This tutorial is written by LMT contributor Neil Turner and is cross-posted on Neil&apos;s World. Since upgrading to Movable Type 3.2 I&#8217;ve dumped Jay Allen&#8217;s MT-Blacklist and instead made SpamLookup handle comment/trackback spam on its own. The plugin is included by default on MT 3.2, and while it can do...</summary>
   <author>
      <name>Neil Turner</name>
      <uri>http://www.neilturner.me.uk</uri>
   </author>
   
      <category term="Plugins" scheme="http://www.sixapart.com/ns/types#category" />
   
      <category term="Security" scheme="http://www.sixapart.com/ns/types#category" />
   
      <category term="Spam" scheme="http://www.sixapart.com/ns/types#category" />
   
   <category term="commentspam" label="Comment Spam" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="spam" label="Spam" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="spamlookup" label="SpamLookup" scheme="http://www.sixapart.com/ns/types#tag" />
   <category term="trackbackspam" label="Trackback Spam" scheme="http://www.sixapart.com/ns/types#tag" />
   
   <content type="html" xml:lang="en" xml:base="http://www.learningmovabletype.com/">
      <![CDATA[<em>This tutorial is written by LMT contributor Neil Turner and is cross-posted on <a href="http://www.neilturner.me.uk/2005/Sep/10/making_the_most_of_spamlo.html">Neil's World</a>.</em>

<img alt="mtbadge-small.gif" src="http://www.learningmovabletype.com/images/mtbadge-small.gif" width="50" height="50" class="floatimgleft" /><p>Since upgrading to Movable Type 3.2 I&#8217;ve dumped Jay Allen&#8217;s <a href="http://www.jayallen.org/projects/mt-blacklist/">MT-Blacklist</a> and instead made <a href="http://www.spamlookup.com/">SpamLookup</a> handle comment/trackback spam on its own. The plugin is included by default on <acronym title="Movable Type">MT</acronym> 3.2, and while it can do a good job as it is, you might like to try some tune-ups to make it more effective.</p>

<h3>Moderation and Junking</h3>

<p>In Movable Type 2.x, comments had just one status - published - so any spam blocking system could only accept or reject comments and trackbacks. <acronym title="Movable Type">MT</acronym> 3.0x and 3.1x gave comments an additional status, &#8216;moderated&#8217;, which holds them for human approval before they are published. Tools like SpamLookup and <acronym title="Movable Type">MT</acronym>-Blacklist could send comments here if they suspected spam but couldn&#8217;t be sure.</p>

<p>With 3.2x, trackbacks can also be moderated, and a new third status has been added for both: junk. Rather than deleting spam outright, plugins now send it here. That way, if you have a false positive - a comment that is flagged as spam but isn&#8217;t - you can retrieve it.</p>

<p>The junk status also has a rating system, and plugins can adjust the rating for an individual comment or trackback. The rating ranges from -10 to 10: comments with a negative score are junked, and the rest are moderated or published. SpamLookup can lower the rating of comments that it thinks are spam, but it can also add points if, say, the comment has no links or was posted with a <acronym title="Uniform Resource Locator">URL</acronym> that has been accepted before.</p>]]>
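      <![CDATA[<p>The junk rating can be pictured as a running total that each rule nudges up or down. The sketch below is a simplified model for illustration, not SpamLookup&#8217;s actual Perl code: it assumes each rule returns a signed adjustment, that the total is clamped to the -10 to 10 range, that a negative total means junk, and that a hypothetical <code>force_moderation</code> flag (set by any rule) decides between moderated and published.</p>

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PUBLISHED = "published"
    MODERATED = "moderated"
    JUNK = "junk"


# Range of the junk rating described above.
MIN_SCORE, MAX_SCORE = -10, 10


@dataclass
class Verdict:
    score: float
    status: Status


def classify(adjustments: list[float], force_moderation: bool = False) -> Verdict:
    """Sum per-rule adjustments (an assumed model), clamp, and pick a status."""
    score = max(MIN_SCORE, min(MAX_SCORE, sum(adjustments)))
    if score < 0:
        status = Status.JUNK          # negative rating: junked
    elif force_moderation:
        status = Status.MODERATED     # some rule asked for human review
    else:
        status = Status.PUBLISHED
    return Verdict(score, status)
```

<p>For example, a comment hit by a -1 blacklist match and a -4 keyword penalty ends at -5 and is junked, while a comment that earned +1 for containing no links is published unless some rule forced moderation.</p>]]>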
      <![CDATA[<strong>1. How to find the SpamLookup configuration options</strong>

<p>SpamLookup can be configured at two levels - blog and installation. If you just have the one blog, or want any settings to apply across all the blogs on your installation of Movable Type, configure SpamLookup at the installation level, using the Plugins item on the <acronym title="Movable Type">MT</acronym> main menu or System Overview screen - it should be towards the bottom. If you only want settings to apply to one blog, you can configure SpamLookup using the Plugins tab of the Settings item on the weblog menu.</p>
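<p>One way to picture the two levels is as layered settings: a blog-level value, where set, is used for that blog, and anything left unset falls back to the installation-wide value. This precedence is an assumption for illustration, and the setting names below (<code>ip_lookup_action</code>, <code>ip_lookup_penalty</code>, <code>link_limit</code>) are hypothetical, not SpamLookup&#8217;s real configuration keys.</p>

```python
from collections import ChainMap

# Hypothetical installation-wide (system) settings.
system_settings = {"ip_lookup_action": "score", "ip_lookup_penalty": 1, "link_limit": 4}

# Hypothetical per-blog overrides keyed by blog ID; blogs without an entry use system values.
blog_settings = {
    1: {"link_limit": 6},
}


def effective_settings(blog_id: int) -> ChainMap:
    """Look up a blog's own settings first, then fall back to the system level."""
    return ChainMap(blog_settings.get(blog_id, {}), system_settings)
```

<p>A <code>ChainMap</code> keeps the layers separate, so changing an installation-level value shows up in every blog that hasn&#8217;t overridden it.</p>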

<strong>2. Lookups Settings</strong>

<p>When the plugin was first launched as <acronym title="Movable Type">MT</acronym>-DBSL, lookups were all it did. Now they are just one weapon in its formidable anti-spam arsenal. There are three options here:</p>

<dl>
<dt><acronym title="Internet Protocol">IP</acronym> address lookups</dt><dd>These look up the source <acronym title="Internet Protocol">IP</acronym> address of the comment or trackback and check it against several centralised blacklist servers (you can add extra servers if you wish). If the <acronym title="Internet Protocol">IP</acronym> address is found on a blacklist, you can force the comment into moderation, adjust its junk score (by default, 1 is subtracted) or do nothing. This can be quite effective, but only if you trust the blacklisting systems.</dd>
<dt>Domain name lookups</dt><dd>This works in the same way, but looks up the domain names of the posted links. This is similar to how <acronym title="Movable Type">MT</acronym>-Blacklist worked, except the blacklist is hosted elsewhere and not on your <acronym title="Movable Type">MT</acronym> installation. Again, this is effective but only if you trust the blacklisting systems.</dd>
<dt>Advanced Trackback Lookups</dt><dd>This option is quite badly explained, which is unfortunate as it can be very effective. It compares the <acronym title="Internet Protocol">IP</acronym> address of the host in the trackback&#8217;s source <acronym title="Uniform Resource Locator">URL</acronym> with the <acronym title="Internet Protocol">IP</acronym> address the ping was sent from. Normally the blog software sending the ping runs on the same server as the blog itself, so the two should match. A lot of spam is sent from zombie machines rather than from the promoted web site, so this rule catches that sort of spam. Much of the spam I get is caught this way, but some spammers have wised up and now send trackback pings from the same <acronym title="Internet Protocol">IP</acronym> address as the site they are promoting. The rule can also misfire: a reader of mine who blogs with Blogger sends his pings through a third-party service not hosted on his site, and those pings sometimes get junked because of this option.</dd>
</dl>
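<p>The lookups above follow the usual DNS blacklist (DNSBL) convention: to check an IPv4 address, reverse its four octets, append the blacklist&#8217;s zone and see whether that name resolves; a domain is checked by appending the zone to the domain itself. The sketch below shows that convention (IPv4 only) plus the advanced trackback comparison. The zone <code>bl.example.org</code> is a placeholder, the resolver is passed in so the logic can be tried without network access, and none of this is SpamLookup&#8217;s own code.</p>

```python
import ipaddress
import socket
from typing import Callable
from urllib.parse import urlparse

# hostname -> IPv4 address; raises OSError (e.g. socket.gaierror) if the name does not resolve
Resolver = Callable[[str], str]


def ip_query_name(ip: str, zone: str) -> str:
    """DNSBL name for an IPv4 address: octets reversed, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def domain_query_name(domain: str, zone: str) -> str:
    """Domain blacklists are queried as the domain followed by the zone."""
    return domain.rstrip(".") + "." + zone


def is_listed(query_name: str, resolve: Resolver = socket.gethostbyname) -> bool:
    """A listed entry resolves (usually to a 127.0.0.x address); an unlisted one does not."""
    try:
        resolve(query_name)
        return True
    except OSError:
        return False


def trackback_ip_matches(source_url: str, sender_ip: str,
                         resolve: Resolver = socket.gethostbyname) -> bool:
    """Advanced trackback check: does the source URL's host resolve to the sender's IP?"""
    host = urlparse(source_url).hostname
    if not host:
        return False
    try:
        return resolve(host) == sender_ip
    except OSError:
        return False
```

<p>For example, <code>ip_query_name("192.0.2.1", "bl.example.org")</code> gives <code>"1.2.0.192.bl.example.org"</code>; whether that name resolves is what decides if the address is listed.</p>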

<strong>3. Link Settings</strong>

<p>These settings look at the source <acronym title="Uniform Resource Locator">URL</acronym> (trackbacks) or comment author <acronym title="Uniform Resource Locator">URL</acronym> (comments), along with any URLs posted in the comment itself.</p>

<ul>
<li>The first option adds to the junk score if a comment has no links, since the general aim of blog spam is to link to a dodgy site to improve its ranking in search engines - a comment with no links is unlikely to be spam.</li>
<li>The second option will forcibly moderate any comment or trackback that has more than a certain number of links. Spam comments tend to have lots of links, but some commenters may post legitimate lists of links to other handy resources, so don&#8217;t set this too low.</li>
<li>The third option is like the second but will subtract from the junk score. Set this higher than the previous option if you have it enabled - mine is at 4.</li>
<li>The &#8216;Link memory&#8217; option adds to the score if you have previously approved a comment containing the same URLs. This means that regular commenters are less likely to fall foul of any other rules. Keep this enabled.</li>
<li>The &#8216;Email memory&#8217; option does the same with email addresses. It may be undermined if you approve a lot of comments with false email addresses, though, and it doesn&#8217;t really apply to trackbacks, which don&#8217;t carry an email address.</li>
</ul>
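<p>The link rules can be sketched as a single scoring function. The thresholds and point values here are hypothetical placeholders (only the junk limit of 4 comes from the text above), the regular expression is a deliberately simple URL matcher, and the two &#8216;memory&#8217; sets stand in for whatever record MT keeps of previously approved comments. Counting the author URL as a link is also an assumption.</p>

```python
import re
from dataclasses import dataclass, field

# Deliberately simple URL matcher for illustration; real link extraction is messier.
URL_RE = re.compile(r"https?://[^\s\"'>]+", re.IGNORECASE)


@dataclass
class LinkRules:
    # Hypothetical thresholds and point values, not SpamLookup's defaults.
    no_links_bonus: int = 1
    moderate_over: int = 3  # force moderation above this many links
    junk_over: int = 4      # subtract points above this many links (keep it higher)
    junk_penalty: int = 1
    memory_bonus: int = 1
    approved_urls: set[str] = field(default_factory=set)    # "link memory"
    approved_emails: set[str] = field(default_factory=set)  # "email memory", lowercased

    def score(self, body: str, author_url: str = "", email: str = "") -> tuple[int, bool]:
        """Return (junk score adjustment, force_moderation) for one comment."""
        links = URL_RE.findall(body)
        if author_url:
            links.append(author_url)
        adjustment = 0
        if not links:
            adjustment += self.no_links_bonus      # no links: unlikely to be spam
        force_moderation = len(links) > self.moderate_over
        if len(links) > self.junk_over:
            adjustment -= self.junk_penalty        # far too many links
        if links and set(links) <= self.approved_urls:
            adjustment += self.memory_bonus        # every URL was approved before
        if email and email.lower() in self.approved_emails:
            adjustment += self.memory_bonus        # known commenter's address
        return adjustment, force_moderation
```

<p>Keeping the moderation limit below the junk limit means a comment with a borderline number of links is held for review before it starts losing points.</p>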

<strong>4. Keyword Filter Settings</strong>

<p>These options act on keywords in comments, and again replicate some functionality of <acronym title="Movable Type">MT</acronym>-Blacklist. The first box lists words that force a comment into moderation, one per line. Put words here that may indicate spam but are also often used legitimately. Mine includes words like &#8216;video&#8217;, &#8216;sexy&#8217;, &#8216;bankruptcy&#8217; and a variety of swear words.</p>
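<p>The moderation list can be thought of as one pattern per line, any of which forces a comment into moderation. This minimal sketch assumes case-insensitive matching anywhere in the comment text, which may differ from how SpamLookup actually compares words; <code>MODERATE_WORDS</code>, <code>parse_word_list</code> and <code>needs_moderation</code> are hypothetical names.</p>

```python
import re

# Contents of the moderation box, one entry per line.
MODERATE_WORDS = """
video
sexy
bankruptcy
"""


def parse_word_list(text: str) -> list[str]:
    """One entry per line; blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def needs_moderation(comment: str, words: list[str]) -> bool:
    """True if any entry (treated as a regex, case-insensitively) appears in the comment."""
    return any(re.search(word, comment, re.IGNORECASE) for word in words)
```

<p>Because matching is by substring in this sketch, &#8216;video&#8217; also catches &#8216;videos&#8217;, which suits a list meant to hold comments for review rather than junk them.</p>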

<p>The second box lists words that count towards junking a comment. By default, each of these words found in a comment subtracts 1 from its junk score, so a spam comment mentioning 3 different drug names from this list would lose 3 points. However, for words that you think will never appear in a legitimate comment, you can add a number after the word, and any comment containing it will have that number subtracted instead. So entering &#8220;<kbd>viagra 4</kbd>&#8221; subtracts 4 from the junk score of any comment containing the word &#8216;viagra&#8217;.</p>
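<p>Here is a sketch of the weighted junk list as described: each line holds a word and an optional number, the weight defaults to 1, and each distinct entry that matches subtracts its weight. Case-insensitive matching and counting a word only once, however often it appears, are assumptions based on the three-drug-names example rather than confirmed plugin behaviour.</p>

```python
import re


def parse_junk_list(text: str) -> list[tuple[str, int]]:
    """Parse lines like 'viagra 4' into (pattern, weight); weight defaults to 1."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) > 1 and parts[-1].isdigit():
            entries.append((" ".join(parts[:-1]), int(parts[-1])))
        else:
            entries.append((" ".join(parts), 1))
    return entries


def junk_adjustment(comment: str, entries: list[tuple[str, int]]) -> int:
    """Subtract each matching entry's weight once, however many times it occurs."""
    return -sum(weight for pattern, weight in entries
                if re.search(pattern, comment, re.IGNORECASE))
```

<p>With the list <kbd>viagra 4</kbd>, <kbd>cialis</kbd>, <kbd>levitra</kbd>, <kbd>phentermine</kbd>, a comment naming the last three loses 3 points, while one repeating &#8216;viagra&#8217; several times loses 4.</p>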

<p>In both cases, you can also use <acronym title="Practical Extraction and Report Language">Perl</acronym> regular expressions - one I use is &#8220;<kbd>direct(v|tv)</kbd>&#8221; which matches &#8220;directv&#8221; and &#8220;directtv&#8221;.</p>
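<p>Python&#8217;s <code>re</code> module accepts the same syntax for a simple pattern like this, so it is a quick way to check what an entry will match before pasting it in. Without word boundaries the pattern also matches inside longer words; Perl and Python both support <code>\b</code> to prevent that. The sample strings are made up for illustration.</p>

```python
import re

loose = re.compile(r"direct(v|tv)")
strict = re.compile(r"\bdirect(v|tv)\b")  # word boundaries: whole words only

samples = ["directv deals", "cheap directtv", "directvideo", "direct tv"]
for text in samples:
    # "directvideo" matches only the loose pattern; "direct tv" matches neither,
    # because the space breaks the sequence.
    print(f"{text!r}: loose={bool(loose.search(text))} strict={bool(strict.search(text))}")
```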

<p>This is an incredibly powerful feature; however, by default the plugin has hardly any keywords in its lists. A good starting place is, ironically enough, <a href="http://codex.wordpress.org/Spam_Words">The WordPress Wiki</a>, which has a list that you can paste in. Entries don&#8217;t have to be words, either: I get a lot of spam from a site called xxlfind.biz, so I added &#8220;<kbd>xxlfind.biz 4</kbd>&#8221; to my list to block it.</p>
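<p>If list entries are interpreted as regular expressions, as the <kbd>direct(v|tv)</kbd> example implies (an assumption for plain entries), the dot in <kbd>xxlfind.biz</kbd> matches any character, not just a literal dot. That rarely causes trouble for a domain like this, but escaping it makes the entry exact: in Perl you would write <kbd>xxlfind\.biz</kbd>, which is also what Python&#8217;s <code>re.escape</code> produces.</p>

```python
import re

raw = "xxlfind.biz"
escaped = re.escape(raw)  # "xxlfind\\.biz": the dot now matches only a literal dot

for text in ["visit xxlfind.biz", "visit xxlfind-biz"]:
    # The unescaped entry matches both strings; the escaped one only the first.
    print(text, bool(re.search(raw, text)), bool(re.search(escaped, text)))
```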

<p>SpamLookup does lack a couple of handy features, namely blocking of duplicate comments and trackbacks, and better throttling like that in <a href="http://philringnalda.com/blog/2004/08/real_comment_throttle_plugin_01.php">Real Comment Throttle</a>. Even so, after making the changes above on my site I get around 98% accuracy, with only a couple of legitimate trackbacks marked as junk and no false negatives (spam that gets through all the filters). Hopefully you will too.</p>]]>
   </content>
</entry>

</feed>

