Stopping Comment Spammers & Email Harvesters with Coldfusion & Project Honeypot

UPDATE: I’ve added some additional code to improve performance, so after you’ve read this please check out the post Project Honeypot & Coldfusion Part 2

Over the last couple of months I’ve seen a huge increase in the number of comment spammers hitting a “Website Feedback” form on my site. I’ve been meaning to add reCAPTCHA to the form but hadn’t gotten around to it. And since reCAPTCHA may have been cracked, it might not be the best option anyway.

Anatomy of Comment Spam

The sender is usually a nonsense email like uqkropilz@spammer .net. One interesting thing I noticed was that initially, comments sent in had no links. They were simple things like

Comments: Hi there, I completely agree with what you r saying

My guess is that this seemingly pointless spamming was some kind of reconnaissance, though I don’t really know. I wasn’t too worried about it, since the spambot was only filling out a form that generated an email. After a couple of weeks, however, links started to appear, and then more and more comments with more and more links arrived.

Comments: Hi there, Common sense is not so common. Goldm nze franc.ios idg imp 10 crona 1912 “persistant cough” “no fever” “no phlegm” Lincoln southwest emeralds Antonioorons Examen reglas de la carretera de la florida

The links were to sites that do who knows what. (Don’t worry, I’ve removed the links in the example). Generally the bot’s programmers would hope that their creation was hitting something that would be publicly viewable, like a blog comment. In my case, the spam was only coming to an email address, so it wasn’t doing what was hoped (unless I was foolish enough to click on a link). However, the volume that began to hit my inbox was beginning to get irritating.

Since this is a web form submission, I have been capturing the IP address (and letting users know) and sending it with the feedback. With this data, I started creating a black list of my own, hardcoding the addresses in my blocker.cfm to stop the bot from accessing pages on my site. The problem is that the IPs rotated so frequently that I was always adding new ones, and maintaining the list in hardcoded form was becoming a pain. I thought about creating a table for the blacklisted IPs and then just doing a query against it.

Before building something myself, I decided to see if there was something already out there. I try not to reinvent the wheel whenever I can. I found a CFC on mximize.com which looked like it would do just what I wanted. (It turns out it needed some tweaking, as shown below, but that’s ok 😉) The code is pretty straightforward, but you need an API key from Project HoneyPot. This key is free; however, you do need to participate in the project. There are several levels of participation, from installing a full honeypot to just telling some friends. All will get you a key. I decided to install a honeypot.

The Honeypot

Note that this isn’t a necessary step in setting up the comment spammer code, but I decided to participate because it’s a good project & it will help reduce spam on my site and on the web in general.

It’s very easy. They have a quick 4 step wizard which provides you some code in 9 different scripting languages. I, of course, chose ColdFusion and was provided a custom CFM file to add to my site. Once you have uploaded the CFM file to your web server, you need to modify your pages to add links to it. The wizard gives you 8 examples of how to do this. It is recommended that the format vary to keep the bots from seeing a pattern.

Adding these lines of code to dozens of pages would be a bit of a pain, but I’m using a modified version of Fusebox, so I have layout files which all my various pages use. There are 9 of them for my site, so I could have added a line to each and been done with it. However, I have one file that is included on every page of my site: analytics.cfm, my code for Google Analytics. Adding the link code there would be the easiest, except I needed a way to pull up a different link on each page load. This is simple to do using randrange() and cfswitch.

<!---project honeypot traps  --->
<cfset rnd = randrange(1,8)>

<cfswitch expression="#rnd#">
<cfcase value="1">
<a href="http://www.myserver.com/artisticelective.cfm"><!-- grandioso-flour --></a>
</cfcase>
<cfcase value="2">
<a href="http://www.myserver.com/artisticelective.cfm"><img src="grandioso-flour.gif" height="1" width="1" border="0"></a>
</cfcase>
<cfcase value="3">
<a href="http://www.myserver.com/artisticelective.cfm" style="display: none;">grandioso-flour</a>
</cfcase>
<cfcase value="4"><div style="display: none;"><a href="http://www.myserver.com/artisticelective.cfm">grandioso-flour</a></div>
<a href="http://www.myserver.com/artisticelective.cfm"></a>
</cfcase>
<cfcase value="5"><!-- <a href="http://www.myserver.com/artisticelective.cfm">grandioso-flour</a> -->
</cfcase>
<cfcase value="6">
<div style="position: absolute; top: -250px; left: -250px;"><a href="http://www.myserver.com/artisticelective.cfm">grandioso-flour</a></div>
</cfcase>
<cfcase value="7"> <a href="http://www.myserver.com/artisticelective.cfm"><span style="display: none;">grandioso-flour</span></a>
</cfcase>
<cfcase value="8"><a href="http://www.myserver.com/artisticelective.cfm"><div style="height: 0px; width: 0px;"></div></a></cfcase>
</cfswitch>

artisticelective.cfm is the file provided by the project and I use a random number between 1 & 8 to add a single line of code at each page load. This satisfies the requirement of the project and makes my life very simple. Note that should a human actually find and click one of these links it shows a Terms of Use page. Having this ToU page also provides you with some legal recourse in the (infinitesimally) small chance you could find and prosecute a spammer.

Now, on to the Comment Spam protection.

http:BL

http:BL is a Black List database of known and suspected comment spammers & email harvesters. It has an API which gives registered users the ability to check site visitors against the Project Honeypot black list. As noted in the FAQ, this does not contain blacklisted email servers, as we are dealing with bots & not rogue or open relay mail servers.

The http:BL API is queried through DNS. You look up a hostname on dnsbl.httpbl.org made of your API key followed by the visitor’s IP in reverse octet order, e.g. 127.0.0.1 is passed as 1.0.0.127. The API passes back a string of octets in the form

127.3.5.1

where the first octet is always 127 (when the query is properly formatted), the second is the number of days since the IP was last seen active, the third is the threat score and the fourth is the type of visitor.
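To make the format concrete, here is a minimal sketch that builds a query name and decodes the sample reply above. The key "abcdefghijkl" is a placeholder, and the variable names are mine, not part of the API. The flag values (1, 2, 4) are the same ones that appear in the messages of the CFC further down.

```cfml
<!--- Sketch only: "abcdefghijkl" is a placeholder key, not a real one --->
<cfset accessKey = "abcdefghijkl">
<cfset visitorIp = "127.0.0.1">

<!--- Reverse the octets: 127.0.0.1 becomes 1.0.0.127 --->
<cfset octets = listToArray(visitorIp, ".")>
<cfset reversedIp = octets[4] & "." & octets[3] & "." & octets[2] & "." & octets[1]>

<!--- The name that gets looked up in DNS --->
<cfset queryName = accessKey & "." & reversedIp & ".dnsbl.httpbl.org">

<!--- Decode the sample reply from the text --->
<cfset reply = listToArray("127.3.5.1", ".")>
<cfset days = reply[2]>
<cfset threat = reply[3]>
<cfset visitorType = reply[4]>

<!--- The type octet is a set of flags: 1 = suspicious, 2 = harvester, 4 = comment spammer --->
<cfset isSuspicious = bitAnd(visitorType, 1) neq 0>
<cfset isHarvester = bitAnd(visitorType, 2) neq 0>
<cfset isCommentSpammer = bitAnd(visitorType, 4) neq 0>

<cfoutput>
#queryName#<br>
days=#days# threat=#threat# type=#visitorType#<br>
suspicious=#isSuspicious# harvester=#isHarvester# commentSpammer=#isCommentSpammer#
</cfoutput>
```

For the sample reply, only the "suspicious" flag is set. Testing the flags with bitAnd() rather than comparing against a list of numbers means a combined type such as 5 (1+4) is still recognised as a comment spammer.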

The full API docs are here.

The CFC does several things, and I’ve added hints to explain what each function is doing. Create a new CFC and drop it in your CFC directory. Note you need to replace YOUR_API_KEY with, well, your API key. There were a few problems with the code from mximize.com (perhaps related to a CF version difference?) so I made some small mods to get it working (with CF8 at least).

 <cfcomponent displayname="DNS functions">
<!--- Original code from http://www.mximize.com/fighting-comment-spam-with-project-honeypot with modifications by JayB sidfishes.wordpress.com--->
 <cffunction name="honeypotcheck" returntype="struct" hint="Check Project HoneyPot http:BL">
  <cfargument name="ip" required="yes" type="string">
  <cfset var aVal = "">
  <cfset var hpkey = "YOUR_API_KEY">
  <cfset var stRet = structNew()>

  <!--- Get the different IP values --->
  <cfset aVal = listToArray(gethostaddress("#hpkey#.#reverseip(arguments.ip)#.dnsbl.httpbl.org"),".")>

<cfif aVal[1] eq "IP-Address not known"><!--- jb: added evaluation of array for good addresses --->
<!--- set a value indicating ok address --->
    <cfset stRet = {type=99}>
<cfelse>
  <!--- there was a match so set the return values --->
  <cfset stRet.days = aVal[2]>
  <cfset stRet.threat = aVal[3]>
  <cfset stRet.type = aVal[4]>

  <!--- Get the HP info message ie: threat level --->
  <cfswitch expression="#aVal[4]#">
   <cfcase value="0">
    <cfset stRet.message = "Search Engine (0)">
   </cfcase>
   <cfcase value="1">
    <cfset stRet.message = "Suspicious (1)">
   </cfcase>
   <cfcase value="2">
    <cfset stRet.message = "Harvester (2)">
   </cfcase>
   <cfcase value="3">
    <cfset stRet.message = "Suspicious & Harvester (1+2)">
   </cfcase>
   <cfcase value="4">
    <cfset stRet.message = "Comment Spammer (4)">
   </cfcase>
   <cfcase value="5">
    <cfset stRet.message = "Suspicious & Comment Spammer (1+4)">
   </cfcase>
   <cfcase value="6">
    <cfset stRet.message = "Harvester & Comment Spammer (2+4)">
   </cfcase>
   <cfcase value="7">
    <cfset stRet.message = "Suspicious & Harvester & Comment Spammer (1+2+4)">
   </cfcase>
  <!---  <cfdefaultcase> jb: moved to top of function as we can't
        eval the array if there is no lookup response ie: not match in http:BL
    <cfset stRet.message = "IP-Address not known">
   </cfdefaultcase> --->
  </cfswitch> 

</cfif>

  <cfreturn stRet>
 </cffunction>

 <cffunction name="gethostaddress" returntype="string" hint=
             "I do the dns lookup against the http:bl servers">
  <cfargument name="host" required="Yes" type="string" />
  <cfset var obj = "">
  <cfset var result = ""><!--- var-scoped so the value stays local to this call --->
<cftry><!--- jb: added error handling as error is thrown if host
             lookup has no match in http:BL ie: it's not been reported as a problem --->
  <!--- Init class --->
  <cfset obj = CreateObject("java", "java.net.InetAddress")>
<cfset result =  obj.getByName(host).getHostAddress() >
<cfcatch type="any">
    <!--- an "error" in this case is an unknown address, which means it is not reported to http:BL --->
    <cfset result="IP-Address not known">
</cfcatch>
</cftry>
  <!--- Return result --->
  <cfreturn result>
 </cffunction>

 <cffunction name="reverseip" returntype="string" hint=" I return IP in reverse format as required by http:BL api" >
  <cfargument name="ip" required="Yes" type="string" />
  <cfset var aIp = listToArray(arguments.ip,".")>

  <!--- Return IP reversed --->
  <cfreturn aIp[4] & "." & aIp[3] & "." & aIp[2] & "." & aIp[1]>
 </cffunction>

</cfcomponent>
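One thing to be aware of: reverseip assumes the address is a dotted quad, so anything else (an empty string, a hostname, an IPv6 address) will raise an array index error. A possible guard, sketched here as a hypothetical isIpv4 helper that is not part of the original CFC, could be added to the component and checked before honeypotcheck does the lookup.

```cfml
<cffunction name="isIpv4" returntype="boolean" output="false"
            hint="Hypothetical helper: true only for a dotted quad with octets 0-255">
  <cfargument name="ip" type="string" required="yes">
  <cfset var part = "">

  <!--- Must be exactly four groups of 1 to 3 digits separated by dots --->
  <cfif NOT reFind("^[0-9]{1,3}(\.[0-9]{1,3}){3}$", arguments.ip)>
    <cfreturn false>
  </cfif>

  <!--- Each octet must fit in one byte --->
  <cfloop list="#arguments.ip#" delimiters="." index="part">
    <cfif part GT 255>
      <cfreturn false>
    </cfif>
  </cfloop>

  <cfreturn true>
</cffunction>

<!--- Example checks --->
<cfoutput>
#isIpv4("91.201.66.172")#<br>
#isIpv4("300.1.1.1")#<br>
#isIpv4("localhost")#
</cfoutput>
```

The first check passes and the other two fail. When the check fails, the simplest option is probably to skip the lookup and treat the visitor as unknown.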

Once you’ve got your CFC on your server you just add this code to pages you want protected.

<!--- Check Project HoneyPot --->
<cfinvoke returnvariable="stCheck" method="honeypotcheck" component="com.HoneyPotdns">
<cfinvokeargument name="ip" value="#cgi.remote_addr#" /><!---jb: changed remote_host to remote_addr --->
 <!--- <cfinvokeargument name="ip" value="91.201.66.172" /> ---><!--- jb: known bad ip for testing --->
</cfinvoke>
<!--- Don't display the personal information --->
<cfif isDefined("stCheck") AND (stCheck.type GTE 4 AND stCheck.type LTE 7)>
  <!--- Send 404 message --->
  <cfheader statuscode="404" statustext="Not Found">
  404 Not Found
  <cfabort>
</cfif>

I added this code to blocker.cfm, which is called on every page load. If the visitor type returned by http:BL is between 4 and 7, i.e. any combination that includes Comment Spammer (or whatever range you decide to set), then a simple 404 is returned and the comment spammer bot is stopped dead in its tracks.
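If blocking on type alone turns out to be too aggressive, the days and threat values in the returned struct can be factored in as well. The following is only a sketch: shouldBlock is a hypothetical helper, and the default thresholds (30 days, threat 10) are illustrative numbers I picked, not recommendations from Project Honeypot.

```cfml
<cffunction name="shouldBlock" returntype="boolean" output="false"
            hint="Hypothetical helper: decide whether an http:BL result warrants a block">
  <cfargument name="check" type="struct" required="yes">
  <cfargument name="maxDays" type="numeric" default="30">
  <cfargument name="minThreat" type="numeric" default="10">

  <!--- Clean addresses come back as {type=99} with no days or threat keys --->
  <cfif NOT structKeyExists(arguments.check, "threat")>
    <cfreturn false>
  </cfif>

  <!--- The 4 flag marks a comment spammer (types 4 through 7) --->
  <cfreturn bitAnd(arguments.check.type, 4) neq 0
            AND arguments.check.days LTE arguments.maxDays
            AND arguments.check.threat GTE arguments.minThreat>
</cffunction>

<!--- Sample result: a comment spammer seen 3 days ago with threat score 5 --->
<cfset sample = structNew()>
<cfset sample.days = 3>
<cfset sample.threat = 5>
<cfset sample.type = 4>

<cfoutput>
#shouldBlock(sample)#<br>        <!--- threat 5 is below the default minimum of 10 --->
#shouldBlock(sample, 30, 5)#     <!--- passes once the minimum threat is lowered to 5 --->
</cfoutput>
```

In blocker.cfm the condition would then read something like `isDefined("stCheck") AND shouldBlock(stCheck)` in place of the type range test.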

All in all a pretty cool solution I think.


5 Responses to Stopping Comment Spammers & Email Harvesters with Coldfusion & Project Honeypot

  1. Pingback: Project Honeypot & Coldfusion Part 2 « Sid’s FishNet

  2. Hey Jay, I know this is an old post (and I’ve seen the updated version also). First I wanted to thank you for posting it and the code. But second, I wanted to ask if you may be willing to contribute your code to the httpBl implementations page. There are no CFML versions there at all for now: http://www.projecthoneypot.org/httpbl_implementations.php
    I’m sure many would appreciate it. I’m not using it myself for now, as I’ll be trying instead to use the available IIS implementation to block things even before getting to CF, but some may not want to do that for various reasons so they may love seeing some CFML code to use httpBl, and they may not find it here.
    Keep up the great work.

    • JayB says:

      Thanks Charlie, I will definitely have a look at posting the code. It’s been quite a while since I wrote it so I’ll have to look at it again to make sure it is properly formatted and commented (and that I’d still write the code that way 🙂

      • carehart says:

        Great to hear, and fair enough. 🙂 That said, if you may find it could fall between the cracks (the formatting and checking to see if you may write it differently), I’d propose that there would be value even in offering it as-is, knowing that (at least the updated version) worked for you. You could even add a comment (when you offer it) that you might like to offer a still more-updated one. They may take it as is and happily post the corrected version when ready.

        But I trust your judgment. Would just hate to see it fall between the cracks in what may be busy days for you. Again, either way, thanks for posting the code. Hope others may find it one way or another. 🙂

        /charlie

  3. Pingback: http:BL_CFML – A Project Honeypot Blacklist Implementation for ColdFusion. | Sid's FishNet
