Project Honeypot & Coldfusion Part 2

One of my more popular posts has been Stopping Comment Spammers & Email Harvesters with Coldfusion & Project Honeypot. This code has been working very well for me and I have seen a noticeable decrease in comment spam. It also seems to be working for Project Honeypot, at least in a small way.

My Stats

  • Harvester visits to your site(s): 42
  • Recent visits (this week): 3
  • Recent visits (this month): 9
  • Spam traps issued on your sites: 304
  • Spam received at your addresses: 1,089
  • Received this week: 112
  • Received this month: 417
  • Comment spam posts to your site(s): 0

A code update.

One thing I noticed after implementing the code from my previous post was that my page load times went up quite a bit. The reason is that the code uses http:BL, which does a DNS look-up against the Project Honeypot servers on every page load, and that takes time. I decided to add my own white list table and some code to eliminate these repeated look-ups.

The table, visitor_ip_addys, is simple, just two columns:

ipaddy [varchar(15)]
visitdate [datetime]
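
For anyone who wants to create the table from ColdFusion, here is a minimal one-time setup sketch. It uses the table name visitor_ip_addys and the columns ipaddy and visitdate, as the queries in this post do. The MySQL syntax, the index, and its name idx_ipaddy are assumptions of mine, so adjust them for your database.

```cfml
<!--- One-time setup sketch. Assumptions: MySQL syntax, and an index
      (hypothetical name idx_ipaddy) so the per-page white list look-up stays fast --->
<cfquery datasource="myDSN">
  create table visitor_ip_addys (
    ipaddy    varchar(15) not null,
    visitdate datetime    not null,
    index idx_ipaddy (ipaddy)
  )
</cfquery>
```

Note that varchar(15) only has room for dotted IPv4 addresses (255.255.255.255 is 15 characters).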

I added the following function to my Honeypot CFC

<cffunction name="newVisitorCheck" returntype="string" hint="Returns 'new' if the IP is not in the white list, otherwise 'existing'">
  <cfargument name="ip" required="yes" type="string">
  <cfset var vQry = "">
  <cfset var result = ""><!--- var-scoped so it does not leak between requests --->

  <!--- look for the IP in the white list table --->
  <cfquery name="vQry" datasource="myDSN">
    select ipaddy from visitor_ip_addys where ipaddy = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.ip#">
  </cfquery>

  <cfif vQry.recordcount eq 0><!--- then it's a new visitor --->
    <cfset result = "new">
  <cfelse>
    <cfset result = "existing">
  </cfif>
  <cfreturn result>
</cffunction>

And changed my honeyPotCheck function to

<cffunction name="honeypotcheck" returntype="struct" hint="Check Project HoneyPot http:BL">
  <cfargument name="ip" required="yes" type="string">
  <cfset var aVal = "">
  <cfset var hpkey = "MyKey">
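  <cfset var stRet = structNew()>
  <cfset var result = ""><!--- var-scope the cfinvoke return variable and the insert query --->
  <cfset var iQry = "">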

<!--- jb: added check to see if this IP has visited in the last 3 months. We have a table to track IPs, with entries retained for 3 months. IPs that check as clean
against http:BL are added to this table to improve page load performance. Entries older than 3 months are removed so those visitors get revalidated (in case they have
been compromised in that time) and to keep the table size reasonable. --->

<cfinvoke method="newVisitorCheck" returnvariable="result">
<cfinvokeargument name="ip" value="#arguments.ip#">
</cfinvoke>

<cfif result eq "new">
  <!--- Get the different IP values --->
  <cfset aVal = listToArray(gethostaddress("#hpkey#.#reverseip(arguments.ip)#.dnsbl.httpbl.org"),".")>

        <cfif aVal[1] eq "IP-Address not known"><!--- jb: added evaluation of array for good addresses --->
        <!--- set a value indicating ok address --->
            <cfset stRet = {type=99}>
            <!--- insert into visitor_ip_addys table as this is a clean IP --->

            <cfquery name="iQry" datasource="MyDSN">
            insert into visitor_ip_addys (ipaddy, visitdate) values
            (<cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.ip#">,
            <cfqueryparam cfsqltype="cf_sql_timestamp" value="#now()#"> )
            </cfquery>

        <cfelse>
          <!--- there was a match so set the return values --->
          <cfset stRet.days = aVal[2]>
          <cfset stRet.threat = aVal[3]>
          <cfset stRet.type = aVal[4]>

          <!--- Get the HP info message ie: threat level --->
          <cfswitch expression="#aVal[4]#">
           <cfcase value="0">
            <cfset stRet.message = "Search Engine (0)">
           </cfcase>
           <cfcase value="1">
            <cfset stRet.message = "Suspicious (1)">
           </cfcase>
           <cfcase value="2">
            <cfset stRet.message = "Harvester (2)">
           </cfcase>
           <cfcase value="3">
            <cfset stRet.message = "Suspicious & Harvester (1+2)">
           </cfcase>
           <cfcase value="4">
            <cfset stRet.message = "Comment Spammer (4)">
           </cfcase>
           <cfcase value="5">
            <cfset stRet.message = "Suspicious & Comment Spammer (1+4)">
           </cfcase>
           <cfcase value="6">
            <cfset stRet.message = "Harvester & Comment Spammer (2+4)">
           </cfcase>
           <cfcase value="7">
            <cfset stRet.message = "Suspicious & Harvester & Comment Spammer (1+2+4)">
           </cfcase>
          <!---  <cfdefaultcase> jb: moved to top of function as we can't eval the array if there is no lookup response ie: not match in http:BL
            <cfset stRet.message = "IP-Address not known">
           </cfdefaultcase> --->
          </cfswitch>

        </cfif>
  <cfelse>
    <!--- good address  --->
    <cfset stRet = {type=99}>
</cfif>
  <cfreturn stRet>
 </cffunction>

As you can see from the comments in the code, honeypotcheck, which is invoked on each page load, now starts by calling newVisitorCheck. That check queries our white list table for the IP. If the IP is there, we skip the rest of the check and do not do an http:BL DNS query. If it is not in the white list, the IP is either new or a known bad IP (bad IPs are never added to the table), so we need to check it. This means that new visitors have a slightly longer wait on their first page load while we do the look-up, but if they pass, we add them to the white list* and do not slow them down on subsequent page loads. As noted in the comments, we keep entries in the white list for 3 months (an arbitrary number).
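
To make the per-page-load flow concrete, here is a minimal sketch of how the check could be called from Application.cfc. It is an illustration under assumptions rather than the exact code running on this site: the component name honeypot, the choice to let search engines (type 0) through, and the 403 response are all hypothetical.

```cfml
<!--- Sketch only: "honeypot" is a hypothetical component name for the Honeypot CFC --->
<cffunction name="onRequestStart" returntype="boolean" output="false">
  <cfargument name="targetPage" type="string" required="true">
  <cfset var stCheck = "">

  <!--- runs on every request; white-listed IPs return quickly without a DNS look-up --->
  <cfinvoke component="honeypot" method="honeypotcheck" returnvariable="stCheck">
    <cfinvokeargument name="ip" value="#cgi.remote_addr#">
  </cfinvoke>

  <!--- type 99 = clean or white-listed, type 0 = search engine; anything else is refused (assumed policy) --->
  <cfif stCheck.type neq 99 and stCheck.type neq 0>
    <cfheader statuscode="403" statustext="Forbidden">
    <cfabort>
  </cfif>

  <cfreturn true>
</cffunction>
```

Because honeypotcheck returns type 99 both for white-listed IPs and for new IPs that pass the http:BL look-up, the caller does not need to know which path was taken.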

After 3 months, we remove the IP from the white list so we can recheck it to make sure the IP hasn’t been compromised.

The code to do this is:

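<cffunction name="ipTableCleanup" access="remote" returntype="void" hint="Removes white list entries older than 90 days">
  <cfset var deleteIP = "">
  <!--- DATE_ADD ... INTERVAL is MySQL date syntax --->
  <cfquery name="deleteIP" datasource="myDSN">
    delete from visitor_ip_addys where visitdate <= DATE_ADD(CURRENT_TIMESTAMP, INTERVAL -90 day)
  </cfquery>
</cffunction>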

This is run every day via a scheduled task set up in CFAdmin.
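
If you would rather define the task in code than in the administrator, something like the following cfschedule call should be roughly equivalent. This is a sketch under assumptions: the task name, the example.com URL, the CFC file name honeypot.cfc, and the 3 AM start time are all placeholders. Since ipTableCleanup is declared with remote access, it can be called over HTTP with a method parameter in the URL.

```cfml
<!--- Sketch only: task name, URL, and start time are placeholders --->
<cfschedule action="update"
    task="honeypotIpTableCleanup"
    operation="HTTPRequest"
    url="http://www.example.com/honeypot.cfc?method=ipTableCleanup"
    startDate="#dateFormat(now(), 'mm/dd/yyyy')#"
    startTime="03:00 AM"
    interval="daily">
```

Running this once registers (or updates) the task; it does not need to run on every request.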

All in all, this seems to be working quite well as page load times are back to where they were before the Honeypot implementation and the Honeypot is still doing its job.

*Note that since you are capturing & storing IP addresses, your privacy policy should reflect this fact.
