derefspam.pl - Use MT-Blacklist rules to remove referral spam
Warning: If you use awstats, make sure you upgrade immediately! I've decided to stop using it, especially now that I've seen this nifty WordPress plugin.
It seems like I enjoy fighting blog spammers more than I enjoy posting to my blog lately. Tom Sherman linked to a post of mine about how I was trying to deal with referral spam. It was suggested that a good idea would be to use the MT-Blacklist file to actively filter referral spam out of your log files. I thought that was a pretty good idea too, so I wrote a little Perl script. Probably the best way to use it would be to run it on the log right before your log analyzer processes that log, and then rotate the log. (I'll leave that up to you.)
Download derefspam.pl v.2 (01-23-2005)
Download my blacklist.txt (01-23-2005)
Download my whitelist.txt (01-23-2005)
Download MT-Blacklist's blacklist.txt
Update: Version .2
- added optional whitelist file
- added optional second blacklist file
- added code to only check the referral field, making it about 3x faster
Statistics: Completed 153709 lines in 266 seconds. (about 578 lines/second)
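To make the "only check the referral field" idea concrete, here is a minimal sketch in Python. It is not the code from derefspam.pl (which is Perl), just an illustration. It assumes the log is in Apache's combined format, and that a blacklist file holds one regular expression per line with optional # comments; the function names and the sample domain are made up for the example.

```python
import re

# Assumption: Apache "combined" log format, where the referrer is the
# first quoted field after the status code and the response size.
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)

def referrer_of(line):
    """Return the referrer field of a combined-format line, or None."""
    m = COMBINED.match(line)
    return m.group("referrer") if m else None

def load_rules(lines):
    """Compile one case-insensitive pattern per rule line.

    Assumed syntax: text after '#' is a comment and blank lines are
    skipped, so a pattern that itself contains '#' would be cut short.
    """
    rules = []
    for raw in lines:
        pattern = raw.split("#", 1)[0].strip()
        if pattern:
            rules.append(re.compile(pattern, re.IGNORECASE))
    return rules

def is_spam(line, rules):
    """Test the rules against the referrer only, not the whole line."""
    ref = referrer_of(line)
    if not ref or ref == "-":
        return False
    return any(rule.search(ref) for rule in rules)
```

Matching against the short referrer string instead of the whole line means each rule scans far less text, and a rule can no longer fire on something that merely appears in the request path or user agent.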
Usage: derefspam.pl [OPTIONS]
Take a log file, search through it and remove any lines that match lines in
the blacklist file and output it to another file.
Mandatory arguments:
  -i, --in file              path to log
  -o, --out file             path to output cleaned log
  -b, --blacklist file       path to blacklist rule file
Optional arguments:
  -s, --spam file            path to output lines that match blacklist
  -x, --myblacklist file     a second blacklist (so you can keep a second
                             blacklist that you maintain and overwrite the one
                             you download from MT-Blacklist)
  -m, --mydomain 'domain'    ignore referrals from this domain; this should
                             speed up processing time by ignoring common
                             domains. You can also separate multiple domains
                             with a | character and no spaces, and enclose in '
  -w, --whitelist file       path to a whitelist, same syntax as the blacklist;
                             use this instead of mydomain if you have a lot
                             of domains
  -d, --debug                print extra debug info
  -h, --help                 what you're reading right now
Example:
  ./derefspam.pl -b blacklist.txt -w whitelist.txt -i juju-combined.log \
    -x myblacklist.txt -o juju-derefspam.log -s juju-refspam.log \
    -m 'juju.org|google.com' -d
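
For anyone curious how these options fit together, here is a rough Python sketch of the flow the help text describes. It is an illustration and not the Perl source of derefspam.pl. In particular, it assumes the -m value is used as a regular-expression alternation tested against the referrer, and that lines with no referrer are always kept; split_log and its parameters are hypothetical names.

```python
import re

# Assumption: combined log format; grab the quoted field that follows
# the status code and response size, which is the referrer.
REFERRER = re.compile(r'" \d{3} \S+ "([^"]*)"')

def split_log(lines, blacklist, own_domains=None):
    """Split log lines into (clean, spam) lists.

    blacklist   -- iterable of regex strings (the -b and -x rules combined)
    own_domains -- optional string such as 'juju.org|google.com' (like -m)
    """
    rules = [re.compile(p, re.IGNORECASE) for p in blacklist]
    skip = re.compile(own_domains, re.IGNORECASE) if own_domains else None
    clean, spam = [], []
    for line in lines:
        m = REFERRER.search(line)
        ref = m.group(1) if m else "-"
        if ref == "-" or (skip and skip.search(ref)):
            # No referrer, or a trusted domain: keep it without
            # running the (much longer) blacklist at all.
            clean.append(line)
        elif any(rule.search(ref) for rule in rules):
            spam.append(line)   # would go to the -s file
        else:
            clean.append(line)  # would go to the -o file
    return clean, spam
```

Checking the trusted domains first is where the speedup would come from: most referrers on a typical site are its own pages or a search engine, so those lines never reach the blacklist loop.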