Kill Bots

Bots, bots, and more bots. The vast majority of security-related crawling is done by bots and botnets. So much so that they no longer even follow links or URLs; they simply crawl straight through IP blocks on a massive scale.

Not only are they probing your site for security holes, but the constant requests also eat into your bandwidth. Many bots are trying to get root-level access, others go straight for known WordPress exploits or try to inject malicious scripts, and some are simply fingerprinting your site for attack indexing. To combat the onslaught, this guide collects defense measures already covered in the other guides, but focused specifically on bots. The emphasis is on security, but the same techniques apply to spam bots and content scrapers.

Please be aware that you should have a working familiarity with Apache, .htaccess, and server configuration to use this guide, aside from the plugins listed at the bottom.

1. htaccess rules and blacklists

Though .htaccess rules and blacklists can be used to control bots, there are better methods if you have root-level access to your OS. If you're on a shared server, you can most likely only implement .htaccess-based rules. Managing bots this way is practically impossible: most bots either spoof Mozilla as their user-agent or use unique, ever-changing names.

There is no simple solution to fighting bots besides getting down into your server logs and finding which ones are causing the most problems; to really fight them you have to be very active.
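As a starting point for that log digging, a quick way to see which user-agents are hitting you hardest is to tally them from your access log. This assumes Apache's combined log format; the log path below is an example, so adjust it to your server:

```shell
# Count requests per user-agent in a combined-format access log,
# most frequent first (log path is an example; adjust to your setup)
awk -F'"' '{print $6}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -20
```

User-agents that appear thousands of times with no referer are good candidates for closer inspection.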

A simple example of a .htaccess blocklist for some very common bad bots. The most important is the first rule, which matches bots sending a blank user-agent. Don't copy/paste this; it is just an example, should you choose to use this marginal technique.

# The first rule matches bots sending a blank user-agent
SetEnvIfNoCase User-Agent ^$ bad_bot
SetEnvIfNoCase User-Agent "^Jakarta" bad_bot
SetEnvIfNoCase User-Agent "^User-Agent" bad_bot
SetEnvIfNoCase User-Agent "^libwww" bad_bot
SetEnvIfNoCase User-Agent "^lwp-trivial" bad_bot
SetEnvIfNoCase User-Agent "^Snoopy" bad_bot
SetEnvIfNoCase User-Agent "^PHPCrawl" bad_bot
SetEnvIfNoCase User-Agent "^WEP Search" bad_bot
SetEnvIfNoCase User-Agent "^Missigua Locator" bad_bot
SetEnvIfNoCase User-Agent "^ISC Systems iRc" bad_bot

<Limit GET POST HEAD>
  Order Allow,Deny
  Allow from all
  Deny from env=bad_bot
</Limit>

Banning just bots with a blank user-agent, using mod_rewrite instead (for POST requests):

RewriteEngine On
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule .* - [F]

Larger lists of bad bots and user-agents are available online.


2. Kill bots using bad query strings and directory browsing

Bad bots use malicious query strings and directory browsing to look for holes in your plugins, themes, and core files. Thankfully most of these probes are outdated and run by script kiddies, but that is not always the case.

# Stop directory browsing (this goes in your root .htaccess)
Options All -Indexes
Options +FollowSymLinks
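To give a flavor of query-string filtering with mod_rewrite, a rule set might look like the following. The patterns shown are illustrative examples only, not a maintained list:

```
# Example only: reject a few query-string patterns common in probes
RewriteEngine On
RewriteCond %{QUERY_STRING} (\.\./|etc/passwd|boot\.ini) [NC,OR]
RewriteCond %{QUERY_STRING} (<|%3C).*script.*(>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|%) [NC]
RewriteRule .* - [F]
```

A maintained ruleset such as the one mentioned below will cover far more patterns than this sketch.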

Perishable Press maintains a frequently updated .htaccess query-string ruleset. This can be used in conjunction with PHPIDS, or with query-string rules if you are using Apache's mod_security.


3. Kill bots looking for root logins

This one is pretty straightforward: you can go from thousands of bots snooping around to pretty much zero.

1. Change your default SSH port
2. Disable root logins and password-based authentication (use keys instead)
3. Implement port knocking

Extras
1. Use fail2ban
2. If for some reason you need to keep root logins, install DenyHosts.


4. Use fail2ban

Fail2ban is a package that reads your server log files and creates rules based on them, aptly called "jails". For instance, you might log bot crawling attempts against your SSH daemon; with fail2ban you can lock out any IP that fails repeatedly, for a set time period.
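A minimal sketch of an SSH jail in /etc/fail2ban/jail.local, assuming a Debian-style auth log path; the thresholds are illustrative, not recommendations:

```
[sshd]
enabled  = true
port     = ssh
logpath  = /var/log/auth.log
maxretry = 3
findtime = 600
bantime  = 3600
```

With these example values, three failed attempts within ten minutes ban the offending IP for an hour.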

Since fail2ban builds its rules from logs, you can use it in conjunction with more advanced firewalls such as mod_security's bot-blocking list, or a rule set for bad header requests and query strings.

Documentation on how to use fail2ban can be found on their site.


5. Use Mod_security

Mod_security is a feature-rich Apache web application firewall. It is much easier to manage bad bots with mod_security because you can separate rule sets for organization and logging, and get updated rules from a central source. Current daily rule sets are commercial and cost money; they are usually released for free after 90 days, from http://www.modsecurity.org/ or http://www.gotroot.com.
An example of an updated bot-specific rule list can be found on sourceforge.net.
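As a sketch of what such rules look like, here is a ModSecurity 2.x rule that denies requests sending no User-Agent header at all; the rule ID is arbitrary:

```
# Deny requests with no User-Agent header (rule ID is arbitrary)
SecRule &REQUEST_HEADERS:User-Agent "@eq 0" \
    "id:990001,phase:1,deny,status:403,log,msg:'Missing User-Agent'"
```

Centrally maintained rule sets bundle hundreds of rules like this, which is why they are easier to manage than hand-rolled .htaccess lists.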


6. Honey-pots

Honey-pots can be used to trap bots, or to gather data for the constant war against bots and botnets to further your defenses. For instance, you can craft a hidden field for your WordPress login that a bot thinks is real (yet is hidden from actual users). When the bot fills out the hidden field and submits, you can log the attempt for banning, redirect the bot, or really mess with it and throw it into an infinite garbage loop.
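A minimal sketch of the hidden-field idea; the field name and CSS class are made up for illustration, and server-side you would reject (and log) any submission where this field is filled in:

```html
<!-- Visually hidden from humans via CSS, but form-filling bots will see it -->
<style>.hp-field { position: absolute; left: -9999px; }</style>
<p class="hp-field">
  <label for="website">Leave this field empty</label>
  <input type="text" id="website" name="website" value="" autocomplete="off">
</p>
```

Avoid `type="hidden"` for the trap field, since many bots skip hidden inputs; a text input moved off-screen is more convincing.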

WordPress honeypot plugins:

Some resources: