One of the issues facing all webmasters is bad bots. Whether it’s comment spam, drive-by hacking attempts, or DDoS attacks, you’ve probably seen the issues some automated traffic can cause.

In this blog post, we’ll be delving into an easy way of stopping common bad bots, using .htaccess files and mod_rewrite. If you’re using the Apache web server, an afternoon of setting up a hardened .htaccess file can save you many headaches down the road.

If you’re not already aware, a .htaccess file is a plain text file that gives Apache web servers instructions on how to handle traffic hitting the folder it lives in, and folders below it. The leading dot makes it a hidden file on Unix-like systems. You can simply create it in a folder, though Apache only honours it if the server configuration allows overrides (AllowOverride) for that folder.

Blocking bad user agents

First off, we might want to block some generic bad bots, or user agents clearly indicative of an automated program. Here’s how we do that:
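A minimal sketch of such a rule set, assuming mod_rewrite is enabled. The list of user agents is illustrative rather than exhaustive; adjust it to what shows up in your own logs, and drop any entry (curl, for instance) that you use for legitimate monitoring.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Block requests with an empty or "-" User-Agent header
    RewriteCond %{HTTP_USER_AGENT} ^-?$ [OR]
    # Block the default user agents of common HTTP libraries and tools
    # (illustrative list, not taken from any official blocklist)
    RewriteCond %{HTTP_USER_AGENT} ^(curl|wget|libwww-perl|python-requests|python-urllib|java|go-http-client)/ [NC]
    # "-" leaves the URL untouched, [F] answers with 403 Forbidden
    RewriteRule ^ - [F]
</IfModule>
```

The two conditions are joined with [OR], so a request is rejected if either one matches.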

Usually, if a bot’s developer doesn’t bother changing their bot’s user agent from the default, they’re up to no good. You’ll commonly see these kinds of bots probing for phpmyadmin, for example. But we can do more.

Intro to blocking HTTP headers

Many bots use valid HTTP user agents, masquerading as a legitimate web browser. Fortunately for us, many of them are still based on the same automated libraries, and often get their HTTP headers slightly wrong, or send different ones from what a human would send. It’s hard to filter these because the same goes for legitimate, good bots (like Google), but let’s block the ones we can:
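One hedged example of this idea: real browsers send an Accept header, so a client that claims to be a browser but sends none is suspicious. The sketch below assumes that heuristic holds for your traffic, and the exemption pattern for self-identified crawlers is an assumption you should check against your logs before relying on it.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On

    # The client claims to be a browser...
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla/ [NC]
    # ...does not identify itself as a crawler (assumed exemption pattern)...
    RewriteCond %{HTTP_USER_AGENT} !(bot|crawl|spider|slurp) [NC]
    # ...and sends no Accept header at all
    RewriteCond %{HTTP_ACCEPT} ^$
    RewriteRule ^ - [F]
</IfModule>
```

All three conditions must match before the request is refused, which keeps the rule narrow. Watch your access log for unexpected 403 responses after enabling it.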

Advanced blocking: WordPress

The next part of this guide assumes you’re running WordPress, and it’s some of the most effective filtering in this entire guide. We can’t account for all software here, but the same approach can be adapted to other applications, and you should seriously think about doing so.

The following assumes wp-login.php lives in the same folder as the .htaccess file you’re creating:
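A sketch of one common approach: refuse form submissions to the login and comment handlers unless they carry a Referer from your own site and a non-empty user agent. The domain example.com is a placeholder for your own, and including wp-comments-post.php is an assumption about which endpoints you want to protect.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Only inspect form submissions
    RewriteCond %{REQUEST_METHOD} ^POST$
    # Reject if the Referer is missing or does not point at this site
    # (example.com is a placeholder) ...
    RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC,OR]
    # ... or if the User-Agent header is empty
    RewriteCond %{HTTP_USER_AGENT} ^$
    # In .htaccess the pattern is matched relative to this folder
    RewriteRule ^(wp-login|wp-comments-post)\.php$ - [F]
</IfModule>
```

Because [OR] binds more tightly than the implicit AND between conditions, this reads as "POST and (bad Referer or empty user agent)". Some privacy tools strip the Referer header, so a small number of real visitors may be affected.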

Bonus round: block HTTP/1.0

HTTP/1.0 is an old version of the HTTP protocol. Humans haven’t used it since the days of Netscape, but many bots, both good and bad, still do. Common search engines like Google tend not to. We can turn this to our advantage, but it needs to be done carefully and tested extensively, as it can block some good bots, or produce false positives on servers that sit behind a proxy which talks HTTP/1.0 to Apache.

If you feel daring, uncomment the version of this rule you prefer:
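A sketch of two possible variants, both left commented out. The split into "block everything" and "block only POST requests" is an illustrative choice, not the only way to scope the rule.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Version 1: refuse every HTTP/1.0 request (most aggressive)
    #RewriteCond %{SERVER_PROTOCOL} ^HTTP/1\.0$
    #RewriteRule ^ - [F]

    # Version 2: refuse HTTP/1.0 only for POST requests,
    # so old clients can still read pages but cannot submit forms
    #RewriteCond %{SERVER_PROTOCOL} ^HTTP/1\.0$
    #RewriteCond %{REQUEST_METHOD} ^POST$
    #RewriteRule ^ - [F]
</IfModule>
```

Uncomment only one version, and remove the leading # from each of its lines so the conditions stay attached to their RewriteRule.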

Adapting these rules to your own software and website setup can drastically cut down on comment spam, and even help protect your website from hacking. It’s not a panacea, but it’ll help make life a little easier.