Google Analytics, .htaccess, and Spam Bot Referrals

Note: this article was written in 2015, describes the situation as of 2015, and is for educational purposes only. Some of the content may be out of date.

Remember when what you saw in Google Analytics was pretty accurate? Well, now there is a growing number of bots crawling the web, slowing down servers, and diluting Google Analytics data. Over the past month my website riseofweb.com has been targeted by several different spam referrals. There are several articles on the web that explain how to remove these referrals from future analytics reporting, but not from your past reporting. I wrote this article as a guide to removing spam referrals from both future and past analytics, and to blocking them from accessing your website at all.

Spam Referrals List

Note: do not visit any of these websites; many are infested with malware…

  • Ilovevitaly.co, iedit.ilovevitaly.com, shopping.ilovevitaly.com, and forum.topic55223486.darodar.com – This is part of ilovevitaly.com, a spam crawler in Russia.
  • priceg.com
  • blackhatworth.com
  • buttons-for-website.com
  • hulfingtonpost.com – obviously made to look like huffingtonpost.com
  • Your website may have these, and it may also have different spam referrals…

How to remove spam bot referrals via .htaccess

This is my optimized spam bot blocking script. As you can see, it uses regex. The script catches all subdomains (for example iedit.ilovevitaly.com), and because of ilovevitaly.co, ilovevitaly.ru, ilove…, I have made it block each name under any domain extension (.co, .com, .ru, and so on). The regex is very simple to modify: just replace the domain names (DO NOT include the domain extension .com, etc.) with whoever is spamming you.

Add to your .htaccess file

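# BEGIN BLOCK SPAM REFERRALS
RewriteEngine On
# Match a Referer containing one of the listed names followed by a domain extension ([NC] = ignore case)
RewriteCond %{HTTP_REFERER} (?:savetubevideo|srecorder|kambasoft|ilovevitaly|iloveitaly|net\.hts|priceg|darodar|econom|buttons\-for\-website|blackhatworth|hulfingtonpost|adviceforum)\.((?!\.).)*?$ [NC]
# Refuse every matching request with 403 Forbidden
RewriteRule .* - [F]
# END BLOCK SPAM REFERRALS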
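
Before putting the rule live, it can help to try the pattern against a few referrers. The Python sketch below rebuilds the same RewriteCond from a list of names and checks sample referrers against it. It assumes that Python's re module treats this particular pattern the same way Apache's regex engine does; the helper names (build_pattern, build_rewrite_cond, is_blocked) and the sample referrers are hypothetical and not part of the original script.

```python
import re

# Names copied from the rule above, without their domain extensions.
SPAM_NAMES = [
    "savetubevideo", "srecorder", "kambasoft", "ilovevitaly", "iloveitaly",
    "net.hts", "priceg", "darodar", "econom", "buttons-for-website",
    "blackhatworth", "hulfingtonpost", "adviceforum",
]


def build_pattern(names):
    """Join the names into the same pattern shape the rule above uses."""
    alternatives = "|".join(re.escape(name) for name in names)
    # A listed name, a dot, then any run of non-dot characters up to the end.
    return r"(?:" + alternatives + r")\.((?!\.).)*?$"


def build_rewrite_cond(names):
    """Return the RewriteCond line for a list of spam names."""
    return "RewriteCond %{HTTP_REFERER} " + build_pattern(names) + " [NC]"


def is_blocked(referer, names=SPAM_NAMES):
    """True if the referrer would match the pattern (case-insensitive, like [NC])."""
    return re.search(build_pattern(names), referer, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(build_rewrite_cond(SPAM_NAMES))
    samples = [
        "http://iedit.ilovevitaly.com/",
        "http://ilovevitaly.co",
        "https://www.google.com/",
        "http://ilovevitaly.com/page.html",
    ]
    for referer in samples:
        print(referer, is_blocked(referer))
```

One thing this check makes visible: because the pattern allows no further dot after the domain extension, a referrer with a dotted path (such as the last sample) or a two-part extension such as .co.uk is not matched by it. Treat the result as a quick sanity check, not as proof of what Apache will do.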

.htaccess can’t block everything. I tried a few different variations of my blocking script and found that some websites still made it into Google Analytics. How, and why?

Google Analytics Blocking Spam Bot Referrals

Just because a visit / referral shows up in Analytics doesn’t mean it actually reached your server or visited your website. Some spam just targets Google Analytics UA codes directly. In Analytics there are some basic ways to block spam referrals.

Analytics referrals, many of which are spam. "Apple.com", for example, is a fake referral...

Hostname Filtering

In Analytics you can see the hostnames your referrals were recorded under. You may see your main domain name, such as “riseofweb.com”, and other domain names that you have redirected, but you may also see some domain names you do not own. These mystery domain names are most likely spam referrals.

Example referrals, most of which are spam...

In Google Analytics > Admin > View > Filters, create a new filter. I called my filter “Hostnames”.
See image for example.

Filter out domain names you do not own, by listing valid domains.
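
Listing the valid domains by hand gets error-prone as the list grows, so the filter pattern can be generated instead. The Python sketch below is one way to do that, assuming the filter is an include filter on the hostname that accepts a regular expression. The function names and the www hostname are hypothetical; substitute the hostnames you actually own.

```python
import re


def hostname_filter_pattern(valid_hostnames):
    """Build an anchored regex that matches only the listed hostnames."""
    escaped = sorted(re.escape(host.lower()) for host in valid_hostnames)
    return "^(" + "|".join(escaped) + ")$"


def is_valid_hostname(hostname, pattern):
    """True if the hostname is one of the hostnames the pattern was built from."""
    return re.search(pattern, hostname.lower()) is not None


# Hypothetical list: the main domain from this article plus an assumed www variant.
VALID_HOSTNAMES = ["riseofweb.com", "www.riseofweb.com"]
PATTERN = hostname_filter_pattern(VALID_HOSTNAMES)

if __name__ == "__main__":
    print(PATTERN)
    for host in ["riseofweb.com", "www.riseofweb.com", "apple.com"]:
        print(host, is_valid_hostname(host, PATTERN))
```

The dots are escaped and the pattern is anchored at both ends so that a lookalike hostname that merely contains your domain name does not slip through.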

Hiding hits from honest bots in Google Analytics

In Analytics there is a newer setting in your “View Settings” called “Exclude all hits from known bots and spiders”. Check this box. Whether a spam bot identifies itself in a way that Google Analytics can recognize is up to the bot, so this setting only hides the honest ones.

Excluding bot traffic from analytics.

Creating a segment in Google Analytics to hide spam bot referrals

Now that you know there is a lot of bad data inside your Analytics, it is time to hide past spam bot sessions / visits. In the normal “Reporting” area, on any page, click “+ Add Segment”, then “+ NEW SEGMENT”. We are going to create a segment that hides the bot traffic. See the image for the settings. Under “Advanced > Conditions”, set the filter to Users, Exclude, then choose Source / Medium with “is one of…”. Start typing in the text area and it will show a drop-down of your website’s referrals. Pick the referrals you do not want to show in your report, then press “Save”. In my case, the spam data turned out to be about 30% of my total website traffic.
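
The same exclusion can be applied outside Analytics to see how much of the traffic is spam. The Python sketch below applies an "exclude if source / medium is one of…" rule to a few rows and computes the spam share. The rows and session counts are made-up illustration data, not figures from my reports, and the function names are hypothetical.

```python
# Source / medium values to exclude, mirroring the segment's "is one of" list.
SPAM_SOURCES = {
    "ilovevitaly.co / referral",
    "buttons-for-website.com / referral",
}

# Made-up example rows: (source / medium, sessions).
ROWS = [
    ("google / organic", 600),
    ("(direct) / (none)", 200),
    ("ilovevitaly.co / referral", 150),
    ("buttons-for-website.com / referral", 50),
]


def exclude_sources(rows, excluded):
    """Keep only the rows whose source / medium is not in the excluded set."""
    return [(source, sessions) for source, sessions in rows if source not in excluded]


def spam_share(rows, excluded):
    """Fraction of all sessions that came from the excluded sources."""
    total = sum(sessions for _, sessions in rows)
    spam = sum(sessions for source, sessions in rows if source in excluded)
    return spam / total if total else 0.0


if __name__ == "__main__":
    for source, sessions in exclude_sources(ROWS, SPAM_SOURCES):
        print(source, sessions)
    print("spam share:", spam_share(ROWS, SPAM_SOURCES))
```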

Conclusion

Now I finally have clean data again. After doing all of this research, testing, and hitting my head against the wall, I wanted to share this with everyone. I remember back in the day (2 years ago) when this was not an issue at all, or even on my radar of issues with Google Analytics. I hope this helps. If you have any questions or comments, please leave one below.

Sources I used to gain this knowledge