
Google Analytics, .htaccess, and Spam Bot Referrals
Note: this article was written in 2015 for 2015 and for educational purposes only. Some of the content may be out of date.
Remember when what you saw in Google Analytics was pretty accurate? Now a growing number of bots are crawling the web, slowing down servers, and diluting Google Analytics data. Over the past month, my website, riseofweb.com, has been targeted by several different spam referrals. Several articles on the web explain how to remove these referrals from future analytics reporting, but not from your past reporting. I wrote this article as a guide to removing spam referrals from both future and past analytics reports, and to blocking them from accessing your website at all.
Spam Referrals List
Note: do not visit any of these websites. Many are malware infested…
- ilovevitaly.co, iedit.ilovevitaly.com, shopping.ilovevitaly.com, and forum.topic55223486.darodar.com – These are all part of ilovevitaly.com, a spam crawler in Russia.
- priceg.com
- blackhatworth.com
- buttons-for-website.com
- hulfingtonpost.com – obviously made to look like huffingtonpost.com
- Your website may have these. It may also have different spam referrals…
How to remove spam bot referrals via .htaccess
This is my optimized spam bot-blocking script. As you can see, it uses regex. The script catches all subdomains (for example, iedit.ilovevitaly.com), and because of ilovevitaly.co, ilovevitaly.ru, ilove…, I have made it block all variations of domain name extensions. The regex is simple to modify: just replace the domain names (DO NOT include the domain extensions such as .com) with whoever is spamming you.
Add to your .htaccess file
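# BEGIN BLOCK SPAM REFERRALS
RewriteEngine On
# Match any listed name followed by a dot and a dot-free remainder, so any
# subdomain and any domain extension is caught. [NC] makes the match case-insensitive.
RewriteCond %{HTTP_REFERER} (?:savetubevideo|srecorder|kambasoft|ilovevitaly|iloveitaly|net\.hts|priceg|darodar|econom|buttons\-for\-website|blackhatworth|hulfingtonpost|adviceforum)\.((?!\.).)*?$ [NC]
# Answer every matching request with 403 Forbidden.
RewriteRule .* - [F]
# END BLOCK SPAM REFERRALS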
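Before deploying a rule like this, it can help to check the pattern against a few sample referrers. The following is a minimal Python sketch and not part of the original setup. It copies the regex from the RewriteCond above and assumes Python's re module handles these particular constructs the same way Apache's regex engine does. The function name is_blocked and the sample URLs are illustrative.

```python
import re

# Pattern copied from the RewriteCond above; re.IGNORECASE stands in for [NC].
SPAM_REFERER = re.compile(
    r"(?:savetubevideo|srecorder|kambasoft|ilovevitaly|iloveitaly|net\.hts|"
    r"priceg|darodar|econom|buttons\-for\-website|blackhatworth|"
    r"hulfingtonpost|adviceforum)\.((?!\.).)*?$",
    re.IGNORECASE,
)


def is_blocked(referer: str) -> bool:
    """Return True if the pattern matches this referrer string."""
    return SPAM_REFERER.search(referer) is not None


# Illustrative referrers (hypothetical sample values, not real log data).
samples = [
    "http://ilovevitaly.com/",
    "http://iedit.ilovevitaly.com/",
    "http://ILOVEVITALY.ru",
    "http://forum.topic55223486.darodar.com/",
    "https://www.example.com/",
    "http://priceg.com/page.html",
]

for referer in samples:
    print(referer, "->", "blocked" if is_blocked(referer) else "allowed")
```

One thing this kind of check surfaces: the tail of the pattern only accepts dot-free text after the domain name, so a referrer whose path contains a dot (the last sample) is not matched. Whether that matters depends on what the spam referrers hitting your site look like.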
.htaccess can’t block everything. I tried a few different variations of my blocking script and found that some websites still made it into Google Analytics. How, and why?
Google Analytics Blocking Spam Bot Referrals
Just because a visit or referral shows up in analytics doesn’t mean it actually reached your server or visited your website. Some spam just targets Google Analytics UA codes, sending hits straight to Google Analytics without ever loading your pages. In analytics, there are some basic ways to block spam referrals.
Hostname Filtering
In analytics, you can see the hostnames your hits were recorded under. You may see your main domain name, such as “riseofweb.com,” and other domain names that you have redirected, but you may also see some domain names you do not own. These mystery domain names are most likely spam referrals.
In Google Analytics > Admin > View > Filters, create a new filter. I called my filter “Hostnames”.
See the below example image.
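The idea behind a hostname filter can be sketched in a few lines of Python. This is only an illustration, under the assumption that the filter includes hits whose hostname matches a pattern listing the domains you own. The hostname list, the function names, and the generated pattern are hypothetical, and the exact regex syntax Google Analytics accepts should be checked before pasting a pattern into a filter.

```python
import re

# Hypothetical list: replace with the hostnames you actually own or redirect.
VALID_HOSTNAMES = ["riseofweb.com", "www.riseofweb.com"]


def build_hostname_pattern(hostnames):
    """Build an anchored pattern that matches only the given hostnames."""
    escaped = sorted(re.escape(name) for name in hostnames)
    return "^(" + "|".join(escaped) + ")$"


HOSTNAME_RE = re.compile(build_hostname_pattern(VALID_HOSTNAMES), re.IGNORECASE)


def is_valid_hostname(hostname):
    """Return True if a hit recorded under this hostname should be kept."""
    return HOSTNAME_RE.search(hostname) is not None


print(build_hostname_pattern(VALID_HOSTNAMES))
for host in ["riseofweb.com", "ilovevitaly.com", "riseofweb.com.example.net"]:
    print(host, "->", "keep" if is_valid_hostname(host) else "drop")
```

Anchoring the pattern with ^ and $ and escaping the dots is a deliberate choice here: without them, a hostname that merely contains your domain name would also pass.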
Hiding hits from honest bots in Google Analytics
In analytics, there is a newer setting in your “View Settings” called “Exclude all hits from known bots and spiders.” Check this box. Whether a spam bot identifies itself, and so whether Google Analytics recognizes it, is up to the bot.
Creating a segment in Google Analytics to hide spam bot referrals
Now that you know you have a lot of bad data inside your Analytics, it is time to hide past spam bot sessions/visits. In the normal “Reporting” area, on any page, click “+ Add Segment,” then “+ NEW SEGMENT.” We are going to create a segment that hides the bot traffic. See the image for the settings. In “Advanced > Conditions,” set the filter to Users, Exclude, source/medium, is one of… When you start typing in the text input, it will show a drop-down of your website’s referrals. Just pick the referrals you do not want to show in your report, then press “Save”. The spam data turned out to be about 30% of my total website traffic.
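To make the effect of the segment concrete, here is a rough Python sketch of the same exclusion applied to exported report rows. It is an approximation only: it works on rows and session counts rather than on users as the segment does, and the field names (source_medium, sessions), the exclusion set, and the sample rows are all assumptions made up for illustration.

```python
# Hypothetical source/medium values to exclude, mirroring the "is one of" list.
SPAM_SOURCE_MEDIUMS = {
    "ilovevitaly.com / referral",
    "priceg.com / referral",
    "buttons-for-website.com / referral",
}


def exclude_spam(rows, spam=SPAM_SOURCE_MEDIUMS):
    """Keep only rows whose source/medium is not in the exclusion set."""
    return [row for row in rows if row["source_medium"] not in spam]


def spam_share(rows, spam=SPAM_SOURCE_MEDIUMS):
    """Return the fraction of sessions that came from excluded sources."""
    total = sum(row["sessions"] for row in rows)
    if total == 0:
        return 0.0
    spam_sessions = sum(
        row["sessions"] for row in rows if row["source_medium"] in spam
    )
    return spam_sessions / total


# Made-up rows for illustration only; real numbers come from your own export.
rows = [
    {"source_medium": "google / organic", "sessions": 8},
    {"source_medium": "ilovevitaly.com / referral", "sessions": 2},
]
print(exclude_spam(rows))
print(spam_share(rows))
```

Running spam_share on your own export is a quick way to see how much of your reported traffic the segment is hiding.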
Conclusion
Now I finally have clean data again. After doing all of this research, testing, and hitting my head against the wall, I wanted to share this with everyone. I remember back in the day (2 years ago) when this was not even an issue or even on my radar of issues with Google Analytics. I hope this helps. If you have any questions or comments, please leave one below.
Sources I used to gain this knowledge
- .htaccess tester – test your redirects and blocked referrals, great tool.
- Ban ilovevitaly spam referrals – this helps explain what ilovevitaly is and how to stop it
- Blocking WordPress Referrals – this explains some .htaccess tricks to block bad robots.