Google Analytics is Under-Reporting, or is it?

For well over a year now, we’ve been using Google Analytics to report our own traffic and to give our clients a feature-rich, free way to view their own website statistics. Prior to Google Analytics we used Urchin 5.0, which was acquired by Google and made free just one month after we purchased a $900 license for our server. The nice people at Google gave us 50 user accounts to ease the pain.

When it comes to statistics, most viewers are interested in trend data. What are the most popular pages? Is there an increase in visitor traffic from month to month? Where are visitors coming from? Since I’m such a visual person, I find myself looking for trends in charts and graphs more than at the actual hard numbers. Recently, a client of ours with an engineering-oriented mind pointed out a serious discrepancy between the actual numbers reported by Google Analytics and the old Urchin reports their company was using. According to the data, either their site had lost 50% of its traffic or Google was under-reporting results!

My initial response was denial. If Google Analytics is built on Urchin, why would it under-report visitors? There shouldn't be a difference between Google Analytics and Urchin. After this discussion I went back and examined other client sites, and across the board I found the same issue: Google is reporting less traffic than Urchin.

From what I see in our Google Analytics account, search bot traffic is not reported. Google Analytics seems to be looking at just human visitors. For those unfamiliar with what I'm talking about, there are two primary types of traffic: human and search bot. Human traffic is the visitors who come to your website. Search bot traffic comes from the crawlers of search engines like Google, Yahoo or MSN, which visit websites periodically to index the content so that the search engine's records stay up to date.

The Evidence
I'm using two clients plus our own Imulus logs to illustrate this point. Client A is a software company that uses its website for lead generation. Client B is a hardware company that uses its website for ecommerce.

Client A
Total Website Visitor Sessions for November / December.

Client B
Total Website Visitor Sessions for November / December.

Imulus
Total Website Visitor Sessions for November / December.

Diving Deeper
Using our Imulus logs as the most extreme example, I assumed the difference had to lie in how each system defines a session, and in particular in how each distinguishes a human from a search bot. For the hard-core web statistics people: the Google Analytics figures in these reports are based on its default filters. In Urchin, I've applied the filter that excludes robot traffic by cs_useragent when it finds any of the following values: bot, seek, scan, search, di, agent, get, crawl, spider, scooter, lint, libwww, loader, mechanic, curl, link, catch, fly.
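
A minimal Python sketch of how a filter like this could behave. It assumes the values are matched as case-insensitive substrings of the user-agent string; that matching rule is an assumption for illustration, not documented Urchin behavior, and the function names are hypothetical.

```python
# Values listed in the Urchin robot filter described above.
# ASSUMPTION: each value is matched as a case-insensitive substring.
ROBOT_TOKENS = [
    "bot", "seek", "scan", "search", "di", "agent", "get", "crawl",
    "spider", "scooter", "lint", "libwww", "loader", "mechanic",
    "curl", "link", "catch", "fly",
]


def looks_like_robot(user_agent: str) -> bool:
    """Return True if any filter value appears in the user-agent string."""
    ua = user_agent.lower()
    return any(token in ua for token in ROBOT_TOKENS)


def split_traffic(user_agents):
    """Partition user-agent strings into (humans, robots) lists."""
    humans, robots = [], []
    for ua in user_agents:
        (robots if looks_like_robot(ua) else humans).append(ua)
    return humans, robots


if __name__ == "__main__":
    humans, robots = split_traffic([
        "Googlebot/2.1 (+http://www.google.com/bot.html)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    ])
    print(len(humans), "human,", len(robots), "robot")
```

Under this assumption, a short value such as "di" would also match an ordinary browser string containing "Media Center PC", so a substring filter of this kind can misclassify humans as well as miss bots.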

I decided to select a few well-known ISPs from our log files. The ISPs I selected are domains I know, or at least can safely assume, are not robot traffic. In addition, the filter in Urchin is set to exclude bots from the session information. Here are my results based on the Imulus log files for November 2006.

Traffic from Selected ISPs

Based on the results above, no pattern can be drawn; the differences in values seem completely random. I did notice that Urchin filters the logs on cs_useragent, whereas the IIS 6.0 log files don't seem to have a cs_useragent field; they have cs(User-Agent) instead. I began to wonder whether this was the primary reason for the discrepancy. If that were the case, and Urchin couldn't identify the bots in the log files, I should see a lack of data in my Urchin report for "Browsers & Robots." Apparently Urchin is able to map cs_useragent to cs(User-Agent), because my "Browsers & Robots" report has each search bot broken out by total hits. Ideally, if bots were also reported as sessions, I could better compare bot vs human traffic.
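
For anyone who wants to check the field name in their own logs, here is a rough Python sketch that reads IIS logs in the W3C extended format. It takes the column names from the "#Fields:" directive and looks up the user agent under either name. The helper names are hypothetical, and any log lines you feed it in testing should be treated as made-up samples, not data from our servers.

```python
def parse_w3c_log(lines):
    """Yield one dict per request, keyed by the names in the #Fields directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The directive lists the space-separated column names in order.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives such as #Software, #Version, #Date
        values = line.split(" ")
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))


def user_agent(record):
    """Return the user agent, accepting either field name from the article."""
    raw = record.get("cs(User-Agent)", record.get("cs_useragent", "-"))
    # IIS writes spaces inside a field as '+'. Turning every '+' back into a
    # space is lossy when the original string contained a literal '+'.
    return raw.replace("+", " ")
```

A log whose "#Fields:" line contains cs(User-Agent) but no cs_useragent would confirm the naming difference described above.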

Below is how the traffic appeared to each reporting tool. I expected the pattern to be consistently similar from day to day, just with less traffic according to Google Analytics. Again I was wrong.

Session Traffic for the month of November, 2006

The pattern, while approximate, is certainly not a mirror image. The most drastic difference is on 11/8/06. Urchin is showing a serious spike in traffic, while Google Analytics doesn't recognize this day as a spike. In the table below I've pulled out the top visiting domains for November 8th, 2006. There is a discrepancy between Google and Urchin. It is interesting that although I have applied the cs_useragent filter, I still see the bot traffic in my "Domains & Users" report in Urchin.

Top Domain Visitors for November 8th, 2006.

Domain         Google   Urchin
no domain      17       121
comcast.net    14       16
qwest.net      9        32
keynote.com    3        0
rr.com         2        3
cox.net        2        2
verizon.net    2        4

Google reports total visitor traffic for November 8th, 2006 at 89 visitors. Urchin is telling me there are over 671 visitors for this day, and I can see that this number includes bot traffic. If I manually remove the search domains from the Urchin reports, I would pull out these values.

yahoo.com / 103
inktomisearch.com / 78
live.com / 31
google.com / 24
pnap.com / 16
twtelecom.net / 6

Total 258 visits.
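
The arithmetic can be checked with a few lines of Python. The figures come from the list above. Treating Urchin's "over 671" as exactly 671 is an assumption, so the remainder is a lower bound.

```python
# Visits per search-related domain, as listed in the Urchin report above.
urchin_bot_visits = {
    "yahoo.com": 103,
    "inktomisearch.com": 78,
    "live.com": 31,
    "google.com": 24,
    "pnap.com": 16,
    "twtelecom.net": 6,
}

URCHIN_TOTAL = 671  # ASSUMPTION: "over 671" is treated as 671 (a lower bound)
GOOGLE_TOTAL = 89   # Google Analytics visitors for November 8th, 2006

bot_total = sum(urchin_bot_visits.values())
urchin_remaining = URCHIN_TOTAL - bot_total

print("Bot visits removed:", bot_total)
print("Urchin visits remaining:", urchin_remaining)
print("Google Analytics visitors:", GOOGLE_TOTAL)
```

On these figures, removing the six domains still leaves Urchin with at least 413 visits against Google's 89, so these domains alone would not close the gap.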

Conclusion

While this is by no means a scientific study of the two reporting tools, I personally believe Google's numbers to be more accurate than Urchin's.

I believe Urchin's filters are not excluding robot traffic, at least while processing IIS 6.0 log files. In addition, we've developed our own Metrics tools, which show us live visitor traffic throughout the day. Our homegrown analytics tools are giving us reports more akin to Google Analytics than to Urchin.

Most people want to believe the numbers in Urchin because those numbers are higher and reflect better performance when presenting marketing reports to company executives. There is a serious danger in running with reports that are not accurate. If you believe your visitor level to be 14,000 visitors per week and you are converting only 40 visitors to leads or sales, then you have a problem with your site's ability to convert. Yet if your website is really only receiving 1,400 visitors per week, then your conversion rate looks much better.
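
To put numbers on that example, here is a small Python sketch. The function name is hypothetical, and the visitor and conversion counts are the illustrative ones from the paragraph above, not measured data.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Return conversions as a percentage of visitors."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return 100.0 * conversions / visitors


if __name__ == "__main__":
    for visitors in (14_000, 1_400):
        rate = conversion_rate(40, visitors)
        print(f"{visitors} visitors, 40 conversions: {rate:.2f}%")
```

The same 40 conversions work out to roughly 0.29% at 14,000 visitors and roughly 2.86% at 1,400, a tenfold difference in the rate you would report.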

I'm hoping to follow up this report by looking at other analytics tools and how they compare with Google Analytics. I'd like to post a report using DeepMetrix, WebTrends and ClickTracks. If anyone is interested in receiving our log files for November / December 2006, I'd be glad to share them so you can run your own comparisons.
