Web metrics firm StatCounter caused a bit of a commotion earlier this week by claiming that Google’s Chrome briefly topped Microsoft’s Internet Explorer as the world’s most popular Web browser during the week of May 7 to May 13. Nobody doubts that Chrome has become a major presence in the Web browser scene since its introduction in 2008, but is it truly at a point where it is more widely used than Internet Explorer?
It depends how you count, and who you ask. And the truth is that nobody knows.
What browser is that?
At first glance, figuring out which Web browsers are the most popular seems easy. Every time a Web browser connects to a site, it identifies itself to the remote site with what’s called a “user agent” field. User agent is a fancy term for Web browser, but it also encompasses scripts, search engine bots, and any other software designed to fetch Web pages. User agent strings can be blank, or they can be as simple as a name. However, these days, they’re often a complicated series of names, version numbers, and technologies.
Here’s a user agent label for my primary computer, a Mac running Firefox:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:12.0) Gecko/20100101 Firefox/12.0
This reveals quite a lot about my computer. My browser identifies itself as a Mozilla-compatible browser, version 5. It also shows that I’m using a Mac running Mac OS X 10.7 Lion, Firefox is using the Gecko rendering engine, and I’m running Firefox 12. At least this week. (Remote Web servers also get my IP address, but it’s not part of the user agent field.)
A user agent tag from Internet Explorer might look like this:
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
Things are already funky — and this is a short user agent label from Explorer.
Why is Internet Explorer identifying itself as Mozilla? It’s an artifact from the dark days of Web browser history. Back before Internet Explorer existed, Netscape Navigator started identifying itself as “Mozilla” (i.e., the “Mosaic killer”) and Web sites that wanted to use Netscape-specific features (like the ever-popular <blink> tag) checked for the “Mozilla” user agent to decide what content to serve to the browser — or whether to serve any content at all. In order to play on the same field as Netscape, Microsoft chose to have its first version of Internet Explorer identify itself as Mozilla — and, essentially, every mainstream Web browser ever since has had to do the same. The upshot is that the “Mozilla” label is now effectively meaningless, although Firefox (and other browsers using the Gecko engine) have the most legitimate claim to the name.
The short IE user agent also identifies the version of Internet Explorer, the version of Windows (paradoxically Windows NT 6.1 is Windows 7 — yes, more funkiness), and the version of Internet Explorer’s rendering engine Trident. The Trident string is important for sites that, say, need to distinguish between Internet Explorer 7 and Internet Explorer 8 running in “compatibility mode.” In the real world, Internet Explorer user agent strings are usually much more complicated, with other applications and add-ons (like toolbars) often appending their own information.
Still more funkiness — here’s a user agent string from an iPad’s built-in Safari browser:
Mozilla/5.0 (iPad; CPU OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3
Look, it’s more Mozilla! Only this time, the rendering engine is AppleWebKit (“like Gecko” is something Apple put in there for many of the same reasons Microsoft adopted “Mozilla”). This identifies an iPad running iOS 5.0.
Still funkier, here’s an example of a user agent label from a development version of Google Chrome 20:
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6
Oh look, we’re still Mozilla (even though we’re not). This time we’re running Windows 7, except the browser is running in something called WOW64, an environment the 64-bit version of Windows offers for 32-bit applications. (WOW is Windows On Windows…get it?) And Google Chrome is based on the open source WebKit engine, which means it also identifies itself as AppleWebKit and (confusingly) Safari.
This is just a sample. At a very basic level, figuring out what browsers people are using involves collecting, parsing, cataloging, and analyzing these user agent fields. As you can see, they’re rather odd and often unintuitive, with most browsers consistently identifying themselves as something they’re not (Mozilla), using weird versioning (Windows 7 is Windows NT 6.1), and version numbers that may not make consistent sense (see how the iPad’s Safari version is in the thousands, but Chrome’s is in the hundreds? And why is Chrome also Safari?). Between old computers, new computers, bots, applications with built-in browsers, scripts, mobile phones, tablets, ereaders, game consoles, and much more, there are literally tens of thousands of user agent identifiers “in the wild” at any given moment, with new ones appearing all the time. Keeping track of them all — and understanding them — is no small feat, but it’s do-able. After all, keeping track of fiddly bits of information is what computers are good at, right?
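To make the parsing problem concrete, here is a minimal Python sketch that sorts the four sample strings above into browsers. It is an illustration only, not how any analytics firm actually does it: the function name and the pattern list are made up for this example, and a real parser would need thousands of rules. The one real lesson is the ordering. Chrome has to be checked before Safari, because Chrome's string also says "Safari," and "Mozilla" is ignored entirely because nearly everyone claims it.

```python
import re

# Hypothetical rule list for illustration. Order matters: Chrome's string
# also contains "Safari", so the Chrome rule must run first. "Mozilla" is
# never checked because almost every browser claims it.
BROWSER_PATTERNS = [
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Internet Explorer", re.compile(r"MSIE (\d+)")),
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    # Safari reports its marketing version in "Version/", not in "Safari/".
    ("Safari", re.compile(r"Version/(\d+)[\d.]* .*Safari/")),
]


def identify_browser(user_agent):
    """Return (browser name, major version), or ("Unknown", None)."""
    for name, pattern in BROWSER_PATTERNS:
        match = pattern.search(user_agent)
        if match:
            return name, int(match.group(1))
    return "Unknown", None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:12.0) Gecko/20100101 Firefox/12.0",
        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
        "Mozilla/5.0 (iPad; CPU OS 5_0 like Mac OS X) AppleWebKit/534.46 "
        "(KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 "
        "(KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
    ]
    for ua in samples:
        print(identify_browser(ua))
```

Swap the first and last rules and Chrome 20 gets counted as Safari, which is exactly the kind of mistake that makes this bookkeeping fiddly.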
If you know how to get browser information from user agent tags, the next thing you need is a lot of tags. Anybody who runs their own Web site — including things like blogs — has probably looked at some sort of log analysis, visitor tracking, or analytics: services that essentially look at the data collected by the Web server and try to give you an idea about your visitors: what countries they’re from, what browsers they’re using, perhaps what search terms they may have used to find you, and (of course) how many “hits” they produced.
That’s essentially how services like Alexa, Google Analytics, Net Applications’ NetMarketShare, StatCounter’s GlobalStats, WebSTAT, Webtrends, and innumerable others operate. These services are useful for getting a general idea about a site’s visitors, but the results for a particular site can’t be generalized to the entire Web. For instance, a site that’s all about the latest Android apps is probably going to see a disproportionate amount of traffic from Android devices; similarly, a site focusing on Windows gaming news is probably going to see a higher percentage of Windows users than the Internet in general. Even a general-interest site has no way of knowing whether its traffic is in any way representative of the broader Internet.
So, to get an accurate idea of how the worldwide Internet is using Web browsers, you’d essentially have to be able to look at all Web usage, worldwide, for a given period. That’s not possible, so every service that attempts to analyze worldwide browser share (or country-wide, or even site-wide, for complicated sites) relies on guesswork.
For better or worse, the two leading services that try to provide estimates of global browser usage are Net Applications’ NetMarketShare and StatCounter.
NetMarketShare and StatCounter
In NetMarketShare’s case, it collects data from a set of about 40,000 Web sites that use its HitsLink analytics service and SharePost bookmarking service. Net Applications says it tracks about 160 million visits a month, and the company tries to count each visitor to a particular site only once per day. They then weight the data based on the number of Internet users per country, as reported by the U.S. Central Intelligence Agency. NetMarketShare makes monthly statistics available for free, and offers fee-based services that offer data updated every hour.
Ireland’s StatCounter operates on a similar principle: It collects Web browser data from a network of more than 3 million websites that use its StatCounter traffic analysis service. Unlike HitsLink, StatCounter makes some basic services available for free, which helps explain the size of that network, and it claims to process more than 15 billion hits per month. That would seem to leave NetMarketShare in the dust. However, StatCounter’s methodology is a bit different from NetMarketShare’s: StatCounter derives its browser usage data based on raw page loads (or “hits”) and does not try to track individual visitors. Similarly, StatCounter does not attempt to massage or weight its figures by population or any other factors. StatCounter’s philosophy is essentially that a hit is a hit is a hit, and that’s all data collection services really know.
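The gap between those two approaches is easier to see with a toy example. The Python sketch below is a rough illustration under stated assumptions, not either company's actual method: the function names, the record fields, and every number in the sample data are invented. One function counts raw page loads, as the StatCounter description suggests. The other counts each visitor to a site once per day and then weights each country by its Internet population, roughly as the NetMarketShare description suggests.

```python
from collections import Counter, defaultdict


def share_by_hits(hits):
    """Raw page loads: every hit counts once, with no weighting."""
    counts = Counter(h["browser"] for h in hits)
    total = sum(counts.values())
    return {browser: n / total for browser, n in counts.items()}


def share_by_weighted_visitors(hits, internet_users):
    """Count each (visitor, site, day) once, then weight by country."""
    seen = set()
    per_country = defaultdict(Counter)
    for h in hits:
        key = (h["visitor"], h["site"], h["day"])
        if key in seen:
            continue  # same visitor, same site, same day: already counted
        seen.add(key)
        per_country[h["country"]][h["browser"]] += 1

    weighted = Counter()
    for country, counts in per_country.items():
        sample = sum(counts.values())
        for browser, n in counts.items():
            # Scale this country's sample share up to its Internet population.
            weighted[browser] += internet_users[country] * n / sample
    total = sum(weighted.values())
    return {browser: w / total for browser, w in weighted.items()}


# Made-up sample: one US Chrome user loads three pages; one US and one
# Chinese Internet Explorer user load one page each.
SAMPLE_HITS = (
    [{"visitor": "a", "site": "s1", "day": 1, "country": "US", "browser": "Chrome"}] * 3
    + [{"visitor": "b", "site": "s1", "day": 1, "country": "US", "browser": "IE"}]
    + [{"visitor": "c", "site": "s1", "day": 1, "country": "CN", "browser": "IE"}]
)
# Made-up Internet populations, in arbitrary units.
SAMPLE_INTERNET_USERS = {"US": 100, "CN": 300}

if __name__ == "__main__":
    print(share_by_hits(SAMPLE_HITS))
    print(share_by_weighted_visitors(SAMPLE_HITS, SAMPLE_INTERNET_USERS))
```

With this invented data, Chrome gets 60 percent of raw hits but only 12.5 percent once visitors are deduplicated and countries are weighted. Same traffic, two very different headlines.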
These different approaches produce significantly different numbers: for instance, only StatCounter showed Google Chrome pulling ahead of Internet Explorer during the second week of May. For April, NetMarketShare shows Internet Explorer still maintaining a very comfortable lead over Chrome, accounting for 54 percent of the global browser market — and their weekly stats don’t show any mammoth upswing for Chrome during May. In fact, Chrome managed 18.85 percent, which wasn’t even enough for second place: that was Firefox with 20.20 percent. That’s a huge difference.
How Chrome works
Google has placed a strong emphasis on real-time performance with Chrome. It not only wants Chrome to be the simplest and most intuitive Web browser on the market, it wants Chrome to be the fastest. Beginning with Chrome 13 back in mid-2011, Google started utilizing a new technique to improve Chrome’s apparent performance to end users: It started pre-rendering pages users hadn’t even asked to see. This takes two forms. When a user enters a search query, Chrome will download selected search results onto a hidden page so they can be displayed on screen near-instantaneously when a user clicks a search results link. (Google also implemented a custom header so pages can identify themselves as desiring to be pre-rendered. This applies mainly to Google’s own services.) Starting with Chrome 17 (released in February) Google extended this behavior to Chrome’s omnibox.
Google figures that, most of the time, users want one of the top search results. If so, they’ll get that page as soon as they click the link, no waiting. If it turns out the user wants another page, no harm done: Chrome drops the pre-rendered pages and goes out to the Internet to fetch the selected link: no faster than any other Web browser, but no slower either.
What does this mean for counting Web browser usage? It means that when Chrome users search for things, they generate hits on pages they may never navigate to, in many cases as they’re typing search queries. Searching for Canadian singer Buffy Sainte-Marie? Do it in Chrome, and odds are some top search results for Buffy the Vampire Slayer are going to see some hits.
This has a significant impact on services like StatCounter that track Web usage purely on the basis of hits. Chrome users who happen to be searching for something related to a site in StatCounter’s network will be unwittingly going out and fetching those pages. Where a browser like Internet Explorer or Firefox is primarily generating traffic to Google or another search engine until a user clicks on a search result item, Chrome can be fetching pages in the background from all over the Internet — and, odds are, the user will never even look at the majority of those pages.
NetMarketShare says it started discarding data generated from Chrome’s pre-rendering in February 2012.
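How much could that matter? The Python sketch below gives a feel for it, with heavy caveats: the numbers are invented, and the `prerender_only` field is a hypothetical flag standing in for "this page was fetched in the background and never shown." The article's sources don't say how NetMarketShare actually detects such hits, so this is only a sketch of the idea of discarding them.

```python
from collections import Counter


def browser_share(hits, drop_prerendered=False):
    """Share of hits per browser, optionally ignoring never-viewed prerenders."""
    counts = Counter(
        h["browser"]
        for h in hits
        # "prerender_only" is a hypothetical field used for illustration.
        if not (drop_prerendered and h.get("prerender_only", False))
    )
    total = sum(counts.values())
    return {browser: n / total for browser, n in counts.items()}


# Made-up sample: four viewed IE pages, three viewed Chrome pages, and
# three Chrome pages that were prerendered but never looked at.
SAMPLE_PRERENDER_HITS = (
    [{"browser": "IE"}] * 4
    + [{"browser": "Chrome"}] * 3
    + [{"browser": "Chrome", "prerender_only": True}] * 3
)

if __name__ == "__main__":
    print(browser_share(SAMPLE_PRERENDER_HITS))
    print(browser_share(SAMPLE_PRERENDER_HITS, drop_prerendered=True))
```

In this invented sample, Chrome leads with 60 percent when every hit counts, and trails with about 43 percent once the unseen pages are dropped.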
What other things can make NetMarketShare and StatCounter inaccurate?
The most obvious factor is where they collect their data. Just as the traffic to a single website isn’t representative of the Internet as a whole, neither can StatCounter nor NetMarketShare claim their site networks are representative of the entire Internet. Some 43 percent of the 40,000 sites that NetMarketShare tracks are online commerce sites; similarly, 18 percent are corporate sites, 10 percent are “content sites,” and 29 percent fall outside those categories — perhaps they’re social sites, schools, government services, or affiliate marketers. However, a whopping 76 percent of sites that NetMarketShare collects information from participate in pay-per-click traffic-generation programs. In other words, NetMarketShare is biased towards sites engaged in online business and advertising, and which are willing to pay for a third-party analytics service based in the United States.
StatCounter has far more sites under its wing, because it offers basic Web traffic analysis for free. What sorts of sites are those? The free service makes StatCounter a favorite amongst bloggers, non-profits, small businesses, and other sites that may not be engaging in outright ecommerce or online advertising. It also gives StatCounter a significant presence in emerging economies, where folks who can afford to put up a basic Web site may not be willing to pay for a complicated traffic analysis service. StatCounter is also not based in the United States — for some folks that’s a plus, for others it’s a minus, and some don’t care one way or another. In any case, where NetMarketShare trends towards businesses and organizations with significant online presences, StatCounter tends a bit towards smaller, international sites that are more likely to focus on personal or niche content. If some browsers were more popular outside the United States than others — and there’s a lot of evidence to suggest that’s the case for Firefox, Chrome, and Opera — then StatCounter’s network — and the fact they don’t geographically weight their results — would seem more likely to reflect those differences.
And there’s privacy. Both StatCounter and NetMarketShare are third parties that collect data from websites, and both offer their clients multiple mechanisms to collect that data — some are more visible (or obtrusive) than others. If Internet users opt out of accepting third-party cookies (the default in mobile Safari, for instance), it can defeat NetMarketShare’s assessment of whether a visitor is unique. (They have other techniques they can use, but the numbers get fuzzier.) Moreover, these are exactly the sorts of tracking and analytics services that get blocked by privacy add-ons to Web browsers as well as privacy policies at many sites. (For instance, many schools, libraries, and other organizations block analytics and tracking services to prevent collection of data about minors.)
It can easily be argued that many Internet users don’t even know where to find their browser’s privacy settings, let alone how to install and configure browser add-ons. However, the users who do are more likely to be technically sophisticated, knowledgeable Internet users — and, for better or worse, those are likely to be the folks that lean towards browsers like Chrome and Firefox rather than Internet Explorer or Apple’s Safari. As a result, some unknowable proportion of Chrome and Firefox users are excluding themselves from things like NetMarketShare and StatCounter — so those browsers may be somewhat under-represented in the figures.
So are they lying?
Neither NetMarketShare nor StatCounter can make a strong claim that their analysis of browser share is truly representative of Internet use as a whole. This does not mean either company is being duplicitous: They’re both performing what they believe is the best analysis they can. But the companies value different things, use different data sets, and aren’t counting the same things. It’s no surprise their numbers don’t agree.
When you see browser share reports from NetMarketShare, just remember it’s their geographically-weighted best guess of unique visitors, based on traffic to a selection of sites that mainly engage in ecommerce and online advertising — and probably with an American bias. Similarly, when you see browser share reports from StatCounter, just remember the figures are based on raw page loads, largely from sites around the world using StatCounter’s free analytics service.
Key words: “best guess.”