A shorter version of this post is available on LessWrong. The LessWrong post will remain static whereas this version will be updated with more examples as they occur to me.
Last year, during the months of June and July, as my work for MIRI was wrapping up and I hadn’t started my full-time job, I worked on the Wikipedia Views website, aimed at easier tabulation of the pageviews for multiple Wikipedia pages over several months and years. It relies on a statistics tool called stats.grok.se, created by Domas Mituzas, and maintained by Henrik.
One of the interesting things I noted as I tabulated pageviews for many different pages was that the pageview counts for many already popular pages were in decline. Pages of various kinds peaked at different historical points. For instance, colors have been in decline since early 2013. The world’s most populous countries have been in decline since as far back as 2010!
Defining the problem
The first thing to be clear about is what these pageviews count and what they don’t. The pageview measures are taken from stats.grok.se, which in turn uses the pagecounts-raw dump provided hourly by the Wikimedia Foundation’s Analytics team, which in turn is obtained by processing raw user activity logs. The pagecounts-raw measure is flawed in two ways:
- It only counts pageviews on the main Wikipedia website and not pageviews on the mobile Wikipedia website or through Wikipedia Zero (a pared-down version of the mobile site that some carriers offer at zero bandwidth costs to their customers, particularly in developing countries). To remedy these problems, a new dump called pagecounts-all-sites was introduced in September 2014. We simply don’t have data for views of mobile domains or of Wikipedia Zero at the level of individual pages before then. Moreover, stats.grok.se still uses pagecounts-raw (this was pointed out to me in a mailing list message after I circulated an early version of the post).
- The pageview count includes views by bots. The official estimate is that about 15% of pageviews are due to bots. However, the percentage is likely higher for pages with fewer overall pageviews, because bots have a minimum crawling frequency. So every page might have at least 3 bot crawls a day, resulting in a minimum of about 90 bot pageviews a month even if there are only a handful of human pageviews.
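To make the bot floor concrete, here is a minimal sketch of how the bot share of a page's monthly count shrinks as human traffic grows. The 3 crawls a day figure is the illustrative number from the bullet above, not a measured value, and the 30-day month and the function name `bot_share` are my own assumptions.

```python
# Assumptions (illustrative, not measured): every page gets at least
# 3 bot crawls per day, and a month has 30 days.
BOT_CRAWLS_PER_DAY = 3
DAYS_PER_MONTH = 30


def bot_share(human_views_per_month: int) -> float:
    """Fraction of a page's monthly pageviews that come from the bot floor."""
    bot_views = BOT_CRAWLS_PER_DAY * DAYS_PER_MONTH  # 90 bot views per month
    return bot_views / (bot_views + human_views_per_month)


# Compare a nearly unread page with progressively more popular ones.
for human_views in (10, 90, 1_000, 100_000):
    share = bot_share(human_views)
    print(f"{human_views:>7} human views/month -> bot share {share:.1%}")
```

Under these assumptions, a page with only 90 human views a month would be half bot traffic, while for a page with 100,000 human views the bot floor is negligible.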
Therefore, the trends I discuss will refer to trends in total pageviews for the main Wikipedia website, including page requests by bots, but excluding visits to mobile domains. Note that visits from mobile devices to the main site will be included, but mobile devices are by default redirected to the mobile site.
How reliable are the metrics?
As noted above, the metrics are unreliable because of the bot problem and the issue of counting only non-mobile traffic. German Wikipedia user Atlasowa left a message on my talk page pointing me to an email thread suggesting that about 40% of pageviews may be bot-related, and discussing some interesting examples.
Relationship with the overall numbers
I’ll show that for many pages of interest, the number of pageviews as measured above (non-mobile) has declined recently, with a clear decline from 2013 to 2014. What about the total?
We have overall numbers for non-mobile, mobile, and combined. The combined number has largely held steady, whereas the non-mobile number has declined and the mobile number has risen.
What we’ll find is that the decline for most pages that have been around for a while is even sharper than the overall decline. One reason overall pageviews haven’t declined as fast is the creation of new pages. To give an idea, non-mobile traffic dropped by about one-third from January 2013 to December 2014, but for many leading categories of pages, traffic dropped by about one-half to two-thirds.
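The arithmetic behind this can be sketched with made-up numbers. Everything in the block below is hypothetical (the index values and variable names are mine, chosen only so the overall drop comes out near one-third); it shows how traffic to newly created pages can mask a steeper decline in established pages.

```python
def fractional_drop(before: float, after: float) -> float:
    """Fraction of traffic lost between two periods."""
    return (before - after) / before


# Hypothetical index values, NOT real Wikipedia data.
established_before = 100.0  # traffic to pages that already existed at the start
established_after = 45.0    # the same pages at the end of the period
new_pages_after = 22.0      # traffic to pages created during the period

established_drop = fractional_drop(established_before, established_after)
overall_drop = fractional_drop(
    established_before, established_after + new_pages_after
)

print(f"established pages dropped by {established_drop:.0%}")
print(f"overall traffic dropped by {overall_drop:.0%}")
```

With these numbers, established pages lose more than half their traffic while the overall total falls by only about a third.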
Why is this important? First reason: better context for understanding trends for individual pages
People’s behavior on Wikipedia is a barometer of what they’re interested in learning about. An analysis of trends in the views of pages can provide an important window into how people’s curiosity, and the way they satisfy this curiosity, is evolving. To take an example, some people have proposed using Wikipedia pageview trends to predict flu outbreaks. I myself have tried to use relative Wikipedia pageview counts to gauge changing interests in many topics, ranging from visa categories to technology companies.
My initial interest in pageview numbers arose because I wanted to track my own influence as a Wikipedia content creator. In fact, that was my original motivation for creating Wikipedia Views. (You can see more information about my Wikipedia content contributions on my site page about Wikipedia.)
Now, when doing this sort of analysis for individual pages, one needs to account for, and control for, overall trends in the views of Wikipedia pages that are occurring for reasons other than a change in people’s intrinsic interest in the subject. Otherwise, we might falsely conclude from a pageview count decline that a topic is falling in popularity, whereas what’s really happening is an overall decline in the use of (the non-mobile version of) Wikipedia to satisfy one’s curiosity about the topic.
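One simple way to control for the overall trend is to look at a page's share of total traffic rather than its raw count. The sketch below is only one possible normalization, with hypothetical numbers; `relative_interest` is a name I made up, and in practice the site-wide series would have to come from the same (non-mobile) measure as the page counts.

```python
from typing import List, Sequence


def relative_interest(
    page_views: Sequence[float], site_views: Sequence[float]
) -> List[float]:
    """Page views as a share of site-wide views, indexed so period 0 equals 1.0."""
    if len(page_views) != len(site_views):
        raise ValueError("both series must cover the same periods")
    shares = [page / site for page, site in zip(page_views, site_views)]
    return [share / shares[0] for share in shares]


# Hypothetical data: the page's raw views fall 40%, but so does the whole site.
page_views = [1000, 800, 600]
site_views = [10_000_000, 8_000_000, 6_000_000]
print(relative_interest(page_views, site_views))

# Hypothetical data: the site is flat, so the page's decline is real.
print(relative_interest([1000, 500], [10_000_000, 10_000_000]))
```

In the first case the index stays flat, suggesting no change in intrinsic interest; in the second it halves, which points to a topic-specific decline.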
Why is this important? Second reason: a better understanding of the overall size and growth of the Internet
Wikipedia has been relatively mature and has had the top spot as an information source for at least the last six years. Moreover, unlike almost all other top websites, Wikipedia doesn’t try hard to market or optimize itself, so trends in it reflect a relatively untarnished view of how the Internet and the World Wide Web as a whole are growing, independent of deliberate efforts to manipulate and doctor metrics.
The case of colors
Let’s look at Wikipedia pages on some of the most viewed colors (I’ve removed the 2015 and 2007 columns because we don’t have the entirety of these years). Colors are interesting because the degree of human interest in colors in general, and in any particular color, is unlikely to change much in response to news or current events. So one would at least a priori expect colors to offer a perspective into Wikipedia trends with fewer external complicating factors. If we see a clear decline here, then that’s strong evidence in favor of a genuine decline.
I’ve restricted attention to a small subset of the colors that includes the most common ones but isn’t comprehensive. It should be enough to get a sense of the trends, and you can add in your own colors and check that the trends hold up.
Since the decline appears to have happened between 2013 and 2014, let’s examine the 24 months from January 2013 to December 2014:
As we can see, the decline appears to have begun around March 2013 and then continued steadily until about June 2014, at which point the numbers stabilized at their lower levels.
A few sanity checks on these numbers:
- The trends appear to be similar for different colors, with the notable difference that the proportional drop was higher for the more viewed color pages. Thus, for instance, black and blue saw declines from 129K and 126K to 30K and 41K respectively (factors of roughly four and three) from January 2013 to December 2014. Orange and yellow, on the other hand, dropped by factors of close to two. The only color that didn’t drop significantly was red (it dropped from 84K to 67K, as opposed to factors of two or more for other colors), but this seems to have been partly due to an unusually large amount of traffic at the end of 2014. Even for red, the underlying trend seems to suggest a drop similar to that for orange.
- The overall proportion of views for different colors comports with our overall knowledge of people’s color preferences: blue is overall a favorite color, and this is reflected in its getting the top spot with respect to pageviews.
- The pageview decline followed a relatively steady trend, with the exception of some unusual seasonal fluctuation (including an increase in October and November 2013).
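The decline factors quoted in the first sanity check can be recomputed directly from the monthly figures given there (in thousands of views, January 2013 versus December 2014). The helper name `decline_factor` is my own.

```python
# Monthly pageviews in thousands (January 2013, December 2014),
# as quoted in the text above.
views_k = {
    "black": (129, 30),
    "blue": (126, 41),
    "red": (84, 67),
}


def decline_factor(before: float, after: float) -> float:
    """How many times smaller the later count is than the earlier one."""
    return before / after


factors = {
    color: decline_factor(before, after)
    for color, (before, after) in views_k.items()
}

for color, factor in factors.items():
    print(f"{color}: dropped by a factor of {factor:.1f}")
```

Black and blue come out at a bit over four and three respectively, while red stays well under a factor of two.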
One hypothesis that some people might come up with is inter-language substitution: people are substituting away from reading articles in the English-language Wikipedia to other language Wikipedias. But the downward trend is present in many other major language Wikipedias, none of which have the second language status of English.
Here are the numbers for four colors in Spanish (negro = black, azul = blue, rojo = red, blanco = white) for the years 2009-2014 (we exclude 2008 because tracking for the Spanish Wikipedia began only in February 2008):
Similarly, here are the pageview counts for the same four colors in French (noir = black, bleu = blue, rouge = red, blanc = white):
The years 2009-2014:
All months in 2013 and 2014:
Cognitive biases (a small subset thereof)
For a more niche but still timeless set of pages, I picked cognitive biases:
The situation for cognitive biases is qualitatively similar to that for colors: the peak occurred in 2013, and the decline appears to have taken place between the early months of 2013 and the middle of 2014. Unlike colors, cognitive biases in 2014 still did a lot better than they had done three years earlier. Note that, with one exception, all the pages selected here have been around since 2008, and the sole exception doesn’t account for enough pageviews to affect the overall trend.
Geography: continents and subcontinents, countries, and cities
Here are the views of some of the world’s most populated countries between 2008 and 2014, showing that the peak happened as far back as 2010:
Of these countries, China, India and the United States are the most notable. China is the world’s most populous country. India has the largest population with some minimal English knowledge and legally (largely) unrestricted Internet access to Wikipedia, while the United States has the largest population with quality Internet connectivity and good English knowledge. Moreover, in China and India, Internet use and access have been growing considerably in the last few years, whereas they have been relatively stable in the United States.
It is interesting that the year with the maximum total pageview count was as far back as 2010. In fact, 2010 was so significantly better than the other years that the numbers beg for an explanation. I don’t have one, but even excluding 2010, we still see a decline in recent years: gradual growth from 2008 to 2011, followed by a symmetrically gradual decline. Both the growth trend and the decline trend are quite similar across countries.
We see a similar trend for continents and subcontinents, with a clear peak in 2010 and an otherwise roughly symmetric rise and fall:
In contrast with these large geographic entities, their smaller counterparts, such as cities, peaked in 2013, similarly to colors, and the drop, though somewhat less steep than with colors, has been quite significant. Below is a list for Indian cities:
Some niche topics where pageviews haven’t declined
So far, we’ve looked at topics where pageviews have been declining since at least 2013, and some that peaked as far back as 2010. There are, however, many relatively niche topics where the number of pageviews has stayed roughly constant. But this stability itself is a sign of decay, because other metrics suggest that the topics have experienced tremendous growth in interest. In fact, the stability is even less impressive when we notice that it’s a result of a cancellation between slight declines in views of established pages in the genre, and traffic going to new pages.
The data for philanthropic foundations show fairly slow and steady growth (about 5% a year), partly due to the creation of new pages. This 5% hides a lot of variation between individual pages: