A critique of the Economist's "#AfricaTweets" story
The latest edition of the Economist contains an article titled “#AfricaTweets.” The piece features a striking map that visualises the “number of tweets” per country for the “top 20 African countries.”
The only problem is that the article doesn’t do what it promises.
My problem with the Economist’s article isn’t their whimsical (and quite funny) commentary on the use of Twitter in Africa (e.g. they quote @MorganTsvangirai: “******* **** ******* ****** ******** ****** ** ******* #ZimPolitics” and @Bono: “Africans tweeting each other, not me, about news, not me #sadface”).
The issue is that the Economist makes no attempt whatsoever at qualifying the limitations of these data.
For instance, the article begins with the statement that “Twenty countries sent over 11m tweets in the last quarter of 2011.” I believe this to be a vast underestimation of the amount of information pushed through the platform in Africa.
Looking at the source document for the Economist’s data (something they neglect to link to), we see that the data naturally contain only geolocated tweets (something they neglect to mention). This is important because only a very small proportion of tweets contain any geodata. In June 2011, my team and I collected 19.6 million tweets using the statuses/sample stream with spritzer access (a 19-day sample of both geocoded and non-geocoded tweets worldwide), and we found that only 0.7% of them contained geographic coordinates.
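A measurement like that 0.7% figure can be sketched in a few lines. The snippet below is a minimal illustration, not the code we actually used: it assumes tweets arrive as one JSON object per line (as the streaming API's statuses/sample endpoint delivers them) and counts how many carry a GeoJSON point in the "coordinates" field.

```python
import json

def geocoded_share(lines):
    """Return the fraction of tweets carrying exact coordinates.

    `lines` is an iterable of raw JSON strings, one tweet per line,
    as delivered by the streaming statuses/sample endpoint.
    """
    total = geocoded = 0
    for line in lines:
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip keep-alive newlines and truncated records
        if "text" not in tweet:
            continue  # skip delete notices and other non-tweet events
        total += 1
        # "coordinates" holds a GeoJSON point only when the user
        # opted in to precise geotagging; otherwise it is null/absent.
        if tweet.get("coordinates"):
            geocoded += 1
    return geocoded / total if total else 0.0

# Toy sample: one geotagged tweet out of three.
sample = [
    '{"text": "hello", "coordinates": {"type": "Point", "coordinates": [3.4, 6.5]}}',
    '{"text": "no location", "coordinates": null}',
    '{"text": "also none"}',
]
print(round(geocoded_share(sample), 3))  # 0.333
```

Run against a real spritzer sample rather than the toy list, this is exactly the kind of denominator the Economist's map never reports.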
[Original map created by Portland Communications]
This matters because it is conceivable that people in some countries are more likely to geolocate their tweets than others due to either social norms or access to the requisite devices (such as smartphones). In other words, by looking at geocoded tweets we’re only seeing a tiny fraction of the content that passes through the platform.
This isn’t to say that there aren’t other ways to geolocate information on Twitter. In a recent paper, Scott Hale, Devin Gaffney and I analysed whether locations in user profiles (descriptions such as “Oxford, UK” or “Barad-dûr, Mordor, Middle Earth” that can be grabbed from the vast majority of tweets) can be used as a proxy for (much rarer) geocoded content. It turns out that profile information isn’t a great substitute for actual latitude and longitude coordinates.
Time zone settings are another approach to figuring out where information comes from, but our research shows that many users (especially within Africa) don’t seem to set their time zone.
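Both of these proxy signals, free-text profile locations and time zone settings, can be tallied straightforwardly. The sketch below is illustrative only: the tiny `GAZETTEER` dictionary is a stand-in for the much harder place-name-resolution problem our paper deals with, and the `location`/`time_zone` field names follow Twitter's user object.

```python
from collections import Counter

# A toy gazetteer mapping free-text profile locations to countries.
# Real profile strings ("Barad-dûr, Mordor, Middle Earth") show why
# robust place-name resolution is the hard part.
GAZETTEER = {"lagos": "Nigeria", "nairobi": "Kenya", "oxford, uk": "United Kingdom"}

def proxy_signals(users):
    """Tally which location proxies each user profile exposes."""
    stats = Counter()
    for user in users:
        location = (user.get("location") or "").strip().lower()
        if location in GAZETTEER:
            stats["resolvable_profile_location"] += 1
        elif location:
            stats["unresolvable_profile_location"] += 1
        else:
            stats["no_profile_location"] += 1
        if user.get("time_zone"):
            stats["time_zone_set"] += 1
    return stats

users = [
    {"location": "Lagos", "time_zone": "West Central Africa"},
    {"location": "Barad-dûr, Mordor, Middle Earth", "time_zone": None},
    {"location": "", "time_zone": None},
]
print(proxy_signals(users))
```

On real data, the share of users falling into the "unresolvable" and "no time zone" buckets is precisely what limits these proxies.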
Furthermore, there is no attempt to account for prolific users in these samples. Looking at the Economist’s map (and even the source document) doesn’t tell us whether Gabon is in the top-20 list because a lot of people use the service in that country, or because a small number of Twitter addicts have their smartphone GPS switched on. Knowing the answer to this question fundamentally changes how we should interpret the map.
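The distinction is easy to surface if per-user counts are kept. The following sketch (with invented country and user names) reports, for each country, the raw tweet count the map relies on alongside the number of distinct users and the share contributed by the single most prolific account, which is enough to tell "many users" apart from "a few addicts".

```python
from collections import Counter, defaultdict

def country_concentration(tweets):
    """For each country, report the tweet count, the number of distinct
    users, and the share of tweets from the most prolific user."""
    per_country = defaultdict(Counter)
    for country, user in tweets:
        per_country[country][user] += 1
    report = {}
    for country, users in per_country.items():
        total = sum(users.values())
        report[country] = {
            "tweets": total,
            "users": len(users),
            "top_user_share": round(max(users.values()) / total, 2),
        }
    return report

# Two countries with identical tweet counts but very different user bases.
tweets = (
    [("Gabon", "addict1")] * 90 + [("Gabon", "addict2")] * 10
    + [("Kenya", f"user{i}") for i in range(100)]
)
print(country_concentration(tweets))
```

Both toy countries would look identical on a tweets-per-country map, yet one count comes from two accounts and the other from a hundred.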
None of this means that the original maps produced by Portland Communications are fundamentally flawed. Geocoded tweets are both insightful and useful. It just shows that there are crucially important details to be aware of whenever analysing Twitter data (I won’t even get started on the different types of sampling methods in this post).
Hundreds of millions of short messages are passed through Twitter every day, and this content has been used by researchers from fields as diverse as epidemiology, politics, marketing and geography to better understand, map and measure large-scale social, economic, and political trends and patterns. However, much of this analysis is carried out with only limited understandings of how best to work with the spatial and linguistic contexts in which that information was produced.
Maps are powerful tools: they influence how we understand, enact, produce, and re-produce our world. This means that cartographers bear a significant amount of public responsibility.
And any geographer will tell you that no map is a true representation of anything. With the advent of easy-to-access Internet-based data, we therefore need, more than ever, to ask critical questions about how online data are collected, analysed, and presented.