Internet Geographer

Controversy in Wikipedia in Africa

One more post on controversy before I close down the map-making machine. Following on from the maps of controversy that my colleagues (Taha Yasseri, Anselm Spoerri, and János Kertész) and I made about Wikipedia in the British Isles and Australia, we have produced a map of controversy in Africa (for those interested in the method used to derive the data, check out the original post on the topic: Mapping Controversy in Wikipedia).



Here we see some notable patterns of controversy. Egypt (which hosts three of the five most controversial articles on the continent) and the rest of North Africa have a lot of contentious articles. So too does the Horn of Africa, which has more of them than the more heavily populated parts of West Africa.

Some of the most controversial articles (e.g. the Great Pyramid of Giza) seem to rise to the top of the list simply because a lot is written about them, and the Wikipedia article then becomes a site for discussion and conflict. In other cases, though, we see more violent, material, and political conflicts spilling over onto the talk pages of articles (e.g. the Somaliland article).

It is interesting to note that the average controversy score on the continent (96) is quite similar to the average in the British Isles (110). In both places, most articles simply aren’t controversial at all, and we see a pronounced long-tail effect, with only a few articles bearing the brunt of argument and conflict.

We do, however, see far more total conflict in the British Isles. Given that there are also far more articles about the British Isles than about all of Africa combined, this higher total is simply a reflection of the greater amount of human labour and attention focused on the region.
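To see how a similar average can sit alongside a very different total, here is a minimal Python sketch. The `summarize` helper and the scores are hypothetical illustrations, not the study’s data or code: two toy regions share the same long-tailed distribution of per-article scores, but one has ten times as many articles.

```python
from statistics import mean, median


def summarize(scores):
    """Summary statistics for a list of per-article controversy scores."""
    ranked = sorted(scores, reverse=True)  # highest scores first
    total = sum(ranked)
    top_n = max(1, len(ranked) // 10)  # the top 10% of articles (at least one)
    return {
        "articles": len(ranked),
        "mean": mean(ranked),
        "median": median(ranked),
        "total": total,
        # Share of all controversy held by the top 10% of articles.
        "top10_share": sum(ranked[:top_n]) / total if total else 0.0,
    }


# Hypothetical toy scores (not data from the study): most articles score
# zero or close to it, and a handful score very high.
region_a = [0] * 80 + [1] * 10 + [5] * 5 + [100, 200, 300, 400, 1000]
region_b = region_a * 10  # same distribution, ten times as many articles

a, b = summarize(region_a), summarize(region_b)
print(a["mean"], b["mean"])    # identical averages
print(a["total"], b["total"])  # totals differ by a factor of ten
```

In this toy case both regions have the same mean and a median of zero, and in each the top 10% of articles hold more than 99% of the total score; only the total scales with the number of articles.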
Controversy in Wikipedia in Australia

We’ve had a lot of interest in our maps of controversy in Wikipedia. However, one problem is that none of those maps allow us to drill down into the geography of conflict in any particular country.

I’d therefore like to share a few more country-specific maps of controversy, beginning with Australia:


The top-10 most controversial articles in Australia are:

1) Australia (not mapped as a point)
2) Sydney
3) Newington College
4) Dapto High School
5) Melbourne
6) Sydney Airport
7) Anglican Church Grammar School
8) Brisbane
9) Melbourne Airport
10) 2005 Cronulla Riots

Some of these entries are likely high-traffic articles, and it makes sense that they might be characterised by a lot of controversy. But why are so many schools listed in the top 10? My suspicion is that we’re seeing the effects of vandalism by a few students who might want to have a voice in the digital representations of the institutions in which they spend most of their day confined.

In other words, this map of Australia shows us that controversy in user-generated content can arise not just from the usual, expected topics of politics and religion, but also from more mundane issues like school vandalism.

Read more about this work here:

Yasseri, Taha, Spoerri, Anselm, Graham, Mark and Kertesz, Janos, (2014) The Most Controversial Topics in Wikipedia: A Multilingual and Geographical Analysis. In: Fichman P., Hara N., editors, Global Wikipedia: International and cross-cultural issues in online collaboration. Scarecrow Press. Available at SSRN.
Mapping Controversy in Wikipedia

Wikipedia, the collection of 37 million articles that anyone can edit, is defined by conflict. The ability for anyone to shape this global repository of knowledge inevitably means that we are presented with fascinating, shocking, and often hilarious discussions on the talk pages of articles. Just check out the talk pages of articles about Barack Obama, the Persian Gulf, and Freddie Mercury (or, if you really want to waste an afternoon, dive into Wikipedia’s collection of ‘lamest edit wars’).

So, a natural question for my colleagues (Taha Yasseri, Anselm Spoerri, and János Kertész) and me was whether we could model and map the controversiality of Wikipedia articles. Does controversy have distinct geographies? It turns out that it does.


To quantify the controversiality of an article based on its editorial history, we focused on “reverts”, i.e. when an editor completely undoes another editor’s edit. We counted all of the reverts in the history of every article and gave a higher weight to editors who revert each other repeatedly. To validate the approach, we compared the classifier against human judgement. If you want to read more about the method, check out our articles here or here.
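The paragraph above describes the measure only in outline; the exact formula is given in the cited papers. The Python sketch below is a simplified stand-in, not the published measure. It assumes that a revert is any revision whose text exactly matches an earlier, non-adjacent revision (detected by comparing MD5 hashes), that the reverted editor is the author of the immediately preceding revision, and that reverts between editors who have reverted each other in both directions are multiplied by an arbitrary, hypothetical `mutual_weight`.

```python
import hashlib
from collections import Counter


def detect_reverts(revisions):
    """Return (reverter, reverted) pairs for identity reverts.

    `revisions` is a chronological list of (editor, text) tuples. A revision
    counts as a revert if its text matches an earlier revision other than the
    one immediately before it (matching that one would be a null edit).
    """
    last_seen = {}  # text hash -> index of the latest revision with that text
    reverts = []
    for i, (editor, text) in enumerate(revisions):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest in last_seen and last_seen[digest] < i - 1:
            # Simplification: blame the author of the preceding revision.
            reverted_editor = revisions[i - 1][0]
            if reverted_editor != editor:  # ignore self-reverts
                reverts.append((editor, reverted_editor))
        last_seen[digest] = i
    return reverts


def controversy_score(revisions, mutual_weight=2.0):
    """Hypothetical score: each revert counts once, and reverts within a
    pair of editors who revert each other count `mutual_weight` times."""
    pair_counts = Counter(detect_reverts(revisions))
    score = 0.0
    for (reverter, reverted), n in pair_counts.items():
        is_mutual = (reverted, reverter) in pair_counts
        score += n * (mutual_weight if is_mutual else 1.0)
    return score


# Usage with a toy edit history (hypothetical editors and texts).
history = [
    ("alice", "v1"),
    ("bob", "v2"),
    ("alice", "v1"),  # alice restores v1, reverting bob
    ("bob", "v2"),    # bob restores v2, reverting alice
]
print(detect_reverts(history))     # [('alice', 'bob'), ('bob', 'alice')]
print(controversy_score(history))  # 4.0
```

Hashing whole revision texts keeps the revert check to a dictionary lookup per revision; the weighting step is where a real measure would differ most, so treat `mutual_weight` purely as a placeholder.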

This allowed us to get a sense of which articles are the most controversial in each Wikipedia language edition. In English, the most controversial article is George W. Bush, followed by Anarchism, followed by Muhammad. In French, by contrast, the top three most controversial articles are Ségolène Royal, UFOs, and Jehovah’s Witnesses (we’re certain there are some good jokes hiding in the order of these lists). For the full list of top-10 controversial articles in ten languages, check out our in-press chapter on the topic (or look at the complete lists here and an interactive visualisation of Wikipedia conflicts at this link). The short version is that at the top of the lists in multiple languages we see articles related to religion, politics, and football; i.e. pretty much exactly what you would expect people to be arguing about.

But what about the geography of these controversial articles in different languages? Where do we see the most controversial articles in different languages? Below is the full list of maps that we created:















What do these maps tell us? First, there is a considerable amount of difference between the various language editions of Wikipedia. In some of the smaller Wikipedias, the articles characterized by the greatest degree of conflict show a high degree of self-focus (check out some of Brent Hecht’s work for more on this). For instance, the articles with the most conflict in the Czech and Hebrew Wikipedias are about the Czech Republic and Israel, respectively.

Even in large languages that are primarily spoken in more than one country, a significant amount of self-focus occurs (look at the Arabic and Spanish maps of conflict for examples of this).
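One simple way to put a number on this kind of self-focus is the share of a language edition’s most controversial geotagged articles that fall inside the countries where that language is mainly spoken. The Python sketch below is a hypothetical illustration of that idea, not a measure used in the chapter; the `self_focus_share` name, the placeholder articles, and the set of home countries are all assumptions.

```python
def self_focus_share(top_articles, home_countries):
    """Fraction of a language edition's most controversial geotagged articles
    located in the country (or countries) where that language is mainly spoken.

    top_articles: list of (article_title, country) pairs.
    home_countries: set of countries associated with the language.
    """
    if not top_articles:
        return 0.0
    local = sum(1 for _, country in top_articles if country in home_countries)
    return local / len(top_articles)


# Hypothetical example: placeholder titles and countries, not results from the maps.
top_five = [
    ("Article 1", "Spain"),
    ("Article 2", "Mexico"),
    ("Article 3", "Argentina"),
    ("Article 4", "United States"),
    ("Article 5", "Colombia"),
]
spanish_home = {"Spain", "Mexico", "Argentina", "Colombia", "Peru", "Chile"}
print(self_focus_share(top_five, spanish_home))  # 0.8
```

Passing a set of home countries rather than a single one is what lets the same check cover languages, like Spanish or Arabic, that are spoken across many countries.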

The interesting exception to this rule is the Middle East. All languages in our sample apart from Hungarian, Romanian, Japanese, and Chinese include articles located in Israel among those characterised by a large amount of conflict.

It is also worth pointing out that we see significant differences in the geographic topics that generate the most conflict. The Japanese articles that generate the most conflict are not only all located in Japan but are also all about educational institutions. The Portuguese articles that generate the most conflict are similarly all located in Brazil (the world’s largest Portuguese-speaking nation), with four of the top five conflict scores belonging to articles about football teams.

Within our sample, only the English, German, and French Wikipedias show a significant amount of diversity in the topics and patterns of conflict in geographic articles. This probably indicates that individual editors and arguments play a less significant role in these larger encyclopaedias.

Ultimately, by visualizing the geography of conflict in Wikipedia, we’re able to see both topics that appear to have cross-linguistic resonance (e.g. the Arab-Israeli conflict) and those of narrower interest, such as the Islas Malvinas/Falkland Islands article in the Spanish Wikipedia.

These maps therefore offer a window into not just the topics that different language communities are interested in, but also the topics that seem worth fighting about.



To read more about conflict and Wikipedia:


Yasseri, T., Spoerri, A., Graham, M., and Kertész, J. (2014) The Most Controversial Topics in Wikipedia: A Multilingual and Geographical Analysis. In: Fichman, P. and Hara, N. (eds.) Global Wikipedia: International and Cross-Cultural Issues in Online Collaboration. Scarecrow Press. Available at SSRN.

Graham, M., Zook, M., and Boulton, A. (2012) Augmented Reality in the Urban Environment: Contested Content and the Duplicity of Code. Transactions of the Institute of British Geographers. DOI: 10.1111/j.1475-5661.2012.00539.x

Graham, M. (2013) The Virtual Dimension. In: Acuto, M. and Steele, W. (eds.) Global City Challenges: Debating a Concept, Improving the Practice. Available at SSRN: http://ssrn.com/abstract=2212824

Yasseri, T., Sumi, R., Rung, A., Kornai, A., and Kertész, J. (2012) Dynamics of Conflicts in Wikipedia. PLoS ONE 7(6): e38869.

Török, J., Iñiguez, G., Yasseri, T., San Miguel, M., Kaski, K., and Kertész, J. (2013) Opinions, Conflicts, and Consensus: Modeling Social Dynamics in a Collaborative Environment. Physical Review Letters 110(8).
What percentage of edits to English-language Wikipedia articles are from local people?

As part of our on-going efforts to explore the geographies of participation in Wikipedia, we have calculated the percentage of local edits to articles about places. In other words, this map illustrates the percentage of edits about any country that come from people with strong associations to that country.

For more on the method that we employed, have a read through the post on “who edits Wikipedia”, in which I explain our data collection efforts in detail. The data are undoubtedly somewhat imprecise, but we are confident that they offer the best overview of the geography of authorship that can be obtained with publicly available data.
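The local-edit percentage itself is a simple ratio: for each country, the number of edits to articles about it that come from editors associated with that country, divided by all edits to those articles whose origin is known. The Python sketch below assumes that edits have already been geolocated (the hard part, covered in the post mentioned above). The `local_edit_share` helper, the two-letter country codes, and the choice to skip edits of unknown origin are hypothetical, and the sample edits are toy values rather than the study’s data.

```python
from collections import defaultdict


def local_edit_share(edits):
    """Percentage of edits to each country's articles made by editors
    associated with that same country.

    `edits` is an iterable of (article_country, editor_country) pairs;
    editor_country is None when the editor's location is unknown.
    """
    totals = defaultdict(int)
    local = defaultdict(int)
    for article_country, editor_country in edits:
        if editor_country is None:
            continue  # assumption: edits of unknown origin are left out
        totals[article_country] += 1
        if editor_country == article_country:
            local[article_country] += 1
    return {c: 100.0 * local[c] / totals[c] for c in totals}


# Toy sample (hypothetical, not the study's data).
edits = [
    ("KE", "KE"), ("KE", "US"), ("KE", "GB"), ("KE", None),
    ("US", "US"), ("US", "US"), ("US", "CA"), ("US", "US"),
]
shares = local_edit_share(edits)
print(round(shares["KE"], 1), shares["US"])  # 33.3 75.0
```

Whether edits of unknown origin are dropped or counted as non-local changes the denominator, so that choice can move the percentages noticeably for countries with many unattributed edits.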

What do these results tell us?

Unsurprisingly, they show that in predominantly English-speaking countries most edits tend to be local. That is, most edits to Wikipedia articles about the US come from the US (85%), and most edits to articles about the UK likewise come from the UK (78%). The Philippines (68%) and India (65%) also score well in this regard, likely because of the role that English plays as an official language in both countries. But why, then, do we see relatively low numbers in other countries that also have English as an official language, such as Nigeria (16%) or Kenya (9%)?

We also, interestingly, see relatively high local edit percentages from a handful of countries that don’t count English as an official language: Finland (50%), Norway (56%), Romania (54%), and Bulgaria (53%).

Then we also observe large parts of the world in which very few English-language descriptions of local places are created by local people. Almost all of Sub-Saharan Africa falls into this category.

The key question is whether these data actually tell us anything meaningful. For instance, just because most edits about the United States likely come from the United States does not necessarily mean that those articles are representative, include a diversity of viewpoints, or avoid excluding people, places, and processes.

Nonetheless, in a very broad way, the data do tell a story about voice and representation. Some parts of the world are represented on one of the world’s most-used websites predominantly by local people, while others are represented almost exclusively by foreigners. That is something to bear in mind the next time you read a Wikipedia article.