Internet Geographer


Mapping Arabic Wikipedia

As part of an IDRC-funded multi-year project to understand local knowledge production on Wikipedia in the Middle East and North Africa, we plan to release our initial results on this blog.

In the project, we ask three key questions about Wikipedia in the region:

1) What is the geography of articles in the Middle East and North Africa, and how does this compare to the rest of the world? (We are also asking similar questions in the context of East Africa, so we occasionally mix data from the two regions, as we do in the maps below.)

2) Do local authors in the region account for a disproportionately small share of the contributions to articles about the region?

3) Are the contributions of local contributors undervalued?

The two maps below are the first in our series. They depict the total number of Arabic Wikipedia articles throughout the region, and the density of those articles (strictly speaking, the number of articles per 1,000 km² rather than per single square kilometre).

The data were derived from the Wikimedia Foundation’s regular XML dumps of the Arabic, Egyptian Arabic, English, French and Hebrew Wikipedias in March 2011. The article source was analysed for coordinate templates or recognisable coordinate parameters in other templates, such as “Infobox settlement.” In cases where this method didn’t reveal any coordinates, we then used interwiki links to obtain coordinates from other language versions of the same article. This gave us a much more useful set of points, particularly for the smaller wikis.
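The extraction-with-fallback step described above can be sketched roughly as follows. This is not the code used for the project; it is a minimal illustration that assumes coordinates appear in a `{{coord|...}}` template and ignores infobox parameters entirely. The function names and the `articles` and `interwiki` dictionaries are hypothetical.

```python
import re

# Matches {{coord|...}} in any letter case and captures the parameter list.
# Assumption: the template body contains no nested templates.
COORD_RE = re.compile(r"\{\{\s*coord\s*\|([^{}]*)\}\}", re.IGNORECASE)


def find_coord_params(wikitext):
    """Return the parameters of the first coordinate template, or None."""
    match = COORD_RE.search(wikitext)
    if match is None:
        return None
    return [p.strip() for p in match.group(1).split("|")]


def coords_with_fallback(key, articles, interwiki):
    """Look in the article itself, then in linked language versions.

    articles:  {(lang, title): wikitext}
    interwiki: {(lang, title): [(other_lang, other_title), ...]}
    Returns (params, source_key), or (None, None) if nothing was found.
    """
    own = find_coord_params(articles.get(key, ""))
    if own is not None:
        return own, key
    for other in interwiki.get(key, []):
        params = find_coord_params(articles.get(other, ""))
        if params is not None:
            return params, other
    return None, None


# Illustrative data: the Arabic article has no coordinates, the English one does.
articles = {
    ("ar", "al-Qahira"): "Article text without any coordinates.",
    ("en", "Cairo"): "{{Coord|30|2|N|31|14|E|type:city}} Cairo is ...",
}
interwiki = {("ar", "al-Qahira"): [("en", "Cairo")]}

params, source = coords_with_fallback(("ar", "al-Qahira"), articles, interwiki)
```

In this example the coordinates come from the English article, which is why the fallback helps most for the smaller wikis.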

Once this was done, all coordinate values were converted to a common format. Our dataset still contained some coordinates that didn’t make sense for us to keep, notably coordinates of features on the Moon and on other planets, so we then removed all non-Earth articles from the dataset.
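To make the normalisation and filtering step concrete, here is a small sketch under stated assumptions: the common format is taken to be signed decimal degrees, input is a list of coordinate-template parameters in either decimal or degrees/minutes/seconds form, and non-Earth features are assumed to be marked with a `globe:` option. The function names are hypothetical and this is not the project's actual code.

```python
def to_decimal(params):
    """Convert coordinate parameters to (lat, lon) in signed decimal degrees."""
    values = []  # finished coordinates, latitude first
    parts = []   # degrees, minutes, seconds collected for the current coordinate
    for p in params:
        token = p.strip().upper()
        if token in ("N", "S", "E", "W"):
            if not parts:
                raise ValueError("hemisphere letter without a number")
            # degrees + minutes / 60 + seconds / 3600
            degrees = sum(x / 60 ** i for i, x in enumerate(parts))
            values.append(-degrees if token in ("S", "W") else degrees)
            parts = []
        else:
            try:
                parts.append(float(token))
            except ValueError:
                break  # reached named options such as "type:city"
    if not values and len(parts) == 2:
        values = parts  # plain signed decimals: lat, lon
    if len(values) != 2:
        raise ValueError("could not parse coordinates: %r" % (params,))
    return values[0], values[1]


def is_on_earth(params):
    """Return False if a globe option names a body other than Earth."""
    for p in params:
        # Assumption: options are joined with underscores, e.g. "type:city_globe:moon".
        for option in p.lower().split("_"):
            if option.startswith("globe:"):
                return option[len("globe:"):] == "earth"
    return True  # no globe option: assume the feature is on Earth


cairo = ["30", "2", "N", "31", "14", "E", "type:city"]
crater = ["0.67", "N", "23.47", "E", "globe:moon"]

kept = [to_decimal(p) for p in (cairo, crater) if is_on_earth(p)]
```

Here only the Cairo record survives the filter, with its degrees and minutes converted to decimal degrees.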

The maps are then the result of counting the number of articles in each top-level subdivision within our areas of interest.
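As an illustration of the counting step, the sketch below assigns each point to the first region whose boundary contains it, using a simple ray-casting test. It treats longitude and latitude as planar coordinates and uses made-up square regions; real subdivision boundaries would normally be handled with GIS software. All names here are hypothetical.

```python
from collections import Counter


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray running east from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def count_by_region(points, regions):
    """Count (lat, lon) points per region; regions maps name -> polygon."""
    counts = Counter({name: 0 for name in regions})
    for lat, lon in points:
        for name, polygon in regions.items():
            if point_in_polygon(lon, lat, polygon):
                counts[name] += 1
                break  # each article is counted in at most one region
    return counts


# Two made-up square regions and four made-up article locations (lat, lon).
regions = {
    "Region A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Region B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
points = [(5, 5), (5, 15), (2, 3), (50, 50)]

counts = count_by_region(points, regions)
```

The last point falls outside both regions and is simply left uncounted.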

When looking at total counts (the top map), you can see that it is Israel/Palestine and parts of the Arabian Peninsula that tend to have the highest counts. However, to get a better sense of the density of layers of information over any given place, it is more useful to look at the number of articles per square kilometre. This is what the second map does.
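The density measure is simple arithmetic: the article count divided by the area, scaled to 1,000 km². A minimal sketch, using invented counts and areas purely for illustration (none of these figures come from our data):

```python
def articles_per_1000_km2(article_count, area_km2):
    """Number of articles per 1,000 square kilometres."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return article_count * 1000 / area_km2


# Invented examples: a small region with few articles can still be denser
# than a large region with many.
small_dense = articles_per_1000_km2(500, 20_000)
large_sparse = articles_per_1000_km2(2_000, 1_000_000)
```

This is why a region can rank low on total counts but high on density, or the reverse.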

Here you see that the densest layers of information in Arabic are again over Israel and Palestine. Much of the Mediterranean coast in Morocco, Tunisia, and Algeria as well as the Nile valley and parts of the UAE also have relatively dense clouds of content about them.

Obviously not all of these places are home to native Arabic speakers, and one of the stories we want to tell in future posts is how the geolinguistic contours of Wikipedia vary across the region.

We also aim to more closely examine the factors that might explain these uneven geographies of content. Is it internet access? GDP? Education levels? These data will be supplemented by in-depth focus groups that we aim to hold in Egypt and Jordan next year.

These initial mappings provide us with many more questions than answers, but this only means we have much to do over the next few months.

Feel free to comment with any questions or observations.