Internet Geographer

Adieu French: comparing English and French Wikipedias
The English and French Wikipedias are the world’s first- and third-largest versions of the encyclopedia (containing 3.9 and 1.3 million articles respectively).

I thought it might be instructive to compare the geographic coverage of the two. Even though there is three times as much content in English as in French, one might assume that there are plenty of parts of the world in which people are more likely to annotate or augment space with French content.

The results are contained in the map below:

We ultimately see only a few countries in which there is more French content: France (of course), Belgium, Luxembourg, the Francophone parts of the Maghreb (Algeria, Morocco, and Tunisia), the DRC, Senegal, and, surprisingly, Bosnia, Montenegro, and Kosovo.

You would expect the first eight countries on the list to have more French content than English, but there seems to be no obvious reason why Bosnia, Montenegro, and Kosovo have more French-language information about them. Then again, there is not necessarily a reason why there should be more English-language content in every other country in which neither French nor English is the primary language spoken.

Also interesting is that much of the rest of the Francophone world has more English-language content than French. Madagascar, Haiti, Cameroon, and Mali, for example, all have more written about them in English than in French.
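The map shades each country by whichever edition has more content about it. The post doesn't say exactly how that comparison was made, so what follows is only a minimal sketch under one assumption: that "content" means a count of geotagged articles, supplied as one country name per article. The function name `dominant_language` and that input format are hypothetical.

```python
from collections import Counter
from typing import Iterable


def dominant_language(
    en_places: Iterable[str], fr_places: Iterable[str]
) -> dict[str, str]:
    """Label each country by which edition has more geotagged articles about it.

    Hypothetical input: one country name per geotagged article in each edition.
    """
    en_counts = Counter(en_places)
    fr_counts = Counter(fr_places)
    labels: dict[str, str] = {}
    # Union of keys, so a country covered by only one edition is still labelled;
    # Counter returns 0 for a missing key.
    for country in en_counts.keys() | fr_counts.keys():
        en, fr = en_counts[country], fr_counts[country]
        if en > fr:
            labels[country] = "english"
        elif fr > en:
            labels[country] = "french"
        else:
            labels[country] = "tie"
    return labels
```

Comparing raw counts is the simplest reading of "more content". Summing word counts per country instead of counting articles would only change what goes into the two counters.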

What does this map tell us? We know that the number of Wikipedia articles about a place isn’t necessarily a great proxy for broader social or cultural relationships and patterns (e.g. the heavy focus on Turkey in the Swahili Wikipedia). But perhaps these patterns of attention do still tell us something about the importance of English vs. French in some of these places. Rwanda, for instance, has more English-language content: a fact that reflects the country’s shift into the Anglophone sphere.

Perhaps in much of the rest of the Francophone world we are also seeing a similar (although likely less pronounced) shift towards the use of English as a means of non-local communication and local representation to a broader audience.

I’d welcome any further thoughts or questions…

(for more information about this work, have a look at the other blog posts I’ve written about the geographies of Wikipedia)
Article Quality in English Wikipedia
Expanding on the maps of Wikipedia quality (i.e. the maps of South Asia and the Swahili version of Wikipedia) posted earlier on this blog, I want to offer a visualisation of all articles on the planet shaded according to the number of words in each article. In the map below, yellow dots represent the location of relatively short articles (such as the “Jericho Tavern”) in the English version of Wikipedia, while red dots indicate the location of relatively long articles (for instance, “Penzance”). A high-res version is also available here (I highly recommend downloading it and exploring in some detail).


Interesting patterns emerge: the average length of articles in the US is 750 words, while many European countries have lower means, e.g. Italy (550), Germany (439), Spain (397), France (260), and Poland (233). But it is also noteworthy that a few European countries have means closer to that of the US: articles in the UK and Ireland average 687 and 749 words respectively. The immediate conclusion here would be that it is easier for editors in English-speaking countries (all of which tend to have high averages) to expand articles than it is for editors in countries where English isn’t the native language.

But the native language of a country clearly isn’t the only factor at play. The countries with the highest average word counts for their articles (excluding small islands and city states) are Iraq, with an average of 1091 words across its 538 articles; the Philippines, with an average of 1085 words across 2736 articles; and North Korea, with an average of 947 words across its 292 articles.

That’s right: out of a list of over 200 countries, North Korea has one of the highest average word counts for its Wikipedia articles!

On the bottom end of the scale we have Azerbaijan (159), Estonia (209), and Kenya (223). 
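The figures above are per-country means of article word counts. As a rough sketch of that calculation, assuming each geotagged article has already been assigned to a country and given a word count, the code below computes the means and picks out the top and bottom of the ranking. The helper names and the `min_articles` cut-off are hypothetical: the post excludes small islands and city states but doesn't say how, so a minimum article count stands in for that step here.

```python
from collections import defaultdict
from typing import Iterable


def mean_words_by_country(
    articles: Iterable[tuple[str, int]], min_articles: int = 1
) -> dict[str, tuple[float, int]]:
    """Return {country: (mean word count, number of articles)}.

    `articles` holds one (country, word_count) pair per geotagged article
    (hypothetical input format). Countries with fewer than `min_articles`
    articles are dropped, an assumed stand-in for excluding small places.
    """
    totals: dict[str, int] = defaultdict(int)
    counts: dict[str, int] = defaultdict(int)
    for country, words in articles:
        totals[country] += words
        counts[country] += 1
    return {
        country: (totals[country] / counts[country], counts[country])
        for country in totals
        if counts[country] >= min_articles
    }


def top_and_bottom(
    means: dict[str, tuple[float, int]], k: int = 3
) -> tuple[list[str], list[str]]:
    """Return the k countries with the highest and the k with the lowest means."""
    ordered = sorted(means, key=lambda country: means[country][0], reverse=True)
    # Highest first for the top list; lowest first for the bottom list.
    return ordered[:k], ordered[::-1][:k]
```

A plain mean is sensitive to the mix of article lengths: a country with many very short articles will rank low even if it also has some long ones.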

The results suggest that there are a lot of stub articles written about Azerbaijan, Estonia, and Kenya (e.g. the Bukhungu stadium), whereas there are very few stubs about places like Iraq and North Korea, a finding that makes a lot of sense. It must be very hard for English-speaking editors to create articles (even stub articles) about things like small stadiums in provincial towns in North Korea and Iraq. But uploading this sort of information about the equivalent type of place in Estonia or Kenya is far less of a problem.

There are clearly a lot of (locally specific) factors at play here that will explain some of the patterns we are seeing, and we are looking at how a range of metrics (e.g. literacy and computer access) correlate with these data. In the meantime, any thoughts or comments are welcome in the comments field below.

More regional maps will also be up on the blog soon…