Internet Geographer

Dominant Wikipedia language by country

I devote a lot of energy to writing about the layers of information that augment our world and why they matter. Some of this work has further explored how not just the quantity or thickness of layers of information matters, but also their audiencing: in other words, the ability to access, read, and make sense of the layers of content that annotate our world.

One way to measure this audiencing of content is to look at how the amount of information in Wikipedia varies by language. In other words, where do we see the densest layers of content in English vs. French vs. Arabic etc.? The maps below illustrate this by showing the language with the most content for each country.

A quick note about method. Our geographic data were obtained by combining geotagged data from two independent sources, WikiLocation and Georeferenzierung. These tools enable us to capture data in 44 languages, which together account for 87 percent of all articles across all Wikipedias and include every Wikipedia with more than 100,000 articles.
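
To make the "largest language per country" idea concrete, here is a minimal Python sketch of how such a map layer could be derived from a table of geotagged article counts. It is an illustration only, not the pipeline actually used for these maps: the row layout and the function name `dominant_language` are assumptions, the Spain figures are the approximate counts quoted later in this post, and "Exampleland" is a made-up placeholder country with invented numbers.

```python
from collections import defaultdict

# Rows of (country, language, geotagged article count).
# The Spain figures are the approximate counts quoted in this post;
# "Exampleland" is a hypothetical placeholder, not real data.
counts = [
    ("Spain", "Catalan", 35_000),
    ("Spain", "Spanish", 19_000),
    ("Exampleland", "English", 1_200),
    ("Exampleland", "French", 1_500),
    ("Exampleland", "German", 300),
]


def dominant_language(rows):
    """Return {country: (language, article_count)} for the largest language."""
    totals = defaultdict(lambda: defaultdict(int))
    for country, language, n in rows:
        # Accumulate rows that share the same country and language.
        totals[country][language] += n
    return {
        # max() keeps the first language encountered when counts are tied.
        country: max(by_lang.items(), key=lambda item: item[1])
        for country, by_lang in totals.items()
    }


result = dominant_language(counts)
for country, (language, n) in result.items():
    print(f"{country}: {language} ({n} articles)")
```

Keeping only the top language per country is what makes the maps readable, but it also discards the runner-up languages and the size of the gap between them.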


By displaying only the largest Wikipedia language in each country, these maps undoubtedly lose a lot of useful detail. However, the maps are still insightful in a few ways.

Interestingly, they reveal that European languages are dominant even when it comes to annotating countries outside of Europe. The only exceptions are China and Taiwan (Chinese), Japan (Japanese), South Korea (Korean), Vietnam (Vietnamese), and Syria (Arabic). Everywhere else in the world, the largest Wikipedia language is European.

English is dominant in much of Africa, the Middle East, South and East Asia, and even parts of South and Central America. We then see French in five countries in Africa (other traditionally Francophone countries like the Ivory Coast still have more content in English). German is dominant in one former German colony (Namibia) and a few other countries scattered around the world (e.g. Uruguay, East Timor). 

There are also a few European countries covered by a larger amount of content in a non-native language than in a native language. A lot of English content exists about eastern and southeastern Europe. There is more French content about much of the former Yugoslavia than there is content in local languages. We also see Russian as a dominant language in a number of neighbouring countries (Georgia, Kazakhstan, Belarus, and Ukraine).

In Spain, we also have one case of a ‘minority language’ (Catalan) with significantly more content (about thirty-five thousand articles) than the ‘majority language’ of the country (Castilian/Spanish, with about nineteen thousand articles). Nowhere else in the world do we see such high visibility for a relatively small language.

More broadly, what do these maps tell us? They certainly reinforce some of what we know about the dominance of English as a language in which people want to represent things, places, and events of note. But they also flag up the need for deeper research into issues of power and representation on and in Wikipedia. In other words, the near absence of Arabic, Swahili, Hindi, Bengali, and many other large African and Asian languages from these maps means that we need sustained new inquiry into old questions about power and representation.