New Article published - Beyond the geotag: situating 'big data' and leveraging the potential of the geoweb

An article that I worked on with Jeremy Crampton, Ate Poorthuis, Taylor Shelton, Monica Stephens, Matt Wilson, and Matt Zook – Beyond the geotag: situating ‘big data’ and leveraging the potential of the geoweb – has just been published in Cartography and Geographic Information Science as part of a special issue on “Mapping Cyberspace and Social Media.”

The abstract and full citation for the paper are below:
This article presents an overview and initial results of a geoweb analysis designed to provide the foundation for a continued discussion of the potential impacts of ‘big data’ for the practice of critical human geography. While Haklay’s (2012) observation that social media content is generated by a small number of ‘outliers’ is correct, we explore alternative methods and conceptual frameworks that might allow for one to overcome the limitations of previous analyses of user-generated geographic information. Though more illustrative than explanatory, the results of our analysis suggest a cautious approach toward the use of the geoweb and big data that are as mindful of their shortcomings as their potential.

More specifically, we propose five extensions to the typical practice of mapping georeferenced data that we call going ‘beyond the geotag’: (1) going beyond social media that is explicitly geographic; (2) going beyond spatialities of the ‘here and now’; (3) going beyond the proximate; (4) going beyond the human to data produced by bots and automated systems, and (5) going beyond the geoweb itself, by leveraging these sources against ancillary data, such as news reports and census data. We see these extensions of existing methodologies as providing the potential for overcoming existing limitations on the analysis of the geoweb.

The principal case study focuses on the widely reported riots following the University of Kentucky men’s basketball team’s victory in the 2012 NCAA championship and its manifestation within the geoweb. Drawing upon a database of archived Twitter activity – including all geotagged tweets since December 2011 – we analyze the geography of tweets that used a specific hashtag (#LexingtonPoliceScanner) in order to demonstrate the potential application of our methodological and conceptual program. By tracking the social, spatial, and temporal diffusion of this hashtag, we show how large databases of such spatially referenced internet content can be used in a more systematic way for critical social and spatial analysis.
Crampton, J.W., M. Graham, A. Poorthuis, T. Shelton, M. Stephens, M.W. Wilson and M. Zook. 2013. Beyond the Geotag: Situating ‘Big Data’ and Leveraging the Potential of the Geoweb. Cartography and Geographic Information Science 40(2): 130-139.

Or you can freely access a pre-publication version from SSRN:
Wikipedia in the UK
After a lot of data cleaning and number crunching, here are three maps of the geographies of Wikipedia in the UK using brand new November 2010 data. Looking at the first map (total number of articles in each district), we see some interesting patterns. With a few exceptions, it is rural districts in Scotland, Wales and the North of England that have the highest total numbers of articles.

What we’re likely picking up on is the fact that large districts simply have more potential stuff to write about. If we normalise the map by area we see an entirely different pattern. The map below displays the number of articles per square km.

We see that most of the large urban conurbations in the UK are covered by a dense layer of articles. Most sparsely populated areas, in contrast, have a much thinner layer of virtual representation in Wikipedia. There are, however, some notable exceptions. Parts of Cornwall, Somerset and the Isle of Wight all have a denser layer of content than might be expected for such relatively rural parts of the country. In the other direction, one might have expected a higher density in the districts surrounding Belfast; in fact, almost all of Northern Ireland is characterised by very low levels of content per square km.

Finally, we can look at the number of articles per person in each district:

Here some more surprising results are visible. All major urban areas have relatively low counts of articles per person (with the exception of central London). In contrast, many rural areas (particularly areas containing national parks) have high counts per person.

There is obviously a range of ways to measure the geographies of Wikipedia in the UK. We see that some areas are blanketed by a highly dense layer of virtual content (e.g. central London and many of the UK’s other major conurbations). These maps also highlight the fact that some parts of the UK are characterised by a paucity of content irrespective of the ways in which the data are normalised. Northern Ireland in particular stands out in this respect.
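
For anyone who wants to try the same three measures on their own data, here is a minimal sketch of the arithmetic: total articles, articles per square km, and articles per person. It is not the code used to make the maps above. The `District` class, its field names and the two example districts are all hypothetical, and the figures are made up purely to show how the ranking can flip depending on the normalisation.

```python
from dataclasses import dataclass


@dataclass
class District:
    """Hypothetical record for one district; field names are assumptions."""
    name: str
    articles: int      # number of geotagged articles located in the district
    area_km2: float    # land area in square kilometres
    population: int    # resident population

    @property
    def articles_per_km2(self) -> float:
        # Normalise by area: density of content on the ground.
        return self.articles / self.area_km2

    @property
    def articles_per_1000_people(self) -> float:
        # Normalise by population, scaled to per-1,000 residents for readability.
        return 1000 * self.articles / self.population


# Made-up figures for illustration only; NOT the November 2010 data.
districts = [
    District("Large rural district", articles=400, area_km2=5000.0, population=50_000),
    District("Compact urban district", articles=300, area_km2=20.0, population=200_000),
]


def rank(items, key):
    """Return district names ordered from highest to lowest on the given measure."""
    return [d.name for d in sorted(items, key=key, reverse=True)]


by_total = rank(districts, lambda d: d.articles)
by_area = rank(districts, lambda d: d.articles_per_km2)
by_person = rank(districts, lambda d: d.articles_per_1000_people)

for label, ranking in [("total", by_total), ("per km2", by_area), ("per person", by_person)]:
    print(f"{label:>10}: {ranking}")
```

With these invented numbers the rural district ranks first on total articles and on articles per person, while the urban district ranks first on articles per square km, which is the same kind of reversal visible between the maps.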

I’ll attempt to upload similar analyses of other countries in the next few months. In the meantime, however, please offer thoughts on these maps either below or on the cross post at the Floatingsheep blog.

p.s. many thanks to Adham Tamer for his help with the data extraction.