New paper: Where in the World are You? Geolocation and Language Identification in Twitter
Scott Hale, Devin Gaffney and I have a forthcoming paper in The Professional Geographer on the geography of Twitter.
The abstract is below, and you can download the pre-publication version from the link at the end of this post.
The movements of ideas and content between locations and languages are unquestionably crucial concerns to researchers of the information age, and Twitter has emerged as a central, global platform on which hundreds of millions of people share knowledge and information. A variety of research has attempted to harvest locational and linguistic metadata from tweets in order to understand important questions related to the 300 million tweets that flow through the platform each day. However, much of this work is carried out with only limited understandings of how best to work with the spatial and linguistic contexts in which the information was produced. Furthermore, standard, well-accepted practices have yet to emerge. As such, this paper studies the reliability of key methods used to determine language and location of content in Twitter. It compares three automated language identification algorithms to Twitter’s user language interface setting and to a human coding of languages, and identifies common sources of disagreement. The paper also demonstrates that in many cases user-entered profile locations differ from the physical locations users are actually tweeting from. As such, these open-ended, user-generated profile locations cannot be used as useful proxies for the physical locations from which information is published to Twitter.
Graham, M., S. Hale, and D. Gaffney. 2013. Where in the World are You? Geolocation and Language Identification in Twitter. The Professional Geographer (in press).