America's most influential cities: the urban geography of Klout scores
My colleague Devin Gaffney and I decided to dig deeper into the geography of Klout by examining some of the largest cities in the US. We found some very interesting patterns and large differences in the average influence of users across American cities.
Klout scores, for those unfamiliar with them, fall between 0 and 100 and supposedly measure influence (a higher score indicating that a person is more influential). As I’ve noted before, this sort of quantification of a person’s influence based on online activity is inherently problematic. It defines influence rather narrowly and then ranks each person with a highly decontextualised score that is unlikely to account for the many nuanced ways in which influence is perceived and enacted. However, despite the problematic nature of the service, it is nonetheless important to try to better understand how it measures and represents people.
We therefore decided to calculate the average Klout score of 49 of the largest American cities. The map below displays each city as a circle that is shaded and sized according to its average Klout score. In the interest of clarity, only the top ten and bottom four cities are labelled.
First, a few words on how we collected the data. From April 8th to April 29th, 2012, approximately 195 million tweets were collected via Twitter’s “spritzer” access level. Geo-coded tweets were selected using the API’s internal methods. The resulting dataset was then cross-referenced against a list of fifty bounding boxes, each approximating the general conurbation of a city and its suburbs (so as to capture the full scope of the metropolitan area). From each resulting set, 1,000 users were selected at random and looked up through Klout’s score API. Slightly fewer than 1,000 users are shown for each city, because some of the tweeting users have not been detected by Klout and therefore have no score.
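To make that procedure concrete, here is a minimal sketch in Python. It is not the code we used for the study: the tweet field names (`user`, `lon`, `lat`), the `(min_lon, min_lat, max_lon, max_lat)` bounding-box layout, and the `scores` dictionary standing in for a lookup against Klout’s score API are all assumptions made for illustration.

```python
import random
from statistics import mean


def in_bbox(lon, lat, bbox):
    """True if the point lies inside bbox = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def users_in_city(tweets, bbox):
    """Collect the distinct users with at least one geo-coded tweet inside bbox.

    Each tweet is assumed to be a dict with 'user', 'lon' and 'lat' keys;
    tweets without coordinates are skipped.
    """
    return {
        t["user"]
        for t in tweets
        if t.get("lon") is not None
        and t.get("lat") is not None
        and in_bbox(t["lon"], t["lat"], bbox)
    }


def sample_users(users, k=1000, seed=None):
    """Draw up to k users at random, without replacement."""
    rng = random.Random(seed)
    pool = sorted(users)  # fixed order so a given seed is reproducible
    return rng.sample(pool, min(k, len(pool)))


def average_score(sampled, scores):
    """Average the scores of sampled users, ignoring users with no score.

    'scores' is a hypothetical dict of user -> score standing in for the
    score API. Returns (average, number_of_scored_users).
    """
    scored = [scores[u] for u in sampled if scores.get(u) is not None]
    if not scored:
        return None, 0
    return mean(scored), len(scored)
```

Running `users_in_city`, `sample_users` and `average_score` in turn for each bounding box gives one average per city, along with the number of users who actually had a score. That number is why each city ends up with slightly fewer than 1,000 users.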
The city whose users have the highest average influence score (29.1) is San Francisco (which, perhaps unsurprisingly, is also home to Klout’s headquarters). Interestingly, San Francisco’s average is also noticeably higher than that of the city in second place (Austin at 27.8). We then see a tighter cluster of average scores: Seattle in third place (27.1), followed by two more Bay Area cities in fourth and fifth, Oakland (27.1) and San Jose (26.8).
At the bottom end of the scale we have Houston (23.3), Jacksonville (22.9), Memphis (22.8), and Virginia Beach (22.7).
Why do we see such variance in the geography of Klout scores? Are people in San Francisco and Austin really that much more influential than people in Houston or Memphis? Klout scores certainly aren’t (well, at least they don’t appear to be) randomly assigned. They are derived by combining measures such as the number of followers you have, the number of people you follow, and the number (and spread) of your retweets, among others.
But does the geography of Klout actually tell us anything useful about these cities? By themselves, I think these data tell us almost nothing. They are a very blunt and fuzzy tool applied to a limited sample, and we should be hesitant about reading too much into the numbers. However, when brought together with other data and research about information production and consumption, influence, and voice, they potentially allow us to draw more rounded pictures of the sub-national geographies of the internet.
One interesting point is the discrepancy between these city-level scores and the national scores calculated in an earlier study. While no conclusive reason has been found for this discrepancy, a few possibilities may explain it. One is that the users sampled for this report were collected from Twitter in April 2012; many of them may since have reduced their use of their accounts, and their scores may have fallen as a result. Another is that users located outside of population centers may tend to have higher scores. Even so, the data shown here were checked exhaustively to determine whether the discrepancy could be the result of an error, and were found to be accurate.