Digital Geography

17. May 2015

The Geography of Tweets: Reading Tweets with QGIS

Anita showed some nice examples of tweets in QGIS in 2012. Since then, things have been quiet regarding Twitter content in QGIS. Yet tweets can be an interesting source of information: they can tell you something about the spatiotemporal patterns of a keyword, the digital heartbeat of a defined region and much more. However, we need to be careful with this data, as it is strongly biased. But how do we get this data stream into QGIS?

The First Insights: Tweets in QGIS

Back in 2011/2012 the Twitter API could be accessed with a single line of code, which let you stream data directly. Anita showed this in a nice little way:
curl -k -d @locations.txt https://stream.twitter.com/1/statuses/filter.json -uuser:password > tweets.json
With this line you collected all tweets for a defined region (as stated in locations.txt) in a file. You could then simply scan the file for tweets with a location, add those as features to a point shapefile in QGIS, and off you go. But soon after, Twitter changed its API policy and switched to a more advanced authentication system (OAuth), so Anita's solution no longer worked. Still, nice examples of tweets in a geographic context kept appearing online. The most extreme map is probably this one:
Recent versions of ArcGIS have also added the possibility to add tweets to ArcGIS map products. So it's time to add this to QGIS as well.
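
The "scan the file" step of that old workflow can be sketched in plain Python. This is a minimal sketch, assuming tweets.json holds one JSON object per line (as the streaming endpoint delivered them) and that geotagged tweets carry a GeoJSON "coordinates" point in [longitude, latitude] order; the helper name geotagged_points is hypothetical.

```python
import io
import json


def geotagged_points(lines):
    """Yield (lon, lat, text) for every line-delimited tweet with a point location.

    Hypothetical helper: assumes one JSON object per line and a GeoJSON
    "coordinates" field like {"type": "Point", "coordinates": [lon, lat]}.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # the stream sends blank keep-alive lines
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # skip truncated or garbled lines
        point = tweet.get("coordinates")
        if point and point.get("type") == "Point":
            lon, lat = point["coordinates"]
            yield lon, lat, tweet.get("text", "")


# Usage with an in-memory stand-in for tweets.json:
sample = io.StringIO(
    '{"text": "hello", "coordinates": {"type": "Point", "coordinates": [13.4, 52.5]}}\n'
    "\n"
    '{"text": "no location", "coordinates": null}\n'
)
print(list(geotagged_points(sample)))  # [(13.4, 52.5, 'hello')]
```

Each yielded tuple is already in (x, y) order, so it could be turned into a point feature without swapping.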

Tweets in QGIS: Prerequisites

At the moment the most reliable way to access the Twitter API from a Python environment seems to be tweepy. Installing it is quite easy under Ubuntu. On Windows, however, you need some knowledge about environment variables, and only you know whether you use the OSGeo4W shell. Long story short: you need to install the tweepy module by hand before you can read streams from Twitter. Furthermore, the Twitter API needs some tokens and keys: make sure you have a Twitter account and create an application, which gives you these codes:
  • access token
  • access token secret
  • consumer key
  • consumer secret
Once you have both the tweepy module and these codes, let's dive into the QGIS Python console.
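
Rather than pasting the four codes straight into a script, where they are easy to publish by accident, one option is to read them from environment variables. This is a minimal sketch, assuming you set variables with the hypothetical names below before starting QGIS; it also checks whether the tweepy module can be imported by the Python interpreter that QGIS uses.

```python
import os

# Hypothetical variable names: pick your own, they are not defined by Twitter or tweepy.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=os.environ):
    """Return the four Twitter codes as a dict, or raise KeyError if any is missing."""
    missing = [var for var in CREDENTIAL_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: environ[var] for name, var in CREDENTIAL_VARS.items()}


def tweepy_available():
    """Return True if the tweepy module can be imported from this interpreter."""
    try:
        import tweepy  # noqa: F401
    except ImportError:
        return False
    return True


# Usage in the QGIS Python console (requires tweepy and the variables above):
# creds = load_credentials()
# key = tweepy.OAuthHandler(creds["consumer_key"], creds["consumer_secret"])
# key.set_access_token(creds["access_token"], creds["access_token_secret"])
```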

Gathering Tweets

The first step in gathering tweets is to authenticate with tweepy and open a stream for a defined location:

import tweepy

access_token = "put your token here"
access_token_secret = "put your token secret here"
consumer_key = "put your key here"
consumer_secret = "put your key secret here"
key = tweepy.OAuthHandler(consumer_key, consumer_secret)
key.set_access_token(access_token, access_token_secret)

# here comes the tweepy part:

class stream2lib(tweepy.StreamListener):
    def __init__(self, api=None):
        tweepy.StreamListener.__init__(self, api or tweepy.API(key))
        self.output = {}  # collected tweets, keyed by tweet id
        self.n = 0  # we will start with zero geotweets
        self.m = 10  # let's stop after 10 geotweets

    def on_status(self, status):
        # we will parse the interesting information into a nice format
        self.output[status.id] = {
            'tweet': status.text.encode('utf8'),  # text may contain non-ASCII characters, so encode it
            'user': status.user.screen_name.encode('utf8'),  # user name should be utf8 as well
            'geo': status.geo,  # point location of the device (None if not shared)
            'localization': status.user.location,  # location from the user profile (normally fixed per user)
            'time_zone': status.user.time_zone,  # time zone from the user profile
            'time': status.timestamp_ms}  # timestamp as a string of ms since 1970-01-01
        # we only count tweets with geo; with a locations filter most, but not all,
        # tweets carry coordinates (some only come with a place)
        if self.output[status.id]['geo'] is not None:
            self.n += 1
        return self.n < self.m  # returning False disconnects the stream

listener = stream2lib()
stream = tweepy.streaming.Stream(key, listener)  # initiate the stream
stream.filter(locations=[-180, -90, 180, 90])  # box as [sw_lon, sw_lat, ne_lon, ne_lat]: the whole world
tweetdic = listener.output  # keep a reference to the collected tweets
print(tweetdic)  # just to be sure 😉

As you can see, we gather tweets and filter the stream. The filtered stream itself can't be stored in a variable, so the listener collects the parsed tweets in its output dictionary. Because this dictionary holds every received tweet, we still need to filter it for tweets with a coordinate afterwards. With the lines above we have a dictionary of tweets.
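
The filtering step mentioned above fits into a dictionary comprehension. This is a minimal sketch that runs on a made-up two-tweet dictionary shaped like the listener's output; geotagged is a hypothetical helper name.

```python
def geotagged(tweets):
    """Return only the entries whose 'geo' value is set (status.geo is None otherwise)."""
    return {tweet_id: tweet for tweet_id, tweet in tweets.items() if tweet.get("geo") is not None}


# Made-up sample shaped like the listener's output dictionary:
sample = {
    1: {"user": "alice", "geo": {"type": "Point", "coordinates": [52.5, 13.4]}},
    2: {"user": "bob", "geo": None},
}
print(sorted(geotagged(sample)))  # [1]
```

In the console, geotagged(tweetdic) would then leave only the tweets that can become point features.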

Adding Tweets as Points to QGIS

Now that the tweets are in a variable, we can iterate over the dictionary and fill a temporary memory layer in QGIS with the point information. First we create this layer:

vl = QgsVectorLayer("Point?crs=EPSG:4326", "temporary_twitter_results", "memory")  # tweet coordinates are WGS84
pr = vl.dataProvider()

# changes are only possible when editing the layer
vl.startEditing()
At the moment the layer doesn't contain any attributes, so let's add them as well:
pr.addAttributes([QgsField("user_name", QVariant.String),
                  QgsField("localization", QVariant.String),
                  QgsField("tweet", QVariant.String),
                  QgsField("time", QVariant.String)])
vl.updateFields()  # tell the layer to fetch the new fields from the provider
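
Instead of adding the fields after the layer exists, the QGIS memory provider also accepts the CRS and the fields directly in its URI, in the form Point?crs=epsg:4326&field=name:type. This is a minimal sketch that only builds such a URI string in plain Python, assuming the memory-provider syntax described in the PyQGIS cookbook; memory_layer_uri is a hypothetical helper.

```python
def memory_layer_uri(geometry, epsg, fields):
    """Build a QGIS memory-provider URI such as 'Point?crs=epsg:4326&field=user_name:string'.

    fields is a list of (name, type) pairs, e.g. ("tweet", "string").
    """
    parts = ["crs=epsg:%d" % epsg]
    parts += ["field=%s:%s" % (name, ftype) for name, ftype in fields]
    return geometry + "?" + "&".join(parts)


uri = memory_layer_uri("Point", 4326, [("user_name", "string"), ("localization", "string"),
                                        ("tweet", "string"), ("time", "string")])
print(uri)
# Point?crs=epsg:4326&field=user_name:string&field=localization:string&field=tweet:string&field=time:string
# In the QGIS console: vl = QgsVectorLayer(uri, "temporary_twitter_results", "memory")
```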
The next lines iterate over the dictionary and, for every tweet that has a coordinate, create a feature that uses the coordinates as its point location and the tweet's properties as its attributes:
import datetime  # needed for the timestamp conversion below

for tweet_id, tweet in tweetdic.items():
    if tweet['geo'] is not None:
        fet = QgsFeature()  # it's a new feature
        # status.geo lists [latitude, longitude]; QgsPoint expects (x = longitude, y = latitude)
        fet.setGeometry(QgsGeometry.fromPoint(QgsPoint(tweet['geo']['coordinates'][1], tweet['geo']['coordinates'][0])))
        # timestamp_ms is a string of ms since 1970-01-01 (UTC); format as YYYY-MM-DD HH:MM:SS:ffffff (microseconds)
        tweettime = datetime.datetime.utcfromtimestamp(int(tweet['time']) / 1000.0).strftime('%Y-%m-%d %H:%M:%S:%f')
        fet.setAttributes([tweet['user'], tweet['localization'], tweet['tweet'], tweettime])  # attributes of the current tweet
        pr.addFeatures([fet])  # and add the feature to the layer
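
The two conversions inside the loop can be tried outside QGIS. This is a minimal sketch with hypothetical helper names: geo_to_xy swaps the [latitude, longitude] order of status.geo into the (x, y) = (longitude, latitude) order that QgsPoint expects, and format_timestamp_ms turns the millisecond string from timestamp_ms into the same text format. It adds an integer timedelta to the epoch, which keeps the milliseconds exact instead of going through a float.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)  # timestamp_ms counts from here, in UTC


def geo_to_xy(geo):
    """status.geo lists [latitude, longitude]; return (x, y) = (longitude, latitude)."""
    lat, lon = geo["coordinates"]
    return lon, lat


def format_timestamp_ms(timestamp_ms):
    """Convert a millisecond epoch string (UTC) to 'YYYY-MM-DD HH:MM:SS:ffffff'."""
    moment = EPOCH + datetime.timedelta(milliseconds=int(timestamp_ms))
    return moment.strftime("%Y-%m-%d %H:%M:%S:%f")


print(geo_to_xy({"type": "Point", "coordinates": [52.5, 13.4]}))  # (13.4, 52.5)
print(format_timestamp_ms("1431820800123"))  # 2015-05-17 00:00:00:123000
```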
Once the iteration has finished, we stop editing and add the layer to the current QGIS project:
# commit to stop editing the layer
vl.commitChanges()
# update layer's extent when new features have been added
# because change of extent in provider is not propagated to the layer
vl.updateExtents()
QgsMapLayerRegistry.instance().addMapLayer(vl)
In the end you can collect as many tweets as you like (raise self.m in the listener). Be warned, though: stream.filter() blocks until the listener stops the stream, so your QGIS application may freeze until enough geotagged tweets have been found.
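
A common general-purpose way to keep an interface responsive is to run the blocking collection in a worker thread and hand the results back through a queue, which the main thread drains from time to time. This is a minimal, QGIS-agnostic sketch of that pattern, not something tweepy or QGIS provides as-is: fake_stream is a hypothetical stand-in for the blocking stream.filter() call. In QGIS only the main thread should touch layers, so the drain step (for example driven by a Qt timer, an assumption here) is where features would be added.

```python
import threading

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2, as bundled with QGIS 2.x


def fake_stream(n, out_queue):
    """Hypothetical stand-in for a blocking stream: puts n tweets on the queue."""
    for i in range(n):
        out_queue.put({"id": i, "geo": None})
    out_queue.put(None)  # sentinel: the stream has finished


def start_collector(n):
    """Run the blocking collector in a daemon thread and return the queue it fills."""
    results = queue.Queue()
    worker = threading.Thread(target=fake_stream, args=(n, results))
    worker.daemon = True  # do not keep the application alive on exit
    worker.start()
    return results


def drain(results):
    """Collect everything currently waiting in the queue without blocking.

    Returns (tweets, finished); finished is True once the sentinel was seen.
    """
    tweets, finished = [], False
    while True:
        try:
            item = results.get_nowait()
        except queue.Empty:
            break
        if item is None:
            finished = True
            break
        tweets.append(item)
    return tweets, finished
```

Calling drain() repeatedly until finished is True collects everything the worker produced while the main thread stays free in between calls.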

10’000 tweets in 3min 😉

Together with Anita's TimeManager plugin you can now create nice videos. The whole script can be downloaded here: tester2. Furthermore, I created a QGIS plugin called twitter2qgis (or geotweet):

twitter2qgis/geotweet for qgis: a plugin

The plugin is under development on GitHub, so please report any issues or contribute your knowledge or code.

A Warning

You can collect a large number of tweets and mine them. But be aware: Twitter users are a quite specific group, and users who also allow Twitter to use their current location are an even more specific one. If you do any analysis with these tweets, keep these biases in mind and consider reading these articles: