Mining for tweets

Getting access to twitter data

Sign in with your Twitter account at https://developer.twitter.com/. Click on "Create New App"; I named mine USF-parrt-teaching. When the new app appears, click on it and then on the "Keys and Access Tokens" tab. Find the consumer_key, consumer_secret, access_token, and access_token_secret and copy them into a secure file on your laptop; I store them in a one-line CSV file for convenient use in apps. Never expose these keys by putting them literally into source code. There is more info in the sentiment project description.

Never store your API key in your code.
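One common alternative to a keys file is to read credentials from environment variables so they never appear in code or in your repository. The sketch below illustrates the idea; the variable names (TWITTER_CONSUMER_KEY and so on) are my own choices, not anything Twitter or tweepy requires:

```python
import os

def keys_from_env():
    """Read the four Twitter credentials from environment variables.
    Raises KeyError if any variable is missing, so a typo fails fast
    instead of silently authenticating with a blank key."""
    names = ["TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
             "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_TOKEN_SECRET"]
    return [os.environ[name] for name in names]
```

You would export those variables in your shell profile and then call keys_from_env() at the top of your script.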

Creating an account means answering some questions. First, add basic profile info to your account and follow a few people, such as me: @the_antlr_guy. As of 2019, one student had trouble getting his account verified until he answered the questions as follows:

  • Q1: I attend the University of San Francisco. I am pursuing a master's in data science. The course I require Twitter data for is MSDS 692-Data Acquisition; the instructor for this course is Terence Parr. I intend to use Twitter data to complete a Sentiment analysis project for this course.
  • Q2: The tweets selected will be analyzed using the vaderSentiment library in Python. The tweets will be colored on a scale from green to red depending on the tweet's sentiment.
  • Q3: No
  • Q4: After completing the sentiment analysis, different tweets will be colored depending on how they are viewed. These tweets will be displayed on a server through AWS; the professor and the grader are the only people who will have the address.
  • Q5: No

Tweet feeds

Twitter provides URLs that let us search recent tweets, but they are fairly inconvenient to use directly, so we use a library called tweepy. You will likely have to install it:

pip install tweepy

Twitter's Search API appears to cover recent tweets from the past 7-9 days, and searching is rate-limited to 180 queries per 15-minute window.
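180 queries per 15 minutes works out to one query every 5 seconds. Newer versions of tweepy can wait out rate limits for you, but as an illustration of the idea, here is a minimal client-side throttle (this helper class is my own sketch, not part of tweepy):

```python
import time

class Throttle:
    """Enforce a minimum interval between calls; for Twitter's search
    limit that would be 15*60/180 = 5 seconds between queries."""
    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last_call = None

    def wait(self):
        """Sleep just long enough so that min_interval seconds separate
        this call from the previous one, then record the call time."""
        now = time.monotonic()
        if self.last_call is not None:
            elapsed = now - self.last_call
            if elapsed < self.min_interval:
                time.sleep(self.min_interval - elapsed)
        self.last_call = time.monotonic()
```

You would create Throttle(5) once and call wait() before each search query.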

Here is a simple example from the tweepy getting started documentation that pulls from the public home timeline:

import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

public_tweets = api.home_timeline()
for tweet in public_tweets:
    print(tweet.text)

That assumes you have set the variables consumer_key, consumer_secret, access_token, and access_token_secret. To do that, I read them directly from my secrets file:

def loadkeys(filename):
    """
    load keys/tokens from CSV file with form
    consumer_key, consumer_secret, access_token, access_token_secret
    """
    with open(filename) as f:
        items = f.readline().strip().split(', ')
        return items

consumer_key, consumer_secret, \
access_token, access_token_secret \
    = loadkeys("/Users/parrt/licenses/twitter.csv")
...
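The split(', ') above assumes exactly one space after every comma in the keys file. A slightly more forgiving sketch uses Python's csv module and strips whitespace itself; it is a drop-in alternative, not a change tweepy requires:

```python
import csv

def loadkeys(filename):
    """
    Load consumer_key, consumer_secret, access_token, access_token_secret
    from the first row of a one-line CSV file, tolerating stray spaces
    around the commas.
    """
    with open(filename) as f:
        row = next(csv.reader(f))
        return [field.strip() for field in row]
```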

Here is the API reference for tweepy. For example, here is how to get a User object describing Donald Trump:

user = api.get_user('realDonaldTrump')

To print the number of followers, you can reference user.followers_count.
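Since not every attribute is guaranteed to be present on every user object, it can be handy to wrap the lookup in a small helper. This function is my own illustration, not part of tweepy, and the test below exercises it with a stand-in object rather than a live API call:

```python
def describe_user(user):
    """Return a one-line summary of a tweepy User-like object,
    tolerating missing attributes by falling back to defaults."""
    name = getattr(user, "screen_name", "?")
    followers = getattr(user, "followers_count", 0)
    return f"@{name} has {followers:,} followers"
```

With a real tweepy user you would call describe_user(api.get_user('realDonaldTrump')).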

Exercise: Get the code from above working to show the public tweets and then pick a Twitter user and print their followers count and other details you find from the API reference.

Cursors

For very large results, we need to use cursors that handle getting page 1, page 2, etc... of tweets. Here's how to get the first 100 tweets for Donald Trump:

for status in tweepy.Cursor(api.user_timeline, id='realDonaldTrump').items(100):
    print(status)
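Under the hood, a cursor just keeps asking for the next page of results until the API runs dry or the item limit is reached. The pure-Python sketch below mimics that pattern with a fake page-fetching function so the pagination logic itself is clear; fetch_page here is hypothetical, not a tweepy call:

```python
def paginate(fetch_page, limit):
    """Yield up to `limit` items by repeatedly calling fetch_page(page_number)
    until it returns an empty page, the way tweepy's Cursor pages through
    a timeline."""
    count = 0
    page_num = 1
    while count < limit:
        page = fetch_page(page_num)
        if not page:            # API exhausted before we hit the limit
            return
        for item in page:
            yield item
            count += 1
            if count >= limit:  # stop mid-page once we have enough
                return
        page_num += 1

# Fake API serving 10 pages of 20 "tweets" each
fake = lambda n: [f"tweet {(n-1)*20 + i}" for i in range(20)] if n <= 10 else []
first100 = list(paginate(fake, 100))
```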

Exercise: Pick another user and print out their most recent tweets.

Streaming API

From tweepy doc: "The Twitter streaming API is used to download twitter messages in real time. It is useful for obtaining a high volume of tweets, or for creating a live feed using a site stream or user stream."

Stephen Hsu, MSAN2017, sent me this nice snippet to listen in on the twitter feed.

Here is a nice tutorial on the streaming API.