This project was developed as my portfolio project for a five-month Python Development course.
Twitter is one of the best social media platforms to scrape data from. Its API is quite permissive, easy to work with, and, of course, free. The API gives access to core elements of Twitter, such as retrieving tweets or user information, tweeting from outside Twitter, or even building bots.
What I wanted to accomplish with this project was to scrape tweets in Python and present, in an easy-to-read way, information such as word count, most-liked tweets, the source of the tweets, or AI-based sentiment.
For the program to run smoothly, a few things are needed first.
First, you will need to create an account on the Twitter Developer Platform or log in with your existing Twitter account. Create a new app and generate the API keys and access tokens. In the working directory, create a keys.ini file, copy the following template, and fill in the generated keys.
You will also need a RapidAPI account; put its key in the same keys.ini file. This key is required because a RapidAPI service is used to get the sentiment of the tweets.
[twitter]
api_key =
api_key_secret =
bearer_token =
access_token =
access_token_secret =
[rapidapi]
app_key =
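For reference, here is a minimal sketch of how these keys can be read in Python with the standard configparser module; the section and option names match the template above.

```python
import configparser

# Load the credentials from keys.ini in the working directory
config = configparser.ConfigParser()
config.read("keys.ini")

api_key = config["twitter"]["api_key"]
api_key_secret = config["twitter"]["api_key_secret"]
bearer_token = config["twitter"]["bearer_token"]
access_token = config["twitter"]["access_token"]
access_token_secret = config["twitter"]["access_token_secret"]
rapidapi_key = config["rapidapi"]["app_key"]
```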
The following libraries and packages will be needed:
- tweepy -> provides access to the Twitter API from Python.
pip install tweepy
- pandas -> data manipulation and analysis tool.
pip install pandas
- requests -> HTTP library that allows sending HTTP requests easily.
pip install requests
After installing everything mentioned above, the program should work without any problems. Because the Twitter API returns all of a user's activity (tweets, retweets, replies), this activity will be referred to as a status from now on. Below is a simple example of how to get the statuses of a user and save them locally. For more information about how the classes and functions work, or about other methods you can use, refer to their documentation.
The first step is to actually get some information from Twitter. Let's say we want to scrape the GitHub profile and save the statuses in a CSV file. Write the code in main.py.
The constructor takes the profile tag as the first argument and the number of statuses to retrieve as the second argument.
github = UserProfileScraper("github", 100)
We'll store the statuses in a variable. These will be retrieved by the search_user_activity() method.
github_data = github.search_user_activity()
To save the scraped statuses locally, we'll use the export_dataframe() method, passing the previously created github_data variable as its argument.
github.export_dataframe(github_data)
The statuses are now saved in the current working directory, in data/raw_tweets.
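Putting the steps above together, main.py might look like the sketch below; the import path is a placeholder, since the actual module name depends on the repository layout.

```python
# main.py - minimal sketch; adjust the import to match the project's layout
from scraper import UserProfileScraper  # hypothetical module name

# Scrape the last 100 statuses of the GitHub profile
github = UserProfileScraper("github", 100)

# Retrieve the statuses and save them to data/raw_tweets
github_data = github.search_user_activity()
github.export_dataframe(github_data)
```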
With the file saved, the next step is to actually get some relevant information about the retrieved statuses.
From the 100 statuses previously scraped, we'll save only the tweets in another CSV file, in data/tweets_only. Alternatively, the tweets can be printed directly to the console by omitting the export_as_csv argument.
tweets.get_tweets_only("github_data", 100, export_as_csv=True)
The next two calls print the like count for each tweet in descending order and, respectively, the devices the tweets were posted from. Pass tweets_only=False to either function to see information about all statuses, not only tweets.
tweets.get_likes_count("github_data", 100)
tweets.get_source_count("github_data", 100)
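As an aside, similar counts can be computed directly from the exported CSV with pandas. The sketch below is only illustrative: the file name and the column names ("text", "favorite_count", "source") are assumptions and may differ in the actual export.

```python
import pandas as pd

# Load the exported statuses; the exact file name inside data/raw_tweets is an assumption
df = pd.read_csv("data/raw_tweets/github_data.csv")

# Likes per tweet, most liked first (assumed columns: text, favorite_count)
print(df.sort_values("favorite_count", ascending=False)[["text", "favorite_count"]])

# How many statuses were posted from each device (assumed column: source)
print(df["source"].value_counts())
```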
For better visibility, we'll save the word count as a CSV file in data/word_count. The file shows each word and the number of times it was used in the tweets among the 100 scraped statuses. Pass tweets_only=False to count words across all statuses.
tweets.get_word_count("github_data", 100, export_as_csv=True)
Sentiment is obtained through the Text Sentiment Analysis API on RapidAPI. You can either print the results or save them as a CSV file in data/sentiment.
print(tweets.get_sentiment("github_data", 10))
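For context, RapidAPI services are called over HTTP with the requests library. The sketch below only shows the general shape of such a call; the host and endpoint URL are placeholders, since the exact Text Sentiment Analysis API address is not documented here.

```python
import configparser
import requests

# Read the RapidAPI key from keys.ini (see the template above)
config = configparser.ConfigParser()
config.read("keys.ini")

# Placeholder host/endpoint - substitute the actual Text Sentiment Analysis API URL
url = "https://text-sentiment-analysis-example.p.rapidapi.com/analyze"
headers = {
    "X-RapidAPI-Key": config["rapidapi"]["app_key"],
    "X-RapidAPI-Host": "text-sentiment-analysis-example.p.rapidapi.com",
}

response = requests.post(url, headers=headers, data={"text": "I love open source!"})
print(response.json())
```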
A few improvements are planned for the future:
- Incorporate command-line arguments with the argparse module (see the sketch below).
- Improve the logger.
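For the first item above, a possible argparse interface could look like the following sketch; the flag names are only suggestions, not part of the current program.

```python
import argparse

# Hypothetical command-line interface for the scraper
parser = argparse.ArgumentParser(description="Scrape a Twitter profile's statuses.")
parser.add_argument("profile", help="profile tag to scrape, e.g. github")
parser.add_argument("-n", "--count", type=int, default=100,
                    help="number of statuses to retrieve")
parser.add_argument("--csv", action="store_true", help="export the results as CSV")
args = parser.parse_args()

print(args.profile, args.count, args.csv)
```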