US Airlines Tweets Sentiment Analysis through VADER Lexicon, TextBlob, ML Models & Word Embeddings

arpithaananth/Twitter_Airlines_Sentiment_Analysis

Twitter Sentiment Analysis of US Airlines

Approach for Sentiment Analysis

  • VADER Lexicon
  • TextBlob
  • Machine Learning Models
  • Word Embeddings

About the Dataset:

  • Source: CrowdFlower Data for Everyone
  • Tweets from February 2015 were scraped and classified as positive, negative, or neutral

Insights from the Data

Sentiment Distribution Analysis

63% of the tweets are negative, 21% are positive, and 16% are neutral.

[Figure: Sentiment Distribution]
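The percentages above are a per-label share of all tweets. A minimal sketch of that calculation, using only the standard library; the toy labels are made up for illustration, and with the real data the labels would come from the dataset's sentiment column (assumed here to be `airline_sentiment`):

```python
from collections import Counter

def sentiment_distribution(labels):
    """Return {label: percentage of all labels}, rounded to one decimal place."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

# Toy labels for illustration only (not the real dataset).
labels = ["negative", "negative", "negative", "positive", "neutral"]
print(sentiment_distribution(labels))
# {'negative': 60.0, 'positive': 20.0, 'neutral': 20.0}
```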

Analysis of Airlines mentioned in the tweets

United Airlines, followed by US Airways, has the most tweet mentions.

[Figure: Airlines listed]

Analysis of Top 10 Tweet Locations

New York is the most common tweet location.

[Figure: Tweet Locations]

Analysis of User Time-zones

Eastern Time, followed by Central Time, is the most common user time zone.

[Figure: User Timezones]

Word Cloud

[Figure: Overall Word Cloud]

Word Cloud of Positive Reviews

[Figure: Positive Word Cloud]

Word Cloud of Negative Reviews

[Figure: Negative Word Cloud]

Analysis of Negative Tweets

[Figure: Top Negative Reasons]

Top 15 Unigrams

[Figure: Top 15 Unigrams]

Top 25 Bigrams

[Figure: Top 25 Bigrams]

Top 25 Trigrams

[Figure: Top 25 Trigrams]
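The unigram, bigram, and trigram rankings come down to counting contiguous token sequences across the negative tweets. A minimal standard-library sketch of that idea follows. The tokenizer, the absence of stop-word removal, and the sample tweets are all assumptions made for illustration; the repository's actual preprocessing may differ.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def top_ngrams(texts, n, k):
    """Return the k most frequent n-grams across texts as (ngram, count) pairs."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts.most_common(k)

# Made-up tweets for illustration only.
tweets = [
    "flight was cancelled again",
    "my flight was cancelled",
    "flight was delayed two hours",
]
print(top_ngrams(tweets, 2, 1))  # [('flight was', 3)]
print(top_ngrams(tweets, 3, 1))  # [('flight was cancelled', 2)]
```

Calling `top_ngrams(texts, 1, 15)`, `top_ngrams(texts, 2, 25)`, and `top_ngrams(texts, 3, 25)` would give rankings of the same shape as the three charts.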

Results of VADER Sentiment Analysis

Accuracy of VADER Lexicon is 49.61%
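An accuracy figure like this requires turning VADER's continuous `compound` score into one of the three labels and comparing it with the annotated sentiment. The sketch below assumes the ±0.05 cut-offs that the VADER documentation gives as typical thresholds; the README does not say which thresholds were used, and the scores and gold labels here are hypothetical.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a three-way sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual one."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical compound scores and gold labels, for illustration only.
scores = [0.62, -0.48, 0.01, -0.30]
gold = ["positive", "negative", "neutral", "neutral"]
predicted = [label_from_compound(s) for s in scores]
print(predicted)                  # ['positive', 'negative', 'neutral', 'negative']
print(accuracy(predicted, gold))  # 0.75
```

With the `vaderSentiment` package installed, real scores can be obtained with `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`.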

Results of TextBlob Analysis

Accuracy of TextBlob is 42.91%

Results of Machine Learning Models

[Figure: BOW ML Models Comparison]
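The models compared here take bag-of-words (BOW) features as input: each tweet becomes a vector of token counts over a fixed vocabulary. A minimal standard-library sketch of that representation follows. The whitespace tokenization and sample tweets are assumptions made for illustration, not the repository's actual pipeline.

```python
from collections import Counter

def build_vocabulary(texts):
    """Map each distinct token to a column index, in sorted token order."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def to_count_vector(text, vocabulary):
    """Return per-token counts for one text; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for tok, n in Counter(text.lower().split()).items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector

# Made-up tweets for illustration only.
tweets = ["late flight late bag", "great flight"]
vocab = build_vocabulary(tweets)
print(vocab)                                        # {'bag': 0, 'flight': 1, 'great': 2, 'late': 3}
print([to_count_vector(t, vocab) for t in tweets])  # [[1, 1, 0, 2], [0, 1, 1, 0]]
```

In practice, scikit-learn's `CountVectorizer` builds the same kind of matrix, which can then be passed to a classifier such as `LogisticRegression`.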

Results of Word Embeddings (word2vec) Sentiment Analysis

[Figure: Word Embedding ML Models]
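To feed word2vec embeddings to a classifier, each tweet needs a single fixed-length vector. One common approach is to average the vectors of the tweet's known words. The README does not state how tweet vectors were built, so this is only a sketch of that assumed approach, with hypothetical two-dimensional embeddings:

```python
def average_vector(tokens, embeddings, dim):
    """Mean of the vectors of known tokens; a zero vector if no token is known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

# Hypothetical 2-dimensional embeddings, for illustration only.
embeddings = {"flight": [1.0, 0.0], "late": [0.0, 1.0], "great": [1.0, 1.0]}

# "again" has no embedding, so only "flight" and "late" are averaged.
print(average_vector(["flight", "late", "again"], embeddings, 2))  # [0.5, 0.5]
```

The resulting vectors can be stacked into a feature matrix for a gradient-boosted classifier or any other model.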

Inference:

The following models perform best:

  • Logistic Regression on bag-of-words features gives 78% accuracy and an AUC-ROC of 0.77
  • Gradient Boosting and XGBoost on word embeddings are good classifiers, with 77% accuracy and an AUC-ROC of 0.73
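For readers less familiar with AUC-ROC: in the binary case it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way from hypothetical scores. The README does not say how the three-class AUC-ROC was aggregated; a one-vs-rest average (as offered by scikit-learn's `roc_auc_score` with `multi_class="ovr"`) is one possibility, but that is an assumption.

```python
def roc_auc(scores, labels):
    """Binary AUC-ROC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores and true binary labels, for illustration only.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels))  # 0.75
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; library implementations use a sort-based method instead.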