
Applied Sentiment Analysis to classify tweets as positive or negative

carlos-vf/Twitter-Sentiment-Analysis


Twitter Sentiment Analysis

This program performs sentiment analysis on a dataset of more than a million real tweets labeled as positive or negative.

Dataset

The tweets.zip file must be unzipped to extract tweets.csv, which contains the full collection of labeled tweets (0: negative, 4: positive).
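As an alternative to unzipping by hand, the archive can also be read directly. The sketch below is a minimal example, not code from TSA.py. It assumes that tweets.csv stores the label (0 or 4) in its first column and the tweet text in its last column, and that the file is Latin-1 encoded; rows whose first column is not 0 or 4 (for example a header) are skipped. The function name `load_tweets` and the remapping of labels to 0/1 are illustrative choices.

```python
import csv
import io
import zipfile

# Assumed layout of tweets.csv: label in the first column, text in the last.
LABEL_COL = 0
TEXT_COL = -1
LABELS = {"0": 0, "4": 1}  # 0 -> negative, 4 -> positive (remapped to 0/1)


def load_tweets(zip_source, member="tweets.csv", encoding="latin-1"):
    """Return (text, label) pairs read straight from the zip archive.

    zip_source may be a path or a binary file-like object; the encoding
    is an assumption and may need adjusting for the actual file.
    """
    pairs = []
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            # newline="" is the setting recommended for the csv module
            text_stream = io.TextIOWrapper(raw, encoding=encoding, newline="")
            for row in csv.reader(text_stream):
                if not row:
                    continue  # ignore blank lines
                label = row[LABEL_COL].strip()
                if label in LABELS:  # skip headers or unexpected labels
                    pairs.append((row[TEXT_COL], LABELS[label]))
    return pairs
```

With the archive in the working directory, `load_tweets("tweets.zip")` returns a list of `(text, label)` tuples that can be fed to the preprocessing step.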

Main program

TSA.py preprocesses data by applying:

  • Special-token detection (phone numbers, HTML tags, usernames, URLs, etc.)
  • Token normalization (lowercasing)
  • Punctuation normalization (!!! -> exclamations)
  • Substitution (@username 123 -> user, https://github.com/ -> url, etc.)
  • Word normalization (perrrrfect -> perfect)
  • Negation ("I don't like coffee" -> "I don't NOT_like NOT_coffee")
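The preprocessing steps above can be approximated with regular expressions. This is a hypothetical sketch: the patterns, the placeholder tokens (`url`, `user`, `phone`, `exclamations`), the negation word list, and the rule that negation scope ends at the next punctuation mark are all assumptions, not the rules implemented in TSA.py. Because it lowercases before handling negation, the output is lowercase apart from the `NOT_` prefix.

```python
import re

# Hypothetical patterns; TSA.py's actual rules may differ.
HTML_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")  # long runs of digits, spaces, dashes
EXCLAIM_RE = re.compile(r"!{2,}")
ELONGATED_RE = re.compile(r"(\w)\1{2,}")  # a character repeated 3+ times
TOKEN_RE = re.compile(r"[a-z0-9_']+|[.,;:!?]")
PUNCT = set(".,;:!?")
NEGATIONS = {"not", "no", "never", "cannot"}  # plus any token ending in "n't"


def negate(tokens):
    """Prefix NOT_ to every token after a negation, up to the next punctuation mark."""
    out, negating = [], False
    for tok in tokens:
        if tok in PUNCT:
            negating = False  # punctuation closes the negation scope
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            negating = tok in NEGATIONS or tok.endswith("n't")
    return out


def preprocess(text):
    """Apply substitution, normalization and negation; return a token list."""
    text = HTML_RE.sub(" ", text)                  # drop HTML tags
    text = URL_RE.sub(" url ", text)               # https://github.com/ -> url
    text = USER_RE.sub(" user ", text)             # @username -> user
    text = PHONE_RE.sub(" phone ", text)           # long digit runs -> phone
    text = text.lower()                            # token normalization
    text = EXCLAIM_RE.sub(" exclamations ", text)  # !!! -> exclamations
    text = ELONGATED_RE.sub(r"\1", text)           # perrrrfect -> perfect
    return negate(TOKEN_RE.findall(text))
```

For example, `preprocess("I don't like coffee")` gives `["i", "don't", "NOT_like", "NOT_coffee"]`. Collapsing three or more repeated characters to one fixes "perrrrfect" but can over-correct (e.g. "cooool" becomes "col"); collapsing to two characters is a common alternative.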

After preprocessing, four different models are trained and used for classification:

  • Naïve Bayes
  • Decision tree
  • Logistic regression
  • Support vector machines
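The README does not say how the four models are implemented. To illustrate the first one, here is a minimal multinomial Naive Bayes classifier over bag-of-words tokens with add-one (Laplace) smoothing, written in plain Python. It is a sketch, not the repository's code, and the class name `MultinomialNaiveBayes` is hypothetical. In practice all four models are often taken from a library such as scikit-learn (`MultinomialNB`, `DecisionTreeClassifier`, `LogisticRegression`, `LinearSVC`), but whether TSA.py does so is an assumption.

```python
import math
from collections import Counter


class MultinomialNaiveBayes:
    """Hypothetical bag-of-words Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        # docs: list of token lists; labels: one class label per document
        self.classes = sorted(set(labels))
        doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        # log P(class), estimated from class frequencies in the training set
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        # total number of tokens seen per class
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_score(self, tokens, c):
        # unnormalized log P(class) + sum of log P(token | class)
        v = len(self.vocab)
        score = self.log_prior[c]
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
        return score

    def predict(self, tokens):
        """Return the class with the highest log score for a token list."""
        return max(self.classes, key=lambda c: self._log_score(tokens, c))
```

Working in log space avoids floating-point underflow when multiplying many small word probabilities, and add-one smoothing keeps a word that never appeared with a given class from forcing that class's probability to zero.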
