DiaaMohsen/sentiment_analysis-on_movie_reviews_kaggle

Sentiment Analysis on Movie Reviews

Classify the sentiment of sentences from the Rotten Tomatoes dataset on Kaggle.

Kaggle Competition: https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews

  • On the test.tsv provided with the competition, I got an overall accuracy of 0.61835.

  • On a test set held out by splitting train.tsv, I got an overall accuracy of 0.686616258704:

                       precision    recall  f1-score   support

             NEGATIVE       0.78      0.54      0.63      3011
    SOMEWHAT NEGATIVE       0.48      0.61      0.54      6466
              NEUTRAL       0.81      0.75      0.78     25868
    SOMEWHAT POSITIVE       0.52      0.66      0.58      7755
             POSITIVE       0.79      0.58      0.67      3718

          avg / total       0.71      0.69      0.69     46818


I built 3 classifiers, each using LinearSVC with CountVectorizer, as follows:

  • Classifier-1: classify {NEG, NEU, POS} on normalized sentiment values: {-1, 0, 1}
  • Classifier-2: classify {NEG, SMW NEG} on sentiment values: {0, 1}
  • Classifier-3: classify {SMW POS, POS} on sentiment values: {3, 4}
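
The three training sets described above could be derived from the labelled data roughly as follows. This is a minimal sketch, not the repository's code: it assumes the Kaggle label encoding 0 to 4 (0 = negative, 1 = somewhat negative, 2 = neutral, 3 = somewhat positive, 4 = positive), and the function names `to_coarse` and `split_training_sets` are hypothetical.

```python
# Assumption: labels use the Kaggle encoding 0..4
# (0 = negative, 1 = somewhat negative, 2 = neutral,
#  3 = somewhat positive, 4 = positive).

def to_coarse(label):
    """Map a 0..4 label to the normalized value {-1, 0, 1} for classifier-1."""
    if label < 2:
        return -1
    if label > 2:
        return 1
    return 0


def split_training_sets(phrases, labels):
    """Return (phrases, targets) pairs for classifier-1, -2 and -3."""
    phrases = list(phrases)
    labels = list(labels)

    # Classifier-1 sees every phrase, with labels collapsed to -1 / 0 / 1.
    coarse = (phrases, [to_coarse(y) for y in labels])

    # Classifier-2 sees only the negative side, with the original labels.
    neg_pairs = [(p, y) for p, y in zip(phrases, labels) if y < 2]
    negative = ([p for p, _ in neg_pairs], [y for _, y in neg_pairs])

    # Classifier-3 sees only the positive side, with the original labels.
    pos_pairs = [(p, y) for p, y in zip(phrases, labels) if y > 2]
    positive = ([p for p, _ in pos_pairs], [y for _, y in pos_pairs])

    return coarse, negative, positive
```

Each returned pair can then be passed to the `fit` method of its own vectorizer-plus-classifier model.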

How does it work?

  • Classify the data with classifier-1, then:
      • if a sample is labeled -1: classify it with classifier-2
      • if a sample is labeled 1: classify it with classifier-3
      • if a sample is labeled 0: nothing more to run
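
The routing step can be sketched as below. This is an illustrative sketch under stated assumptions rather than the repository's implementation: the function name `cascade_predict` is hypothetical, the three classifiers are assumed to be already fitted objects exposing a `predict` method that accepts a list of phrases, and the neutral class is assumed to have the Kaggle label 2.

```python
def cascade_predict(phrases, coarse_clf, negative_clf, positive_clf,
                    neutral_label=2):
    """Two-stage prediction: coarse polarity first, then fine labels.

    coarse_clf is expected to return -1, 0 or 1 per phrase;
    negative_clf and positive_clf return the final fine-grained labels.
    neutral_label=2 is an assumption based on the Kaggle encoding.
    """
    phrases = list(phrases)
    coarse = list(coarse_clf.predict(phrases))

    # Everything starts as neutral; samples labeled 0 need no second stage.
    final = [neutral_label] * len(phrases)

    neg_idx = [i for i, c in enumerate(coarse) if c == -1]
    pos_idx = [i for i, c in enumerate(coarse) if c == 1]

    # Samples labeled -1 are refined by classifier-2.
    if neg_idx:
        refined = negative_clf.predict([phrases[i] for i in neg_idx])
        for i, label in zip(neg_idx, refined):
            final[i] = label

    # Samples labeled 1 are refined by classifier-3.
    if pos_idx:
        refined = positive_clf.predict([phrases[i] for i in pos_idx])
        for i, label in zip(pos_idx, refined):
            final[i] = label

    return final
```

Any object with a compatible `predict` method should work here, for example a scikit-learn pipeline combining CountVectorizer and LinearSVC. One consequence of this design is that a mistake made by classifier-1 cannot be corrected by the second stage.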

Which models did I use?

  • LinearSVC (got the best results)
  • SGDClassifier
  • MultinomialNB
  • TfidfVectorizer (a feature extractor rather than a classifier)

To do:

  • Downsampling
  • Tuning the models further
  • GridSearch
  • Try more models or combination of models