This project applies Natural Language Processing (NLP) techniques to classify IMDB movie reviews as positive or negative.
It compares three modeling approaches:
- Naive Bayes with Bag-of-Words / TF-IDF
- Deep Learning with LSTM + GloVe embeddings
- Transformer-based models (ALBERT, DistilBERT)
Source: IMDB 50K Movie Reviews Dataset
Size: 50,000 labeled reviews (balanced: 25k positive / 25k negative)
Target: Sentiment (binary classification)
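For reference, a minimal loading sketch in Python, assuming the common Kaggle CSV release (a file named `IMDB Dataset.csv` with `review` and `sentiment` columns):

```python
import pandas as pd

# File and column names follow the common Kaggle release; adjust to your copy.
df = pd.read_csv("IMDB Dataset.csv")
df["label"] = (df["sentiment"] == "positive").astype(int)  # 1 = positive, 0 = negative
print(df["label"].value_counts())  # expect a 25k/25k balance
```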
1. Exploratory Data Analysis (EDA)
- Review length distribution (words & characters); see the sketch after this list
- Stopword analysis
- Word clouds for positive/negative reviews
- Patterns: HTML tags, emojis, excessive punctuation, slang
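A minimal sketch of the length and stopword checks, reusing the `df` loaded above (column names are assumptions from the Kaggle release):

```python
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
stops = set(stopwords.words("english"))

# Review length in words and characters, split by sentiment
df["n_chars"] = df["review"].str.len()
df["n_words"] = df["review"].str.split().str.len()
print(df.groupby("sentiment")[["n_words", "n_chars"]].mean())

# Fraction of stopwords per review, a rough measure of "filler" density
df["stop_ratio"] = df["review"].apply(
    lambda t: sum(w.lower() in stops for w in t.split()) / max(len(t.split()), 1)
)
print(df.groupby("sentiment")["stop_ratio"].mean())
```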
2. Preprocessing
- HTML/URL removal (the full cleaning pipeline is sketched after this list)
- Lowercasing & contraction expansion
- Stopword removal & lemmatization
- Tokenization (NLTK & HuggingFace)
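One possible cleaning pipeline, sketched below; `clean_review` is a hypothetical helper, the `contractions` package is one way to expand contractions, and the simple regex tokenizer stands in for the NLTK/HuggingFace tokenizers named above:

```python
import re

import contractions  # pip install contractions
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download(["stopwords", "wordnet", "omw-1.4"], quiet=True)
stops = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_review(text: str) -> str:
    text = re.sub(r"<.*?>", " ", text)                  # strip HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # strip URLs
    text = contractions.fix(text.lower())               # lowercase, expand "don't" -> "do not"
    tokens = re.findall(r"[a-z]+", text)                # keep alphabetic tokens only
    return " ".join(
        lemmatizer.lemmatize(tok) for tok in tokens if tok not in stops
    )

df["clean"] = df["review"].apply(clean_review)
```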
3. Feature Engineering
- CountVectorizer & TF-IDF for the ML baseline (all three pipelines are sketched after this list)
- Word embeddings (GloVe 100D) for LSTM
- Tokenizer + Padding for sequence models
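A sketch of the three feature pipelines (scikit-learn for the sparse baseline, the TF 2.x Keras tokenizer for sequences); the vocabulary size, sequence length, and `glove.6B.100d.txt` path are illustrative assumptions:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

# Sparse document-term and TF-IDF matrices for the Naive Bayes baseline
X_bow = CountVectorizer(max_features=20_000).fit_transform(df["clean"])
X_tfidf = TfidfVectorizer(max_features=20_000).fit_transform(df["clean"])

# Integer sequences, padded to a fixed length, for the LSTM
MAX_WORDS, MAX_LEN, EMB_DIM = 20_000, 200, 100
tok = Tokenizer(num_words=MAX_WORDS, oov_token="<unk>")
tok.fit_on_texts(df["clean"])
X_seq = pad_sequences(tok.texts_to_sequences(df["clean"]), maxlen=MAX_LEN)

# Embedding matrix from pre-trained GloVe 100D vectors (downloaded separately)
emb_matrix = np.zeros((MAX_WORDS, EMB_DIM))
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        word, *vec = line.rstrip().split(" ")
        idx = tok.word_index.get(word)
        if idx is not None and idx < MAX_WORDS:
            emb_matrix[idx] = np.asarray(vec, dtype="float32")
```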
4. Models Trained
- Naive Bayes: Baseline on a document-term matrix (DTM) & TF-IDF features
- LSTM (Bi-LSTM + GlobalMaxPool): trained with GloVe embeddings; see the model sketch after this list
- Transformers: Fine-tuned ALBERT & DistilBERT (HuggingFace Trainer)
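As one concrete example, a Keras sketch of the Bi-LSTM variant, reusing `emb_matrix` and the constants from the previous step; layer widths and dropout are illustrative, not the project's exact configuration:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    # Frozen GloVe vectors; set trainable=True to fine-tune them instead
    layers.Embedding(MAX_WORDS, EMB_DIM, weights=[emb_matrix],
                     input_length=MAX_LEN, trainable=False),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.GlobalMaxPooling1D(),   # keep the strongest activation per channel
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),  # binary sentiment output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Note that `return_sequences=True` is what lets GlobalMaxPooling1D pool over the full sequence of hidden states rather than only the final one.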
5. Evaluation
- Metrics: Accuracy, Precision, Recall, F1
- Confusion matrices plotted for each model (see the sketch after this list)
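A sketch of this step with scikit-learn; `y_test` and `y_pred` are placeholders for held-out labels and a model's predictions:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, classification_report

# Per-class precision/recall/F1 plus the macro averages reported below
print(classification_report(y_test, y_pred, target_names=["negative", "positive"]))

ConfusionMatrixDisplay.from_predictions(
    y_test, y_pred, display_labels=["negative", "positive"]
)
plt.show()
```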
6. Results
- Naive Bayes: Good baseline, but limited accuracy, at an 86% macro-average F1-score
- LSTM + GloVe: Improved performance and better contextual capture, at an 87% macro-average F1-score
- DistilBERT / ALBERT: Best overall performance, at a 94% macro-average F1-score