
Leveraging the Naive Bayes Classification algorithm and web scraping with BeautifulSoup, this project analyzes sentiments in e-commerce product reviews.

tanzealist/Ecom-Review-Sentiment


Sentiment Analysis with Naive Bayes Classifier (NBC) and Web Scraping

Introduction

This project, "E-Commerce Emotions," uses Naive Bayes Classification to analyze sentiment in e-commerce product reviews. Around 10,000 reviews were scraped, and each review is labeled as negative, neutral, or positive based on its rating. The review text is prepared for modeling with tokenization and stemming.

Dataset

The dataset comprises approximately 10,000 product reviews. Each review has been labeled as:

  • Negative: If the rating is 1 or 2.
  • Neutral: If the rating is 3.
  • Positive: If the rating is 4 or 5.
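The labeling rule above can be sketched as a small function. This is only an illustration of the rule as stated; the function name `rating_to_label` and the lowercase label strings are assumptions, not names taken from the notebooks.

```python
def rating_to_label(rating: int) -> str:
    """Map a 1-5 star rating to a sentiment label (hypothetical helper)."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
    if rating <= 2:
        return "negative"   # ratings 1 and 2
    if rating == 3:
        return "neutral"    # rating 3
    return "positive"       # ratings 4 and 5


labels = [rating_to_label(r) for r in [1, 2, 3, 4, 5]]
print(labels)
```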

Methodology

  • Web scraping was performed to collect the dataset.
  • Data preprocessing included tokenization and stemming.
  • A Naive Bayes Classifier was trained on the dataset.
  • The model was evaluated with an 80/20 train/test split.
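A minimal sketch of the preprocessing, training, and 80/20 split steps listed above, using a handful of made-up reviews. Several choices here are assumptions rather than the notebook's actual code: the regex tokenizer, NLTK's `PorterStemmer`, bag-of-words counts via `CountVectorizer`, the `MultinomialNB` variant, the stratified split, and the helper name `tokenize_and_stem`.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()


def tokenize_and_stem(text):
    """Lowercase, split into word tokens, and stem each token (assumed preprocessing)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [stemmer.stem(token) for token in tokens]


# Toy stand-in for the scraped dataset: 5 made-up reviews per class.
reviews = [
    "Terrible service, never ordering again",
    "Package arrived broken and support ignored me",
    "Awful experience, refund was refused",
    "Late delivery and rude customer service",
    "Worst purchase I have ever made",
    "It was okay, nothing special",
    "Average product, delivery was acceptable",
    "Neither good nor bad, does the job",
    "Fine overall but packaging could be better",
    "Decent, although a bit slow to arrive",
    "Great product, fast delivery",
    "Excellent service, very happy with my order",
    "Love it, works perfectly",
    "Fantastic quality and quick shipping",
    "Amazing experience, highly recommended",
]
labels = ["negative"] * 5 + ["neutral"] * 5 + ["positive"] * 5

# 80/20 train/test split; stratifying keeps every class in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.2, random_state=42, stratify=labels
)

# Bag-of-words counts over stemmed tokens, fed to a multinomial Naive Bayes model.
model = make_pipeline(
    CountVectorizer(tokenizer=tokenize_and_stem, token_pattern=None),
    MultinomialNB(),
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(list(zip(X_test, predictions)))
```

With only 15 toy reviews the predictions themselves are not meaningful; the sketch is meant to show how the steps fit together.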

Dependencies

  • pandas
  • numpy
  • scikit-learn
  • nltk
  • beautifulsoup4

Run this Project

To run this project, open and run the following Jupyter notebooks (.ipynb files):

jupyter notebook web_scrapping.ipynb

jupyter notebook nbc_tanuj.ipynb

Refer to the PPT and Python files for the flow of the project.

Web Scraping and Data Collection

Python's Beautiful Soup library was employed to automate the scraping of customer reviews from Trustpilot's Amazon page. We iterated over multiple pages to capture review texts and ratings, storing the information in a Pandas dataframe for subsequent analysis.
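The parse-and-collect loop described above might look roughly like the following. To keep the sketch runnable offline, it parses small inline HTML strings instead of fetching live pages. The markup, the `review` and `review-text` class names, the `data-rating` attribute, and the `parse_reviews` helper are all hypothetical; Trustpilot's real page structure is different and would need to be inspected before writing selectors.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup standing in for two fetched review pages.
PAGE_1 = """
<div class="review" data-rating="5"><p class="review-text">Fast delivery, great service.</p></div>
<div class="review" data-rating="1"><p class="review-text">Parcel never arrived.</p></div>
"""
PAGE_2 = """
<div class="review" data-rating="3"><p class="review-text">Okay, but slow support.</p></div>
"""


def parse_reviews(html):
    """Extract review text and rating from one page of (hypothetical) markup."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="review"):
        text_tag = card.find("p", class_="review-text")
        if text_tag is None:
            continue  # skip cards without review text
        rows.append(
            {
                "review": text_tag.get_text(strip=True),
                "rating": int(card["data-rating"]),
            }
        )
    return rows


# Iterate over pages and accumulate rows, then build the dataframe.
all_rows = []
for page_html in [PAGE_1, PAGE_2]:
    all_rows.extend(parse_reviews(page_html))

df = pd.DataFrame(all_rows, columns=["review", "rating"])
print(df)
```

In a real run, each page's HTML would come from an HTTP request to the paginated review URL rather than from inline strings.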

Exploratory Data Analysis (EDA)

We performed EDA to understand the distribution and common traits within the dataset.

Model Training and Evaluation

The NBC model was trained and its performance was evaluated using accuracy, precision, recall, and F1-score metrics.
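As a sketch of how these four metrics can be computed with scikit-learn, the example below uses made-up true and predicted labels. The README does not say how precision, recall, and F1 were averaged across the three classes; the `average="weighted"` setting here is an assumption. Under weighted averaging, recall always equals accuracy, which would be consistent with the matching accuracy and recall figures reported in the Conclusion.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Made-up labels for illustration only; not the project's results.
y_true = ["positive"] * 6 + ["neutral"] * 2 + ["negative"] * 2
y_pred = [
    "positive", "positive", "positive", "positive", "positive", "neutral",
    "neutral", "positive",
    "negative", "negative",
]

accuracy = accuracy_score(y_true, y_pred)

# Weighted averaging (an assumption): per-class scores weighted by class support.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```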

Conclusion

Our sentiment analysis project applied machine learning to classify Amazon product reviews from Trustpilot.com. We preprocessed the review text with tokenization and stemming, and our Naive Bayes Classifier, tuned with GridSearchCV, achieved an accuracy of 89%, precision of 87%, recall of 89%, and an F1-score of 87%. These results, visualized through various plots, indicate that the model distinguishes customer sentiment well and can provide insights for improving user experience and business offerings.
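For readers unfamiliar with GridSearchCV, the sketch below shows one way a Naive Bayes pipeline can be tuned with it. The parameter grid (smoothing `alpha` and n-gram range), the 2-fold cross-validation, the step names, and the toy reviews are all assumptions for illustration; the README does not state which parameters were actually searched.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy data: 4 made-up reviews per class.
reviews = [
    "terrible service", "broken on arrival", "awful refund process", "worst order ever",
    "it was okay", "average product", "nothing special", "fine but slow",
    "great product", "excellent service", "love it", "fantastic quality",
]
labels = ["negative"] * 4 + ["neutral"] * 4 + ["positive"] * 4

pipeline = Pipeline(
    [
        ("vectorizer", CountVectorizer()),
        ("classifier", MultinomialNB()),
    ]
)

# Hypothetical search space: smoothing strength and unigrams vs. unigrams+bigrams.
param_grid = {
    "classifier__alpha": [0.1, 0.5, 1.0],
    "vectorizer__ngram_range": [(1, 1), (1, 2)],
}

# Every parameter combination is scored by cross-validated accuracy.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(reviews, labels)

print(search.best_params_)
```

After fitting, `search.best_estimator_` holds the pipeline refit on all the data with the best-scoring parameters.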
