Sentiment Analysis of Amazon Fine Food Reviews

Data Source: https://www.kaggle.com/snap/amazon-fine-food-reviews

The Amazon Fine Food Reviews dataset consists of reviews of fine foods from Amazon.

Number of reviews: 568,454
Number of users: 256,059
Number of products: 74,258
Timespan: Oct 1999 - Oct 2012
Number of Attributes/Columns in data: 10

Attribute Information:

Id - row identifier
ProductId - unique identifier for the product
UserId - unique identifier for the user
ProfileName - profile name of the user
HelpfulnessNumerator - number of users who found the review helpful
HelpfulnessDenominator - number of users who indicated whether they found the review helpful or not
Score - rating between 1 and 5
Time - timestamp for the review
Summary - brief summary of the review
Text - text of the review

Objective:

Given a review, determine whether it is positive (rating of 4 or 5) or negative (rating of 1 or 2).

[Q] How to determine if a review is positive or negative?

[Ans] We could use the Score/Rating. A rating of 4 or 5 could be considered a positive review. A rating of 1 or 2 could be considered negative. A rating of 3 is treated as neutral and ignored. This is an approximate, proxy way of determining the polarity (positivity/negativity) of a review.
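The rule above can be sketched as a small function. This is a minimal illustration, not the notebook's actual code; the name `score_to_label` and the string labels are assumptions made for the example.

```python
from typing import Optional


def score_to_label(score: int) -> Optional[str]:
    """Map a 1-5 Score to a polarity label (hypothetical helper).

    Ratings of 4 or 5 are positive, 1 or 2 are negative,
    and 3 is neutral, so it returns None and gets dropped.
    """
    if score >= 4:
        return "positive"
    if score <= 2:
        return "negative"
    return None


scores = [5, 1, 3, 4, 2]
# Keep only the reviews that received a polarity label.
labeled = [(s, score_to_label(s)) for s in scores if score_to_label(s) is not None]
```

Here `labeled` keeps four of the five scores, because the rating of 3 is discarded.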

Solution

Here are the steps taken to solve the problem:

Loading the data: The data was imported from a SQLite dump.
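A minimal sketch of the loading step, using only Python's standard `sqlite3` module. The table name `Reviews` and the choice to skip neutral (Score 3) rows at load time are assumptions for illustration. The example builds a tiny in-memory database so that it runs without the real dump; with the actual data you would connect to the downloaded file instead.

```python
import sqlite3


def load_reviews(conn: sqlite3.Connection) -> list:
    """Read all non-neutral reviews as a list of dicts.

    Assumes a table named `Reviews` with a `Score` column (an assumption
    about the dump's schema, not confirmed by this README).
    """
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    query = "SELECT * FROM Reviews WHERE Score != 3"
    return [dict(row) for row in conn.execute(query)]


# Tiny in-memory stand-in for the real dump (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Reviews (Id INTEGER PRIMARY KEY, ProductId TEXT, "
    "UserId TEXT, Score INTEGER, Text TEXT)"
)
conn.executemany(
    "INSERT INTO Reviews (ProductId, UserId, Score, Text) VALUES (?, ?, ?, ?)",
    [
        ("P1", "U1", 5, "great tea"),
        ("P2", "U2", 3, "it was okay"),
        ("P3", "U3", 1, "stale and bitter"),
    ],
)
reviews = load_reviews(conn)
conn.close()
```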
Cleaning the data: We removed duplicate reviews, reviews where HelpfulnessNumerator is greater than HelpfulnessDenominator (a helpfulness ratio above 1 is impossible), and reviews that were for books rather than food.
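The first two cleaning rules might look like the sketch below. The duplicate key `(UserId, Time, Text)` is an assumption, since the README does not say how duplicates were identified, and `clean_reviews` is a hypothetical helper name. The book filter is left out because the README does not describe how book reviews were detected.

```python
def clean_reviews(reviews: list) -> list:
    """Drop invalid-helpfulness rows and duplicate reviews (hypothetical helper)."""
    seen = set()
    cleaned = []
    for r in reviews:
        # More "helpful" votes than total votes cannot happen; treat as bad data.
        if r["HelpfulnessNumerator"] > r["HelpfulnessDenominator"]:
            continue
        # Assumed duplicate key: same user, same timestamp, same text.
        key = (r["UserId"], r["Time"], r["Text"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned


# Made-up rows for illustration only.
sample = [
    {"ProductId": "P1", "UserId": "U1", "Time": 100, "Text": "great tea",
     "HelpfulnessNumerator": 1, "HelpfulnessDenominator": 2},
    {"ProductId": "P2", "UserId": "U1", "Time": 100, "Text": "great tea",
     "HelpfulnessNumerator": 0, "HelpfulnessDenominator": 0},   # duplicate
    {"ProductId": "P3", "UserId": "U2", "Time": 200, "Text": "too sweet",
     "HelpfulnessNumerator": 3, "HelpfulnessDenominator": 1},   # invalid votes
    {"ProductId": "P4", "UserId": "U3", "Time": 300, "Text": "good value",
     "HelpfulnessNumerator": 2, "HelpfulnessDenominator": 2},
]
cleaned = clean_reviews(sample)
```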
Text to Vector: The third step was to convert the review text to vectors. We used the following techniques:
1. Bag of words (with and without n-grams)
2. tf-idf
3. Word2Vec (w2v)
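The first two techniques can be sketched with scikit-learn's `CountVectorizer` and `TfidfVectorizer`. This assumes scikit-learn is the library in use, which the README does not state, and the three toy sentences are made up for the example. Word2Vec is not shown.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus standing in for the review texts (made-up sentences).
docs = [
    "this tea tastes great",
    "this coffee tastes bad",
    "great coffee great price",
]

# Bag of words on single words (unigrams): one column per distinct word.
bow = CountVectorizer()
X_bow = bow.fit_transform(docs)

# Bag of words with n-grams: unigrams plus bigrams such as "tastes great".
bow_ngrams = CountVectorizer(ngram_range=(1, 2))
X_ngrams = bow_ngrams.fit_transform(docs)

# tf-idf: counts reweighted so words common to many documents count less.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs)
```

All three results are sparse matrices with one row per review. Adding bigrams increases the number of columns, which is part of why the dimensionality grows so quickly on the full dataset.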
Applying Truncated SVD to reduce dimensions: t-SNE typically can't operate on sparse matrices directly, so they have to be converted to dense matrices beforehand.

This limits how many dimensions and how much data t-SNE can handle. One way to overcome this is to reduce the number of dimensions first using Truncated SVD.

It is recommended to use another dimensionality reduction technique to bring the number of dimensions down to a reasonable amount before applying t-SNE (see https://medium.com/@luckylwk/visualising-high-dimensional-datasets-using-pca-and-t-sne-in-python-8ef87e7915b).
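A minimal sketch of this reduction step, assuming scikit-learn's `TruncatedSVD`, which accepts sparse input directly. The toy corpus and `n_components=2` are chosen only to keep the example small; on the real data a larger number of components would be a more typical intermediate target.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for the review texts (made-up sentences).
docs = [
    "this tea tastes great",
    "this coffee tastes bad",
    "great coffee great price",
    "bad price for stale tea",
    "the cookies taste great",
    "stale cookies taste bad",
]

# Sparse tf-idf matrix: one row per document, one column per word.
X_sparse = TfidfVectorizer().fit_transform(docs)

# Truncated SVD works on the sparse matrix and returns a dense array
# with n_components columns.
svd = TruncatedSVD(n_components=2, random_state=0)
X_reduced = svd.fit_transform(X_sparse)
```

`X_reduced` is a dense NumPy array with one row per document, small enough to hand to t-SNE.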
Applying t-SNE to visualize data: After this, we apply t-SNE to the output of Truncated SVD, reducing it to 2 dimensions so the data can be visualized.
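A minimal sketch of the t-SNE step, assuming scikit-learn's `TSNE`. The random array below is only a stand-in for the dense Truncated SVD output, and the `perplexity` value is an arbitrary choice for a small sample, not a setting taken from the notebook.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for the dense Truncated SVD output: 60 samples, 20 dimensions.
# (Synthetic data for illustration; the real input would be the reduced reviews.)
rng = np.random.RandomState(0)
X_reduced = rng.normal(size=(60, 20))

# Embed into 2 dimensions. perplexity must be smaller than the sample count.
tsne = TSNE(n_components=2, perplexity=10, init="random", random_state=0)
X_2d = tsne.fit_transform(X_reduced)
```

Each row of `X_2d` is an (x, y) point that can be drawn in a scatter plot and coloured by the positive/negative label.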
