Skip to content

Data analysts were tasked with analyzing Amazon reviews written by members of the paid Amazon Vine program. The Amazon Vine program is a service that allows manufacturers and publishers to receive reviews for their products. Companies pay a small fee to Amazon and provide products to Amazon Vine members, who are then required to publish a review.

Notifications You must be signed in to change notification settings

acfthomson/Amazon_Vine_Analysis

Repository files navigation

Amazon_Vine_Analysis

Overview

Data analysts were tasked with analyzing Amazon reviews written by members of the paid Amazon Vine program. The Amazon Vine program is a service that allows manufacturers and publishers to receive reviews for their products. Companies pay a small fee to Amazon and provide products to Amazon Vine members, who are then required to publish a review.

Data from Amazon's Lawn and Garden department was analyzed to determine if having a paid Vine review makes a difference in the percentage of 5-star reviews. The Extract, Transform, and Load (ETL) process was used on the Amazon Lawn and Garden dataset. An AWS RDS database was create and the Lawn and Garden dataset was uploaded into an S3 bucket. pgAdmin was utilized to connect to AWS, and pySpark and postgreSQL were used against the data set to create four separate DataFrames to match the table schema in pgAdmin. The transformed data was then uploaded into AWS RDS.

Resources

Results

The Vine table was exported from pgAdmin and Python Pandas was used against the data. The Vine table contained 1,048,575 rows of data.

Vine_table_csv

The Vine table data was transformed to show only reviews where there were 20 or more reviews for the product. This new Pandas DataFrame reduced the dataset to 8,488 rows of Lawn and Garden Amazon reviews.

helpful_votes_df

This data set was further reduced to show only rows where helpful_votes was greater than or equal to 50% of helpful_votes divided by total_votes. This gave 7,801 rows. The parameters used against the Vine table resulted in a 99% reduction in data.

better_helpful_df

Two new Pandas DataFramse were created to retrieve all rows where the review was part of the Vine program and to retrieve all rows where the review was not part of the Vine program. This resulted in 91 paid reviews and 7,710 unpaid reviews.

Paid Reviews Unpaid Reviews

Of the paid Vine reviews, only 43 were 5-star reviews. Out of the unpaid reviews, 4,040 were 5-star reviews.

Paid 5-Star Reviews Unpaid 5-Star Reviews

Paid Vine 5-star reviews accounted for 47.25% of the data, whereas 52.4% were unpaid 5-star reviews.

% of Paid 5-Star Reviews % of Unpaid 5-Star Reviews

Summary

Based on analysis of the sample selected from the Amazon Lawn and Garden reviews, Vine reviews did not appear to affect 5-star reviews. There were slightly more 5-star reviews from unpaid reviews.

Additional analysis should be conducted against multiple Amazon review datasets, with the same parameters that were used against the Lawn and Garden dataset. Other datasets could show that Vine reviews are more likely to give 5-star product reviews. It would also be interesting to determine if male or female are more likely to leave a review in general versus how likely are they to leave a 5-star review as a Vine member or unpaid reviews.

About

Data analysts were tasked with analyzing Amazon reviews written by members of the paid Amazon Vine program. The Amazon Vine program is a service that allows manufacturers and publishers to receive reviews for their products. Companies pay a small fee to Amazon and provide products to Amazon Vine members, who are then required to publish a review.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published