PySpark is used to perform ETL on the data, which is loaded into an Amazon Web Services RDS instance and accessed through pgAdmin. All this is done to calculate different metrics on US reviews for Toys.

hbustamante8/Amazon_Vine_Analysis

Amazon_Vine_Analysis

Overview of Analysis

The purpose of this project is to analyze the Amazon Vine program and determine whether Vine members are biased toward favorable reviews compared with non-Vine members. PySpark was used to perform the ETL process and to connect to an Amazon Web Services RDS instance. From PySpark, the transformed data was loaded into tables on the instance, which were viewed in pgAdmin. Lastly, a Jupyter notebook was used to calculate different metrics on the Vine reviews. The dataset used was US reviews for Toys.

Results

  • Total Vine reviews and non-Vine reviews (image)

  • Vine reviews (image)

  • Non-Vine reviews (image)

  • Percentage of Vine reviews that were 5 stars (image)

  • Percentage of non-Vine reviews that were 5 stars (image)

Summary

The metrics revealed that 2.6% of the reviews in the Vine program are 5-star reviews, while non-Vine reviews account for 97.4% of the total. Based on these results, there is no positivity bias for reviews in the Vine program: the program, which requires a review, does not appear to incline customers toward good reviews once they receive their orders. However, almost all of the toy reviews in this dataset come from people who are not part of the Vine program, so it is not an ideal dataset for exploring positivity bias. An additional analysis could look at the opposite end of the spectrum and repeat the analysis for 1-star reviews. The results would help confirm whether any bias can be inferred from the Toys dataset.
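As a rough illustration of the 5-star metric and the proposed 1-star follow-up, the sketch below computes, for each group, the percentage of that group's reviews with a given star rating. It is written in plain Python rather than PySpark so it runs without a Spark session, and it is not the notebook code from this repository. The function name `star_rate_by_group`, the `(star_rating, vine)` row layout, the `"Y"`/`"N"` flag values, and the sample rows are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical sample rows: (star_rating, vine), where vine is "Y" or "N".
# These values are invented for illustration; they are not the Toys dataset.
SAMPLE_REVIEWS = [
    (5, "Y"), (4, "Y"), (1, "Y"), (5, "Y"),
    (5, "N"), (5, "N"), (3, "N"), (1, "N"), (2, "N"), (5, "N"),
]


def star_rate_by_group(reviews, stars):
    """Return {vine_flag: percent of that group's reviews rated `stars`}."""
    totals = defaultdict(int)   # number of reviews per group
    matches = defaultdict(int)  # number of reviews per group with the target rating
    for star_rating, vine in reviews:
        totals[vine] += 1
        if star_rating == stars:
            matches[vine] += 1
    # Divide within each group so a large group does not swamp a small one.
    return {vine: 100.0 * matches[vine] / totals[vine] for vine in totals}


five_star = star_rate_by_group(SAMPLE_REVIEWS, 5)
one_star = star_rate_by_group(SAMPLE_REVIEWS, 1)
print("5-star rate by group:", five_star)
print("1-star rate by group:", one_star)
```

Computing the rate within each group, rather than as a share of all reviews, keeps the comparison meaningful even when one group is far larger than the other, as is the case for non-Vine reviews here.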
