Extract, Transform, and Load (ETL) analysis of Amazon reviews written by paid vs. non-paid members, to determine whether Vine reviews are biased toward favorable ratings.

Wamuza1/Amazon_Vine_Analysis

Amazon_Vine_Analysis

Built with PySpark, Google Colab, pgAdmin, AWS RDS, and S3.

Project overview

The Amazon Vine program is a service that allows manufacturers and publishers to receive reviews for their products. Companies like SellBy pay a small fee to Amazon and provide products to Amazon Vine members, who are then required to publish a review.

In this project, we had access to approximately 50 datasets. Each one contains reviews for a specific product category, from clothing apparel to wireless products. We picked one of these datasets and used PySpark to perform the ETL process: extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into the database, where it can be inspected in pgAdmin. We also used PySpark to determine whether Vine members in the dataset show any bias toward favorable reviews.
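The ETL flow described above could be sketched roughly as follows. This is a minimal sketch and not the notebook used in this project: the column layout of each table, the helper names (`build_jdbc_url`, `split_tables`, `run_etl`), the database name, and the connection properties are all assumptions. The load step also assumes the PostgreSQL JDBC driver is on Spark's classpath.

```python
# Hedged sketch of the extract / transform / load steps with PySpark.
# Table layouts and connection settings below are assumptions, not taken from this README.

DATA_URL = (
    "https://s3.amazonaws.com/amazon-reviews-pds/tsv/"
    "amazon_reviews_us_Home_Entertainment_v1_00.tsv.gz"
)

# Assumed columns for three of the four tables; customers_table is aggregated separately.
TABLE_COLUMNS = {
    "products_table": ["product_id", "product_title"],
    "review_id_table": [
        "review_id", "customer_id", "product_id", "product_parent", "review_date",
    ],
    "vine_table": [
        "review_id", "star_rating", "helpful_votes", "total_votes",
        "vine", "verified_purchase",
    ],
}


def build_jdbc_url(endpoint, port=5432, database="postgres"):
    """Return a PostgreSQL JDBC URL for a (hypothetical) RDS endpoint."""
    return f"jdbc:postgresql://{endpoint}:{port}/{database}"


def split_tables(df):
    """Transform: derive the four tables from the raw reviews DataFrame."""
    tables = {name: df.select(cols) for name, cols in TABLE_COLUMNS.items()}
    # Keep one row per product.
    tables["products_table"] = tables["products_table"].dropDuplicates(["product_id"])
    # Count how many reviews each customer wrote.
    tables["customers_table"] = (
        df.groupBy("customer_id").count().withColumnRenamed("count", "customer_count")
    )
    return tables


def run_etl(endpoint, user, password):
    """Run the full flow; requires PySpark and network access."""
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark import SparkFiles
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("VineETL").getOrCreate()

    # Extract: download the gzipped TSV and read it with a header row.
    spark.sparkContext.addFile(DATA_URL)
    path = SparkFiles.get(DATA_URL.rsplit("/", 1)[-1])
    df = spark.read.csv(path, sep="\t", header=True, inferSchema=True)

    # Transform, then load each table into the database over JDBC.
    props = {"user": user, "password": password, "driver": "org.postgresql.Driver"}
    for name, table_df in split_tables(df).items():
        table_df.write.jdbc(
            url=build_jdbc_url(endpoint), table=name, mode="append", properties=props
        )
```

In a Colab notebook, `run_etl("<your-rds-endpoint>", "<user>", "<password>")` would be called once the Spark session and JDBC driver are set up; the placeholder values must be replaced with real connection details.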

Resources

"https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Home_Entertainment_v1_00.tsv.gz"

Results

  • The customers_table DataFrame

  • The products_table DataFrame

  • The review_id_table DataFrame

  • The vine_table DataFrame

DataFrames loaded into pgAdmin

  • customers_table

  • products_table

  • review_id_table

  • vine_table

vine_table analysis

  • There were 261 Vine reviews and 24,040 non-Vine reviews.

  • There were 11,005 five-star reviews in total: 106 from Vine reviews and 10,899 from non-Vine reviews.

  • 40.61% of Vine reviews were five stars, compared with 45.34% of non-Vine reviews.
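The percentages above follow directly from the review counts. As a quick check, here is a minimal plain-Python sketch (the function name `five_star_percentage` is illustrative and not part of the project code):

```python
def five_star_percentage(five_star_count, total_count):
    """Return the share of five-star reviews as a percentage, rounded to 2 places."""
    return round(100 * five_star_count / total_count, 2)


# Counts reported in the vine_table analysis.
vine_total, non_vine_total = 261, 24040
vine_five_star, non_vine_five_star = 106, 10899

vine_pct = five_star_percentage(vine_five_star, vine_total)
non_vine_pct = five_star_percentage(non_vine_five_star, non_vine_total)

print(f"Vine five-star share:     {vine_pct}%")      # 40.61%
print(f"Non-Vine five-star share: {non_vine_pct}%")  # 45.34%
```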

Summary: Determine Bias of Vine Reviews.

The two percentages are too close to conclude that the Vine program introduces any bias. The Vine sample (261 reviews) is small compared with the non-Vine sample (24,040 reviews) but is still large enough to compare, and the five-star rates differ by fewer than five percentage points (40.61% versus 45.34%), with the Vine rate being the lower of the two. As a next step, we could repeat the analysis on verified purchases and compare the resulting percentages to see whether that reveals any positivity bias.
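One way to put a number on "not enough margin" is a two-proportion z-test on the five-star counts. This test was not part of the original analysis; it is offered here only as a hedged illustration using the standard library, and the function name `two_proportion_z_test` is an assumption.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Five-star counts and totals for Vine and non-Vine reviews.
z, p_value = two_proportion_z_test(106, 261, 10899, 24040)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

With these counts, z comes out at roughly -1.5 and the p-value is above 0.1, so the difference would not be significant at the common 0.05 level. That is consistent with the conclusion that the data do not show a clear bias.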
