Several kinds of product review analysis on the Amazon review dataset, using Apache Spark and MongoDB.
Given the 5-core dataset, Spark SQL is used to join the collections and find the highest-rated product in each category.
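The join-then-rank logic can be illustrated with a minimal sketch in plain Java 8 streams. This is a stand-in for the Spark SQL query, not the query itself, and it rests on assumptions: "highest rated" is taken to mean the highest average `overall` rating, reviews and product metadata are joined on `asin`, and each product has a single category. The class `TopRatedSketch`, its nested `Review` type, and the method `topRatedPerCategory` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

public class TopRatedSketch {

    /** Hypothetical review row: product id (asin) and star rating (overall). */
    static final class Review {
        final String asin;
        final double overall;

        Review(String asin, double overall) {
            this.asin = asin;
            this.overall = overall;
        }
    }

    /**
     * Returns category -> asin of the product with the highest average rating.
     * Assumption: one category per product; ties are broken arbitrarily.
     */
    static Map<String, String> topRatedPerCategory(List<Review> reviews,
                                                   Map<String, String> categoryByAsin) {
        // Step 1: average rating per product (GROUP BY asin).
        Map<String, Double> averageByAsin = reviews.stream()
                .collect(Collectors.groupingBy(r -> r.asin,
                        Collectors.averagingDouble(r -> r.overall)));

        // Step 2: inner join on asin, keeping the best-rated product per category.
        Comparator<Map.Entry<String, Double>> byAverage = Map.Entry.comparingByValue();
        Map<String, Map.Entry<String, Double>> bestByCategory = averageByAsin.entrySet().stream()
                .filter(e -> categoryByAsin.containsKey(e.getKey()))
                .collect(Collectors.toMap(
                        e -> categoryByAsin.get(e.getKey()),
                        e -> e,
                        BinaryOperator.maxBy(byAverage)));

        // Step 3: keep only the winning product id, sorted by category name.
        Map<String, String> result = new TreeMap<>();
        bestByCategory.forEach((category, entry) -> result.put(category, entry.getKey()));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> categoryByAsin = new HashMap<>();
        categoryByAsin.put("A1", "Books");
        categoryByAsin.put("A2", "Books");
        List<Review> reviews = Arrays.asList(
                new Review("A1", 3.0), new Review("A2", 5.0));
        System.out.println(topRatedPerCategory(reviews, categoryByAsin));
    }
}
```

In Spark SQL the same shape would typically be a join on `asin` followed by a grouped aggregate and a per-category ranking; products with no metadata entry are dropped here, as an inner join would do.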
Each product review is categorized into a bucket based on its overall rating, and the top 2000 words (tokens) are then found in each bucket. The implementation makes extensive use of Java 8 streams and lambdas.
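The bucketing and word-count steps can be sketched with Java 8 streams as follows. This is a minimal illustration under assumptions, not the project's actual code: the bucket names and rating thresholds in `bucketOf`, the tokenizer (lowercase, split on non-letters), and the names `TopWordsSketch`, `Review`, `topWords`, and `topWordsPerBucket` are all hypothetical. The limit is a parameter, so passing 2000 matches the description above.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TopWordsSketch {

    /** Hypothetical review row: star rating (overall) and free text (reviewText). */
    static final class Review {
        final double overall;
        final String reviewText;

        Review(double overall, String reviewText) {
            this.overall = overall;
            this.reviewText = reviewText;
        }
    }

    /** Assumed thresholds: 4-5 stars positive, 3 neutral, below 3 negative. */
    static String bucketOf(double overall) {
        if (overall >= 4.0) return "positive";
        if (overall >= 3.0) return "neutral";
        return "negative";
    }

    /** Most frequent tokens first; ties are broken alphabetically. */
    static List<String> topWords(List<String> texts, int limit) {
        Map<String, Long> counts = texts.stream()
                .flatMap(t -> Arrays.stream(t.toLowerCase(Locale.ROOT).split("[^a-z]+")))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        Comparator<Map.Entry<String, Long>> byCountDesc =
                (a, b) -> Long.compare(b.getValue(), a.getValue());
        Comparator<Map.Entry<String, Long>> order = byCountDesc.thenComparing(e -> e.getKey());

        return counts.entrySet().stream()
                .sorted(order)
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Groups review texts by rating bucket, then ranks tokens within each bucket. */
    static Map<String, List<String>> topWordsPerBucket(List<Review> reviews, int limit) {
        Map<String, List<String>> textsByBucket = reviews.stream()
                .collect(Collectors.groupingBy(r -> bucketOf(r.overall),
                        Collectors.mapping(r -> r.reviewText, Collectors.toList())));

        Map<String, List<String>> result = new TreeMap<>();
        textsByBucket.forEach((bucket, texts) -> result.put(bucket, topWords(texts, limit)));
        return result;
    }

    public static void main(String[] args) {
        List<Review> reviews = Arrays.asList(
                new Review(5.0, "Great book, great price"),
                new Review(1.0, "Bad quality"));
        System.out.println(topWordsPerBucket(reviews, 2000));
    }
}
```

A real run would likely also filter stop words, since otherwise words such as "the" and "and" tend to dominate every bucket; that step is left out here to keep the sketch short.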
- Java 8
- Spark 2.3.1
- MongoDB 4.0.2