Categorizes each product review by the product's overall rating and finds the top 2000 words (tokens) in each of these categories (buckets). Makes extensive use of Java Streams and Lambdas.
- Product reviews are categorized into 5 buckets using the integer overall rating.
- Each review is then cleaned by removing stop words (listed in data/stop_words.txt) and non-alphabetical characters.
- Word count analysis is then performed and the top 2000 words are chosen for each bucket.
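
The three steps above can be sketched with Java Streams roughly as follows. This is a minimal illustration, not the project's actual code: the `ReviewBuckets` class, the `Review` type, and the method names are hypothetical, lowercasing before counting is an assumption, and ties between equally frequent words are broken alphabetically only to keep the output deterministic.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch of the bucket -> clean -> count -> top-N pipeline.
class ReviewBuckets {

    // Hypothetical review type: an integer overall rating plus the review text.
    public static final class Review {
        public final int overall;
        public final String text;

        public Review(int overall, String text) {
            this.overall = overall;
            this.text = text;
        }
    }

    // Lowercase (an assumption), drop non-alphabetical characters, split on
    // whitespace, and filter out empty tokens and stop words.
    public static List<String> tokenize(String text, Set<String> stopWords) {
        String cleaned = text.toLowerCase().replaceAll("[^a-z\\s]", "");
        return Arrays.stream(cleaned.split("\\s+"))
                .filter(w -> !w.isEmpty() && !stopWords.contains(w))
                .collect(Collectors.toList());
    }

    // Return the `limit` most frequent words, highest count first,
    // with alphabetical order as the tie-breaker.
    public static List<String> topN(Map<String, Long> counts, int limit) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Group reviews by overall rating, count words per bucket, keep the top `limit`.
    public static Map<Integer, List<String>> topWordsPerBucket(
            List<Review> reviews, Set<String> stopWords, int limit) {
        Map<Integer, List<Review>> buckets = reviews.stream()
                .collect(Collectors.groupingBy((Review r) -> r.overall));

        Map<Integer, List<String>> result = new TreeMap<>();
        buckets.forEach((rating, bucket) -> {
            Map<String, Long> counts = bucket.stream()
                    .flatMap(r -> tokenize(r.text, stopWords).stream())
                    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
            result.put(rating, topN(counts, limit));
        });
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "is", "on"));
        List<Review> reviews = Arrays.asList(
                new Review(5, "Great product, great price!"),
                new Review(5, "The product is great"),
                new Review(1, "Broken on arrival, the worst"));
        System.out.println(topWordsPerBucket(reviews, stopWords, 2));
    }
}
```

In the project itself the limit would be 2000 and the stop words would be read from data/stop_words.txt; the sketch uses a small in-memory set so that it runs on its own.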
### Build and run
```
mvn clean install package
bash run.sh
```
#### Outputs in Bucket1, Bucket2... Bucket5 directories.