Commit 1cd1706

Create Readme.md
1 parent ec2aa85 commit 1cd1706

1 file changed: +14 -0 lines changed

top-tokens-in-reviews/Readme.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
## Top tokens in product reviews
Categorizes product reviews by overall rating and finds the top 2000 words (tokens) in each category (bucket), making extensive use of Java Streams and lambdas.

- Product reviews are categorized into 5 buckets using the integer overall rating.
- Each review is then cleaned by removing stop-words (`data/stop_words.txt`) and non-alphabetical characters.
- Word counts are then computed, and the top 2000 words are chosen for each bucket.

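
The three steps above could be sketched with Java Streams roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the `Review` class, the `TopTokensSketch` class, and all method names are hypothetical. It also assumes that non-alphabetical characters are replaced by spaces before stop-words are dropped, and that ties in word count are broken alphabetically.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TopTokensSketch {

    /** Hypothetical review type: an overall rating (1 to 5) plus the review text. */
    static final class Review {
        final int overall;
        final String text;

        Review(int overall, String text) {
            this.overall = overall;
            this.text = text;
        }
    }

    /** Lower-cases the text, turns non-alphabetical runs into spaces, and drops stop-words. */
    static List<String> clean(String text, Set<String> stopWords) {
        String lettersOnly = text.toLowerCase(Locale.ROOT).replaceAll("[^a-z]+", " ").trim();
        return Arrays.stream(lettersOnly.split("\\s+"))
                .filter(token -> !token.isEmpty() && !stopWords.contains(token))
                .collect(Collectors.toList());
    }

    /** Returns up to `limit` tokens, most frequent first (assumption: ties broken alphabetically). */
    static List<String> topTokens(List<Review> reviews, Set<String> stopWords, int limit) {
        Map<String, Long> counts = reviews.stream()
                .flatMap(review -> clean(review.text, stopWords).stream())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        Comparator<Map.Entry<String, Long>> byCountDescThenToken =
                Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey());
        return counts.entrySet().stream()
                .sorted(byCountDescThenToken)
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Groups reviews into buckets by overall rating, then finds the top tokens of each bucket. */
    static Map<Integer, List<String>> topTokensByBucket(
            List<Review> reviews, Set<String> stopWords, int limit) {
        Map<Integer, List<Review>> buckets = reviews.stream()
                .collect(Collectors.groupingBy(review -> review.overall));
        Map<Integer, List<String>> result = new TreeMap<>();
        buckets.forEach((rating, bucket) -> result.put(rating, topTokens(bucket, stopWords, limit)));
        return result;
    }

    public static void main(String[] args) {
        // Tiny made-up sample; the real project reads reviews and stop-words from files.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "and"));
        List<Review> reviews = Arrays.asList(
                new Review(5, "Great product, works great!"),
                new Review(5, "Great value and great battery"),
                new Review(1, "Terrible battery. The battery died."));
        // Prints {1=[battery, died], 5=[great, battery]}
        System.out.println(topTokensByBucket(reviews, stopWords, 2));
    }
}
```

With a limit of 2000 instead of 2, the same pipeline would yield the per-bucket lists described above. Grouping first and counting per bucket keeps each bucket's word counts independent of the others.
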
### Build and run
```
mvn clean install package
bash run.sh
```

#### Outputs are written to the Bucket1, Bucket2... Bucket5 directories.
