A big data analysis exercise on Reddit comments, using the tf-idf statistic.
- Source
- Processing
- The pushshift.io Reddit API was designed and created by the datasets mod team to provide enhanced search capabilities for Reddit comments and submissions. The project lead, stuck_in_the_matrix, maintains the Reddit comment and submission archives located at https://files.pushshift.io.
- The official JSON API from Reddit: https://www.reddit.com/dev/api/.
- MapReduce
- Spark
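
The tf-idf statistic named at the top can be illustrated with a minimal, single-machine sketch. It is not the project's MapReduce or Spark job. It assumes the common definition (term frequency within a comment multiplied by `log(N / df)`), a naive whitespace tokenizer, and treats each comment as one document. The names `tokenize` and `tf_idf` and the sample comments are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()


def tf_idf(comments):
    """Return {comment_index: {term: score}} for a list of comment strings."""
    n_docs = len(comments)

    # "Map"-like step: count terms within each comment.
    term_counts = [Counter(tokenize(c)) for c in comments]

    # "Reduce"-like step: document frequency, i.e. how many comments contain each term.
    doc_freq = defaultdict(int)
    for counts in term_counts:
        for term in counts:
            doc_freq[term] += 1

    scores = {}
    for i, counts in enumerate(term_counts):
        total = sum(counts.values())
        # tf = count / total terms in the comment; idf = log(N / df).
        scores[i] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Made-up sample comments, for illustration only.
comments = [
    "spark is fast",
    "hadoop mapreduce is batch oriented",
    "spark and mapreduce both scale",
]
scores = tf_idf(comments)
```

Terms that appear in few comments (such as `fast` here) score higher than terms shared across comments, and a term present in every comment scores zero. In a distributed setting the two counting steps would correspond to separate aggregation stages, but how the project splits them is not stated here.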
Contributors: Qijin (Jack) Xu, Ken Tjhia, Ibrahim Suedan
Last updated: 2019-12-26