Subreddit Recommender

Reddit is the 7th most visited site in the US. It is an aggregation of online discussion communities, each of which is known as a subreddit. Users can post links to other websites or create text posts, which other users upvote or downvote based on the quality of the content. A discussion then takes place in the comments on each post. There is a wide spectrum of subreddits, with topics including science, politics, music, various image types, jokes, news, and gaming.

For my capstone project, I would like to build a subreddit recommender. I have a dataset of 1.7 billion Reddit comments, covering roughly 7 years.

I would like to present my results in a blog post and a PowerPoint presentation, both illustrated with well-designed infographics.

The first step is to upload the data to an S3 bucket and perform exploratory data analysis (EDA) to understand it. Next, I will use Spark to create a utility matrix for matrix factorization, and then implement a collaborative filtering system on top of it.
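
To make the utility matrix and factorization steps concrete, here is a minimal sketch in plain Python on a tiny made-up sample. It is not the Spark pipeline itself. It assumes that each comment reduces to an (author, subreddit) pair, that the number of comments a user makes in a subreddit serves as the implicit rating, and that the factors are fit with simple stochastic gradient descent. The sample records, the function names, and all hyperparameters are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

# Hypothetical toy records: one (author, subreddit) pair per comment.
comments = [
    ("alice", "science"), ("alice", "science"), ("alice", "news"),
    ("bob", "gaming"), ("bob", "gaming"), ("bob", "music"),
    ("carol", "science"), ("carol", "news"), ("carol", "politics"),
    ("dave", "gaming"), ("dave", "music"), ("dave", "jokes"),
]


def build_utility_matrix(records):
    """Count comments per (author, subreddit); the result is a sparse dict."""
    return Counter(records)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def factorize(utility, rank=2, epochs=500, lr=0.02, reg=0.05, seed=0):
    """Fit user and subreddit factor vectors with SGD on the observed cells."""
    rng = random.Random(seed)
    users = sorted({u for u, _ in utility})
    subs = sorted({s for _, s in utility})
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(rank)] for u in users}
    Q = {s: [rng.uniform(-0.1, 0.1) for _ in range(rank)] for s in subs}
    cells = list(utility.items())
    for _ in range(epochs):
        rng.shuffle(cells)
        for (u, s), rating in cells:
            err = rating - dot(P[u], Q[s])
            for k in range(rank):
                pu, qs = P[u][k], Q[s][k]
                # Gradient step with L2 regularization on both factors.
                P[u][k] += lr * (err * qs - reg * pu)
                Q[s][k] += lr * (err * pu - reg * qs)
    return P, Q


def recommend(user, utility, P, Q, top_n=2):
    """Rank subreddits the user has not commented in by predicted score."""
    seen = {s for (u, s) in utility if u == user}
    scores = {s: dot(P[user], Q[s]) for s in Q if s not in seen}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    utility = build_utility_matrix(comments)
    P, Q = factorize(utility)
    print(recommend("alice", utility, P, Q))
```

At the scale of the full dataset, the same two steps would more likely be a Spark aggregation followed by the ALS estimator that ships with Spark MLlib; the loop above only illustrates the idea on data that fits in memory.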