Objective: Provide a great user experience for students on the Xloosv app (a private social communication platform) by predicting the most interesting posts.
Focus: choose the best feature engineering techniques, or implement a new one, to improve prediction accuracy.
Plan:
- Predict the best posts (Part 1):
  - Predict the number of upvotes a post will receive from its textual content (NLP) and store it in an "NLP score" variable.
  - Use an Elasticsearch decay function to rank posts and select the 10 best to show, based on:
    - creation date
    - number of likes
    - number of comments
    - NLP score
- Predict the hashtag/topic/subreddit of a post from its textual content (Part 2)
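
The ranking step in Part 1 could be sketched roughly as follows. This is a minimal illustration, not the project's actual query: the field names (`created_at`, `likes`, `comments`, `nlp_score`), the 7-day scale, the 1-day offset, and the choice to sum the four signals are all assumptions. The first function reproduces the Gaussian decay formula documented for Elasticsearch's `gauss` function so the effect of `scale`, `offset`, and `decay` can be checked by hand. The second builds a `function_score` search body as a plain dictionary, which could then be passed to an Elasticsearch client.

```python
import math


def gauss_decay(distance, scale, offset=0.0, decay=0.5):
    """Gaussian decay as documented for Elasticsearch's `gauss` function.

    Returns 1.0 while `distance` is within `offset` of the origin and
    exactly `decay` at a distance of `offset + scale`.
    """
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    effective = max(0.0, abs(distance) - offset)
    return math.exp(-effective ** 2 / (2.0 * sigma_sq))


def build_top_posts_query(size=10):
    """Build a function_score search body for the top `size` posts.

    All field names and parameter values are hypothetical placeholders.
    `nlp_score` is assumed to be non-negative (a predicted upvote count),
    since the log1p modifier is not defined for values below -1.
    """
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": [
                    # Recency: full score for posts under 1 day old,
                    # halved once a post is 1 + 7 days old.
                    {"gauss": {"created_at": {
                        "origin": "now", "scale": "7d",
                        "offset": "1d", "decay": 0.5}}},
                    # Engagement and NLP signals, log-damped so that a few
                    # very popular posts do not dominate the ranking.
                    {"field_value_factor": {
                        "field": "likes", "modifier": "log1p", "missing": 0}},
                    {"field_value_factor": {
                        "field": "comments", "modifier": "log1p", "missing": 0}},
                    {"field_value_factor": {
                        "field": "nlp_score", "modifier": "log1p", "missing": 0}},
                ],
                "score_mode": "sum",      # add the four signals together
                "boost_mode": "replace",  # ignore the match_all base score
            }
        },
    }


if __name__ == "__main__":
    # A post 8 days old with offset=1 and scale=7 sits exactly at the decay value.
    print(gauss_decay(8, scale=7, offset=1))
    print(build_top_posts_query()["query"]["function_score"]["score_mode"])
```

Summing the signals keeps a brand-new post with no likes from scoring zero, which multiplying would cause. The relative weight of recency against engagement would still need tuning on real data, for example by adding a `weight` to each function.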
Other possible tasks:
- Integrate the models into the Xloosv app using Firebase ML, then have them keep learning and improving from future data
- Repeat the previous tasks using image content (for posts that contain images)
Dataset:
A large collection of Reddit posts: