
A Neo4j recommender system using the MovieLens dataset


aleceress/movielens_rs


A Neo4j Recommender System

This repository contains the implementation of a Recommender System in Neo4j.

Dataset

The recommendation data come from a subset of the tables in the MovieLens 25M Dataset: ratings.csv, movies.csv, tags.csv, genome-scores.csv and genome-tags.csv. Place these files in a folder named data.
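Before running anything, it can help to check that the files are where the scripts expect them. The following is a minimal sketch, not part of the repository: the function name missing_dataset_files is hypothetical, and it assumes the data folder sits in the current working directory.

```python
from pathlib import Path

# File names taken from the list above; the folder location is an assumption.
REQUIRED_FILES = [
    "ratings.csv",
    "movies.csv",
    "tags.csv",
    "genome-scores.csv",
    "genome-tags.csv",
]


def missing_dataset_files(data_dir="data"):
    """Return the required MovieLens files that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

Calling missing_dataset_files() returns an empty list when all five files are in place.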

Population script

The script populate_db.py populates an existing Neo4j graph with data from these tables. The figure shows an example instance of the resulting graph.

The script also generates some pickle files in the data folder. These are serialized dictionaries that map the original dataset ids to the UUIDs used in the Neo4j database.
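As an illustration of what such a mapping file might look like, here is a minimal sketch. The function names and the way UUIDs are generated are assumptions made for this example; populate_db.py may build and store its dictionaries differently.

```python
import pickle
import uuid


def build_id_map(original_ids):
    """Assign a fresh UUID string to every original dataset id."""
    return {original_id: str(uuid.uuid4()) for original_id in original_ids}


def save_id_map(id_map, path):
    """Serialize the mapping to a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(id_map, handle)


def load_id_map(path):
    """Read a mapping back from a pickle file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Loading a saved mapping lets later code translate a MovieLens id (for example a movieId from movies.csv) into the identifier stored on the corresponding node.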

NB: you need a Neo4j database running on your machine (the script connects to localhost). The script asks whether you want to delete the data in the current database, because running it twice without deleting would duplicate all the data.
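The confirm-then-delete step could be sketched as follows. This is an assumption about the general pattern, not the script's actual code: run_query is a hypothetical hook standing in for whatever executes Cypher (for instance a driver session's run method), and the prompt text is invented.

```python
# Cypher statement that removes every node together with its relationships.
DELETE_ALL_QUERY = "MATCH (n) DETACH DELETE n"


def maybe_clear_database(run_query, ask=input):
    """Ask for confirmation, then delete all nodes and relationships.

    run_query: callable that executes a Cypher string (hypothetical hook).
    ask:       callable that reads the user's answer, injectable for testing.
    Returns True if the delete statement was issued.
    """
    answer = ask("Delete all existing data from the current database? [y/N] ")
    if answer.strip().lower() in ("y", "yes"):
        run_query(DELETE_ALL_QUERY)
        return True
    return False
```

Defaulting to "no" on any other answer keeps an accidental key press from wiping the database.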

Recommendation

The file datasetanalysis.ipynb contains some statistics on the dataset that help to understand the performance results.

The file queries.ipynb runs the queries for the following workflow and reports their performance measurements.


  1. Given a User, find their top k Genres
  2. Given a User, find their top k Categories
  3. Given a Genre, find its top k Users
  4. Given a Category, find its top k Users
  5. Given a User, find similar Users
  6. Given a User, recommend Movies based on similar Users
  7. Given a Movie, find similar Movies
  8. Given a User, recommend Movies based on similarity with the ones they have rated
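To make the first step of the workflow concrete, here is a minimal in-memory sketch in plain Python. It is not the repository's Cypher query: the ranking criterion (mean rating per genre), the function name and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def top_k_genres(ratings, movie_genres, user_id, k=3):
    """Rank genres for one user by mean rating (ties broken by genre name)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rated_user, movie_id, rating in ratings:
        if rated_user != user_id:
            continue
        # A movie can belong to several genres; credit each of them.
        for genre in movie_genres.get(movie_id, []):
            totals[genre] += rating
            counts[genre] += 1
    means = {genre: totals[genre] / counts[genre] for genre in totals}
    return sorted(means.items(), key=lambda item: (-item[1], item[0]))[:k]


# Toy data, invented for illustration only: (user, movie, rating) triples.
ratings = [(1, 10, 5.0), (1, 11, 3.0), (1, 12, 4.0), (2, 10, 1.0)]
movie_genres = {10: ["Comedy"], 11: ["Drama", "Comedy"], 12: ["Drama"]}
print(top_k_genres(ratings, movie_genres, user_id=1, k=2))
# prints [('Comedy', 4.0), ('Drama', 3.5)]
```

In the graph database the same idea would be expressed as a traversal from a User node through rated Movie nodes to Genre nodes, with the aggregation done in the query.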

The file gds_recommendation.py contains helper functions used for the recommendation; most of them are thin wrappers around functions of the GDS (Graph Data Science) library.
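A wrapper of this kind might look like the sketch below, which only builds the Cypher text and parameters for the GDS Node Similarity streaming procedure. The function name, its arguments and the choice of Node Similarity are assumptions; the actual wrappers in gds_recommendation.py may use other algorithms and signatures. It also assumes a graph with the given name has already been projected into the GDS catalog.

```python
def node_similarity_call(graph_name, top_k=10):
    """Return (query, parameters) for a GDS Node Similarity stream call.

    graph_name: name of an already projected in-memory graph (assumed to exist).
    top_k:      maximum number of similar nodes kept per node.
    """
    query = (
        "CALL gds.nodeSimilarity.stream($graph_name, {topK: $top_k}) "
        "YIELD node1, node2, similarity "
        "RETURN gds.util.asNode(node1) AS first, "
        "gds.util.asNode(node2) AS second, similarity "
        "ORDER BY similarity DESC"
    )
    return query, {"graph_name": graph_name, "top_k": top_k}
```

Returning the query and its parameters separately keeps the helper easy to test and leaves execution to whichever driver session the caller already holds.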

Relazione.pdf contains a more detailed discussion of the project (in Italian), and Neo4j Recommender System.pdf is a summary presentation of it (in English).

Running

To run all the code in the repository, create a virtual environment and install the dependencies with the following commands.

virtualenv venv 
source ./venv/bin/activate
pip install -r requirements.txt

Non-Enterprise editions of Neo4j do not allow more than one active database at a time: if you don't want to use the default database neo4j, you can create a new one and activate it by following this procedure.

NB: it is advisable to run populate_db.py on a machine with at least 8 GB of RAM.
