davpinto/ml-eng-project

Movie Similarity

Author: David Pinto
2020-10-21

This project implements a recommender system for similar movies based on content and collaborative filtering embedding features.

Website ml-eng-proj

Documentation

Setup

Create a conda environment and install all required packages listed in the env_requirements.txt file.

# Create environment
conda create -n movie-similarity -y python=3.7

# Activate environment
conda activate movie-similarity

# Append conda-forge to the list of channels
conda config --append channels conda-forge

# Install dependencies
conda install -y --file env_requirements.txt

# Add environment to Jupyter
python -m ipykernel install --user --name=movie-similarity
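
After installation, a quick check that the environment is complete can save time before opening the notebooks. The following is a minimal sketch, not part of the repository: the `missing_modules` helper is hypothetical, and the module list assumes the usual import names for the packages this README lists (for example, scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Assumed import names for the packages listed in this README.
# Note that scikit-learn is imported as "sklearn".
REQUIRED_MODULES = [
    "numpy", "pandas", "scipy", "unidecode", "nltk", "sklearn",
    "vaex", "matplotlib", "plotnine", "lightfm", "faiss",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Run it with the movie-similarity environment active. Any name it reports is probably a package that conda did not install.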

Required Packages

  • numpy and pandas for data cleaning, manipulation and transformation.
  • scipy for sparse matrices and correlation measures.
  • unidecode and nltk for text manipulation.
  • scikit-learn for data normalization and text vectorization.
  • vaex for manipulation of large DataFrames.
  • matplotlib and plotnine for data visualization.
  • lightfm for collaborative filtering with matrix factorization.
  • faiss for fast Approximate Nearest Neighbors algorithms.
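
To illustrate how embedding features turn into "similar movie" results, here is a minimal sketch of an exact nearest-neighbor lookup by cosine similarity. It uses only numpy as a stand-in for the approximate search that faiss provides. The `top_k_similar` function and the toy data are hypothetical and are not taken from the project's notebooks.

```python
import numpy as np


def top_k_similar(embeddings, query_index, k=5):
    """Return indices of the k rows most similar to the query row (cosine similarity)."""
    # L2-normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    scores = unit @ unit[query_index]
    # Exclude the query movie itself from the results.
    scores[query_index] = -np.inf
    # Sort by descending score and keep the first k indices.
    return np.argsort(-scores, kind="stable")[:k]


# Hypothetical toy data: 4 "movies" with 3-dimensional embeddings.
toy = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
print(top_k_similar(toy, query_index=0, k=2))
```

This brute-force version compares the query against every row, which is fine for small catalogs. An approximate index trades a little accuracy for much faster lookups on large ones.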

Dataset

Take a look at the data/raw folder to get instructions on how to download the dataset.

Notebooks

The project is organized as a set of Jupyter notebooks. Each notebook is self-contained and documented.

Embedding Visualization

You can explore the movie embedding features in the Embedding Projector. It can take a few seconds to load, but it is worth the wait!

Take a look at the projector folder to see some results.
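
The TensorFlow Embedding Projector loads tab-separated files: one file of vectors and an optional file of labels. The sketch below shows one way such files could be written, assuming the embeddings are held in a numpy array. The `export_for_projector` function and the file names are hypothetical, and the project may produce its projector files differently.

```python
import numpy as np


def export_for_projector(embeddings, titles, vectors_path, metadata_path):
    """Write embeddings and labels as TSV files for the Embedding Projector."""
    if len(embeddings) != len(titles):
        raise ValueError("need exactly one title per embedding row")
    # One vector per line, tab-separated, no header row.
    np.savetxt(vectors_path, embeddings, delimiter="\t", fmt="%.6f")
    # Single-column metadata: one label per line, no header row.
    with open(metadata_path, "w", encoding="utf-8", newline="") as handle:
        for title in titles:
            # Tabs and newlines inside a title would break the TSV layout.
            handle.write(title.replace("\t", " ").replace("\n", " ") + "\n")


if __name__ == "__main__":
    # Hypothetical example data and output file names.
    demo_vectors = np.random.default_rng(0).normal(size=(3, 4))
    demo_titles = ["Movie A", "Movie B", "Movie C"]
    export_for_projector(demo_vectors, demo_titles, "vectors.tsv", "metadata.tsv")
```

The row order must match between the two files, since the projector pairs labels with vectors by line position.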


Deploy Web Application

The project provides a Streamlit application to play with the movie recommender.

To run it locally:

make docker-build
make docker-run

Congratulations! The application is now running at 127.0.0.1:8501.


Choose a recommendation algorithm and a movie title to get recommendations of similar movies. I hope you enjoy it!