Movie Recommendation System | Machine Learning | Deep Learning

avyaktawrat/Evaluat-inator

Evaluat-inator

Movie Recommendation System | Machine Learning

Types of recommendation system

There are two types of recommendation system:

- Content based recommendation system.

First, the system executes a model-building stage by finding the similarity between all pairs of items. This similarity function can take many forms, such as correlation between ratings or cosine of those rating vectors.
Second, the system executes a recommendation stage. It uses the most similar items to a user's already-rated items to generate a list of recommendations.

- Collaborative recommendation system

A user expresses his or her preferences by rating items (e.g. books, movies or CDs) of the system. These ratings can be viewed as an approximate representation of the user's interest in the corresponding domain.
The system matches this user's ratings against other users' and finds the people with most "similar" tastes.
Using these similar users, the system recommends items that they have rated highly but that this user has not yet rated.
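Both descriptions above can be sketched on a toy rating matrix. This is a minimal illustration, not the repository's code: the matrix `R`, the function names `cosine_similarity_matrix`, `recommend_item_based` and `recommend_user_based`, and the convention that 0 means "not rated" are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy data: rows are users, columns are items, 0 means "not rated".
R = np.array([
    [5, 4, 0, 0, 1],
    [4, 5, 5, 1, 1],
    [1, 1, 0, 5, 4],
    [0, 1, 1, 4, 5],
], dtype=float)


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = vectors / norms
    return unit @ unit.T


def recommend_item_based(ratings, user, top_n=1):
    """Score unrated items by their similarity to the items the user has rated."""
    item_sim = cosine_similarity_matrix(ratings.T)  # model-building stage
    scores = item_sim @ ratings[user]               # recommendation stage
    scores[ratings[user] > 0] = -np.inf             # skip already-rated items
    return np.argsort(scores)[::-1][:top_n].tolist()


def recommend_user_based(ratings, user, top_n=1):
    """Recommend items the most similar user rated highly but `user` has not rated."""
    user_sim = cosine_similarity_matrix(ratings)
    user_sim[user, user] = -np.inf                  # exclude the user themself
    neighbour = int(np.argmax(user_sim[user]))
    scores = ratings[neighbour].copy()
    scores[ratings[user] > 0] = -np.inf             # skip already-rated items
    return np.argsort(scores)[::-1][:top_n].tolist()


item_based = recommend_item_based(R, user=0)
user_based = recommend_user_based(R, user=0)
```

On this toy matrix both functions suggest item 2 for user 0, because it is rated highly by the user whose ratings most resemble user 0's. A real system would typically average over several neighbours rather than rely on a single one.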

Data Set

The MovieLens 1M data set is used.
It contains around 1M ratings given by around 6k users on around 4k movies.

Libraries used

Exploratory Data Analysis (EDA)

The data set contains:

  1. Movies dataframe - contains 3883 movies with movieID, title and genres.
  2. Users dataframe - contains 6040 users with userID, gender, zipcode and age.
  3. Ratings dataframe - contains 1000209 ratings with their userID, movieID and rating.
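As a rough sketch of how such dataframes might be loaded: the MovieLens 1M files separate fields with `::` and have no header row, so pandas needs the Python parsing engine and explicit column names. The helper name `read_movielens_table` and the in-memory sample below are assumptions for illustration. In practice `source` would be the path to a file such as `ratings.dat`, and `movies.dat` may need `encoding="latin-1"`.

```python
import io

import pandas as pd


def read_movielens_table(source, columns, encoding=None):
    """Read a '::'-separated, header-less MovieLens file (hypothetical helper)."""
    return pd.read_csv(
        source,
        sep="::",
        engine="python",   # multi-character separators need the Python engine
        header=None,
        names=columns,
        encoding=encoding,
    )


# Two sample lines in the ratings.dat layout: UserID::MovieID::Rating::Timestamp
sample = io.StringIO("1::1193::5::978300760\n1::661::3::978302109\n")
ratings = read_movielens_table(sample, ["userID", "movieID", "rating", "timestamp"])
```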

These plots suggest that only a few movies are rated frequently. To improve model performance we can therefore remove the outliers: movies with fewer than 50 ratings (together with their ratings) and inactive users, i.e. those who have rated fewer than 50 movies.
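The filtering step could look roughly like the sketch below. The function name `filter_ratings` is hypothetical, the column names follow the ratings dataframe described above, and both counts are computed on the unfiltered data, which is an assumption since the order of the two filters is not specified.

```python
import pandas as pd


def filter_ratings(ratings, min_movie_ratings=50, min_user_ratings=50):
    """Keep only ratings of popular movies made by active users."""
    # Number of ratings each row's movie and user have in the unfiltered data.
    movie_counts = ratings["movieID"].map(ratings["movieID"].value_counts())
    user_counts = ratings["userID"].map(ratings["userID"].value_counts())
    keep = (movie_counts >= min_movie_ratings) & (user_counts >= min_user_ratings)
    return ratings[keep].reset_index(drop=True)


# Toy data with thresholds of 2 instead of 50 so the effect is visible.
toy = pd.DataFrame({
    "userID":  [1, 1, 1, 2, 2, 3],
    "movieID": [10, 20, 30, 10, 20, 10],
    "rating":  [5, 4, 3, 4, 5, 2],
})
filtered = filter_ratings(toy, min_movie_ratings=2, min_user_ratings=2)
```

Here movie 30 (one rating) and user 3 (one rating) are dropped, leaving four rows.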

KNN based Approach

  • The data is read into dataframes as ratings, users and movies. These dataframes are processed as described in the EDA section.
  • The processed data is used to create a matrix (namely movie_user_mat) with movieID as rows and userID as columns. The value of a cell, movie_user_mat[i, j], is the rating given by the jth user to the ith movie. This matrix is converted into a scipy sparse matrix for efficient computation.
  • A mapper (namely movie_to_idx) is a dictionary that maps each movie to its index according to the movies dataframe.
  • The matrix is fed into the NearestNeighbors model of sklearn. The 'cosine' metric is used with the brute-force algorithm.
    from sklearn.neighbors import NearestNeighbors
    # define model
    model_knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=20, n_jobs=-1)
    # fit
    model_knn.fit(movie_user_mat_sparse)
  • The fuzzy_matching function takes the favorite movie as input and returns the index of the most similar title listed in the mapper. The similarity is calculated via the fuzz ratio.
  • In the function make_recommendation, the data (i.e. movie_user_mat) is fit to the KNN model, which then finds the n nearest neighbours of data[idx], where idx is the index returned by fuzzy_matching for the favorite movie.
    The neighbours are sorted by cosine distance, the query movie itself (distance 0) is dropped, and the remaining top n neighbours, those with the minimum distance (i.e. minimum angle, since cosine distance is used), are printed.
    # fit
    model_knn.fit(data)
    idx = fuzzy_matching(mapper, fav_movie, verbose=True)
    
    distances, indices = model_knn.kneighbors(data[idx], n_neighbors=n_recommendations+1)
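    # get list of raw idx of recommendations:
    # sort (index, distance) pairs by ascending cosine distance, then reverse and drop
    # the first pair (the query movie itself), leaving the farthest neighbour first
    raw_recommends = \
        sorted(list(zip(indices.squeeze().tolist(), distances.squeeze().tolist())), key=lambda x: x[1])[:0:-1]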
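The steps above can be put together in a small end-to-end sketch. It makes several assumptions: the toy `movies` and `ratings` frames are invented, `idx_to_movie` is a hypothetical reverse mapper, and `fuzzy_matching` uses the standard library's `difflib.SequenceMatcher` as a stand-in for the fuzz ratio so the example needs no extra dependency. Unlike the slice above, this version returns the closest neighbour first.

```python
from difflib import SequenceMatcher

import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy data standing in for the filtered MovieLens frames.
movies = pd.DataFrame({
    "movieID": [1, 2, 3, 4],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)", "Toy Story 2 (1999)"],
})
ratings = pd.DataFrame({
    "userID":  [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
    "movieID": [1, 4, 3, 1, 4, 2, 2, 3, 1, 3, 2],
    "rating":  [5, 5, 1, 4, 5, 2, 5, 4, 1, 5, 4],
})

# Rows are movies, columns are users; unrated cells become 0.
movie_user_mat = ratings.pivot(index="movieID", columns="userID", values="rating").fillna(0)
movie_user_mat_sparse = csr_matrix(movie_user_mat.values)

# Map each title to its row position in movie_user_mat, and back again.
titles = movies.set_index("movieID").loc[movie_user_mat.index, "title"]
movie_to_idx = {title: i for i, title in enumerate(titles)}
idx_to_movie = {i: title for title, i in movie_to_idx.items()}


def fuzzy_matching(mapper, fav_movie):
    """Return the index of the title closest to `fav_movie` (difflib stand-in)."""
    def ratio(title):
        return SequenceMatcher(None, title.lower(), fav_movie.lower()).ratio()
    return mapper[max(mapper, key=ratio)]


def make_recommendation(model, data, mapper, fav_movie, n_recommendations):
    """Return (title, cosine distance) pairs for the nearest movies, closest first."""
    model.fit(data)
    idx = fuzzy_matching(mapper, fav_movie)
    distances, indices = model.kneighbors(data[idx], n_neighbors=n_recommendations + 1)
    pairs = sorted(
        zip(indices.squeeze().tolist(), distances.squeeze().tolist()),
        key=lambda x: x[1],
    )
    # pairs[0] is the query movie itself (distance ~0), so skip it.
    return [(idx_to_movie[i], dist) for i, dist in pairs[1:]]


model_knn = NearestNeighbors(metric="cosine", algorithm="brute")
recommendations = make_recommendation(
    model_knn, movie_user_mat_sparse, movie_to_idx, "toy story", n_recommendations=2
)
```

With this toy data, the query "toy story" resolves to "Toy Story (1995)", and the closest neighbour is "Toy Story 2 (1999)", which the same users rated highly.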
