Skip to content

Ensemble recommendation (recommender) system for finding similar games on Steam

License

Notifications You must be signed in to change notification settings

AdrianKlessa/SteamScout

Repository files navigation

A similarity search and recommendation system for finding games on Steam

Test workflow Docker compose

Find similar games to the ones you like - even if the genres you like are not very popular

image

The system is a weighted ensemble recommender composed of:

  • A recommender rating the games based on the similarity of their vectorized descriptions ("About this game" section)
  • A recommender rating the similarity of games based on their tags
  • A simple check for the ratio of positive user reviews to negative

Since this ensemble does not take into account the number of reviews (only the fraction of positive ones), it tends to catch games with very few reviews if they appear to be a good match to the one the user searched for.

Both the tags and descriptions are compared based on the cosine similarity of their vectors.

The tags are binary vectors (consisting of 1s at the indices matching a particular tag the game has and 0s for tags it lacks). The description vector is generated by using doc2vec after training on the entire dataset.

The vectors are stored using vectordb for easy Approximate Nearest Neighbor search.

Build & installation (Docker)

  1. Clone this repository
  2. Download the Steam Games dataset from Kaggle and put games.json inside data/raw
  3. Run docker compose up from the repo's directory.

Building the app will take a while (possibly 15+ minutes) since it's preprocessing the dataset and training doc2vec on it.

(I did not want to include a pretrained model or the data as part of this repository)

Running the app after it has been built uses the generated model and DBs so it should be much faster.

Due to heavy reliance on a local database, the app greatly benefits from using an SSD. Full Text Search and indexing on queried columns is used to reduce search delays.

By default, the frontend is available on the host machine at http://localhost:5173

Unit tests

Unit tests can be executed from the src directory by running python -m unittest discover ../tests


Utilizes the Steam Games dataset published on Kaggle by Martin Bustos Roman, originally generated with Steam Games Scraper by Martin Bustos.