This is the official code and preprocessed datasets for the WSDM 2021 paper: Local Collaborative Autoencoders.

The slides can be found here.
Dataset | # Users | # Items | # Ratings | Sparsity | Concentration |
---|---|---|---|---|---|
ML10M | 69,878 | 10,677 | 10,000,054 | 98.66% | 48.04% |
ML20M | 138,493 | 26,744 | 20,000,263 | 99.46% | 66.43% |
AMusic | 4,964 | 11,797 | 97,439 | 99.83% | 14.93% |
AGames | 13,063 | 17,408 | 236,415 | 99.90% | 16.40% |
Yelp | 25,677 | 25,815 | 731,671 | 99.89% | 22.78% |
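For reference, sparsity in the table is computed as 1 − #Ratings / (#Users × #Items); e.g., for ML10M, 1 − 10,000,054 / (69,878 × 10,677) ≈ 98.66%.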
We use five public benchmark datasets: MovieLens 10M (ML10M), MovieLens 20M (ML20M), Amazon Digital Music (AMusic), Amazon Video Games (AGames), and Yelp 2015 (Yelp). We convert all explicit ratings to binary values indicating whether a rating is observed or missing. For the MovieLens datasets, we did not modify the original data except for binarization. For the Amazon datasets, we removed users with fewer than 10 ratings, resulting in 97,439 (AMusic) and 236,415 (AGames) ratings. For the Yelp dataset, we pre-processed the Yelp 2015 challenge dataset as in "Fast Matrix Factorization for Online Recommendation with Implicit Feedback", where users and items with fewer than 10 interactions are removed.
You can get the original datasets from the following links:
MovieLens: https://grouplens.org/datasets/movielens/
Amazon Review Data: https://nijianmo.github.io/amazon/
Yelp 2015: https://github.com/hexiangnan/sigir16-eals/tree/master/data
- Change the experimental settings in `main_config.cfg` and the model hyperparameters in `model_config`.
- Run `main.py` to train and test models.
- Command-line arguments are also accepted, using the same names as in the configuration files (both main and model configs); see the sketch after this list. For example: `python main.py --model_name MultVAE --lr 0.001`
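The config/CLI layering behaves like standard configparser defaults overridden by argparse. Below is a minimal sketch of the idea, not the repository's actual parsing code; the section name and default values are assumptions.

```python
import argparse
import configparser

# Read defaults from the config file (the section name is an assumption).
config = configparser.ConfigParser()
config.read("main_config.cfg")
defaults = dict(config["Experiment"]) if config.has_section("Experiment") else {}

# Command-line flags use the same names as the config keys and take precedence.
parser = argparse.ArgumentParser()
parser.add_argument("--model_name", default=defaults.get("model_name", "MultVAE"))
parser.add_argument("--lr", type=float, default=float(defaults.get("lr", 0.001)))
args = parser.parse_args()

print(args.model_name, args.lr)  # e.g., overridden by --model_name MultVAE --lr 0.001
```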
Before running LOCA, you need (1) user embeddings to find local communities and (2) a global model to cover users who are not handled by any local model (a sketch of the resulting aggregation follows the steps below).
- Run single MultVAE and EASE to get the user embedding vectors and the global model: `python main.py --model_name MultVAE` and `python main.py --model_name EASE`
- Train LOCA with a specific backbone model: `python main.py --model_name LOCA_VAE` or `python main.py --model_name LOCA_EASE`
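Conceptually, LOCA blends the local models' predictions using each user's community-membership weights and falls back to the global model for users no local model covers. The following numpy sketch illustrates that aggregation only; the function and array names are illustrative, not the repository's API.

```python
import numpy as np

def aggregate(local_scores, weights, global_scores):
    """Blend local predictions and fall back to the global model.

    local_scores:  (n_local, n_users, n_items) predictions of each local model.
    weights:       (n_local, n_users) membership weights; 0 where a user is
                   outside that local community.
    global_scores: (n_users, n_items) predictions of the global model (e.g., EASE).
    """
    blended = np.einsum("lu,lui->ui", weights, local_scores)  # weighted sum
    weight_sum = weights.sum(axis=0)
    covered = weight_sum > 0
    blended[covered] /= weight_sum[covered][:, None]          # normalize covered users
    blended[~covered] = global_scores[~covered]               # uncovered: global model
    return blended
```

In LOCA, these membership weights are derived from each user's embedding similarity to the community's anchor user; see the paper for the exact kernel.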
- Python 3
- PyTorch 1.5
Please cite our paper:
@inproceedings{DBLP:conf/wsdm/ChoiJLL21,
author = {Minjin Choi and
Yoonki Jeong and
Joonseok Lee and
Jongwuk Lee},
title = {Local Collaborative Autoencoders},
booktitle = {{WSDM} '21, The Fourteenth {ACM} International Conference on Web Search
and Data Mining, Virtual Event, Israel, March 8-12, 2021},
pages = {734--742},
publisher = {{ACM}},
year = {2021},
url = {https://doi.org/10.1145/3437963.3441808},
doi = {10.1145/3437963.3441808},
timestamp = {Wed, 07 Apr 2021 16:17:44 +0200},
biburl = {https://dblp.org/rec/conf/wsdm/ChoiJLL21.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}