LibRecommender is an easy-to-use recommender system focused on end-to-end recommendation. The main features are:
-
Implemented a number of popular recommendation algorithms such as SVD++, DeepFM, BPR etc, see full algorithm list.
-
A hybrid recommender system, which allows user to use either collaborative-filtering or content-based features or both.
-
Low memory usage, automatically convert categorical and multi-value categorical features to sparse representation.
-
Support training for both explicit and implicit datasets, and negative sampling can be used for implicit dataset.
-
Making use of Cython or Tensorflow for high-speed model training.
-
Provide end-to-end workflow, i.e. data handling / preprocessing -> model training -> evaluate -> serving.
-
Provide unified and friendly API for all algorithms.
import numpy as np
import pandas as pd
from libreco.data import random_split, DatasetPure
from libreco.algorithms import SVDpp # pure data, algorithm SVD++
data = pd.read_csv("examples/sample_data/sample_movielens_rating.dat", sep="::",
names=["user", "item", "label", "time"])
# split whole data into three folds for training, evaluating and testing
train_data, eval_data, test_data = random_split(data, multi_ratios=[0.8, 0.1, 0.1])
train_data, data_info = DatasetPure.build_trainset(train_data)
eval_data = DatasetPure.build_testset(eval_data)
test_data = DatasetPure.build_testset(test_data)
print(data_info) # n_users: 5894, n_items: 3253, data sparsity: 0.4172 %
svdpp = SVDpp(task="rating", data_info=data_info, embed_size=16, n_epochs=3, lr=0.001,
reg=None, batch_size=256)
# monitor metrics on eval_data during training
svdpp.fit(train_data, verbose=2, eval_data=eval_data, metrics=["rmse", "mae", "r2"])
# do final evaluation on test data
svdpp.evaluate(test_data, metrics=["rmse", "mae"])
# predict preference of user 1 to item 2333
print("prediction: ", svdpp.predict(user=1, item=2333))
# recommend 7 items for user 1
print("recommendation(id, probability): ", svdpp.recommend_user(user=1, n_rec=7))
import numpy as np
import pandas as pd
from libreco.data import split_by_ratio_chrono, DatasetFeat
from libreco.algorithms import YouTubeRanking # feat data, algorithm YouTubeRanking
data = pd.read_csv("examples/sample_data/sample_movielens_merged.csv", sep=",", header=0)
data["label"] = 1 # convert to implicit data and do negative sampling afterwards
# split into train and test data based on time
train_data, test_data = split_by_ratio_chrono(data, test_size=0.2)
# specify complete columns information
sparse_col = ["sex", "occupation", "genre1", "genre2", "genre3"]
dense_col = ["age"]
user_col = ["sex", "age", "occupation"]
item_col = ["genre1", "genre2", "genre3"]
train_data, data_info = DatasetFeat.build_trainset(
train_data, user_col, item_col, sparse_col, dense_col
)
test_data = DatasetFeat.build_testset(test_data)
train_data.build_negative_samples(data_info) # sample negative items for each record
test_data.build_negative_samples(data_info)
print(data_info) # n_users: 5962, n_items: 3226, data sparsity: 0.4185 %
ytb_ranking = YouTubeRanking(task="ranking", data_info=data_info, embed_size=16,
n_epochs=3, lr=1e-4, batch_size=512, use_bn=True,
hidden_units="128,64,32")
ytb_ranking.fit(train_data, verbose=2, shuffle=True, eval_data=test_data,
metrics=["loss", "roc_auc", "precision", "recall", "map", "ndcg"])
# predict preference of user 1 to item 2333
print("prediction: ", ytb_ranking.predict(user=1, item=2333))
# recommend 7 items for user 1
print("recommendation(id, probability): ", ytb_ranking.recommend_user(user=1, n_rec=7))
For more examples and usages, see User Guide
JUST normal data format, each line represents a sample. One thing is important, the model assumes that user
, item
, and label
column index are 0, 1, and 2, respectively. You may wish to change the column order if that's not the case. Take for Example, the movielens-1m
dataset:
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
Besides, if you want to use some other meta features (e.g., age, sex, category etc.), you need to tell the model which columns are [sparse_col, dense_col, user_col, item_col
], which means all features must be in a same table. See above YouTubeRanking
for example.
For how to serve a trained model in LibRecommender, see Serving Guide .
From pypi :
$ pip install LibRecommender==0.2.2
To build from source, you 'll first need Cython and Numpy:
$ # pip install numpy cython
$ git clone https://github.com/massquantity/LibRecommender.git
$ cd LibRecommender
$ python setup.py install
- Python >= 3.6
- tensorflow >= 1.14
- numpy >= 1.15.4
- pandas >= 0.23.4
- scipy >= 1.2.1
- scikit-learn >= 0.20.0
- gensim>=3.6.0
- tqdm >= 4.46.0
- hnswlib
LibRecommender
is tested under tensorflow 1.14 and 2.3. If you encounter any problem during running, feel free to open an issue.
- flask >= 1.0.0
- requests >= 2.22.0
- redis == 3.0.6
- redis-py >= 3.3.5
- faiss == 1.5.2
- Tensorflow Serving
pure
means collaborative-filtering algorithms which only use behavior data,feat
means other features can be included,seq
means sequence or graph algorithms.