improve README
jacopotagliabue committed Jul 24, 2023
1 parent 19fdab5 commit e087a32
Showing 4 changed files with 95 additions and 64 deletions.
26 changes: 13 additions & 13 deletions README.rst
@@ -49,11 +49,10 @@ requiring unnecessary custom code and ad hoc procedures.

If you are not familiar with the library, we suggest first taking our small tour to get acquainted with the main abstractions through ready-made models and tests.

Starting
Colab Tutorials
~~~~~~~~



.. |colab1_tutorial| image:: https://colab.research.google.com/assets/colab-badge.svg
:target: https://colab.research.google.com/drive/1GVsVB1a3H9qbRQvwtb0TBDxq8A5nXc5w?usp=sharing
:alt: Open In Colab
@@ -105,7 +104,8 @@ This doc is structured as follows:
Quick Start
-----------

If you want to see *RecList* in action, clone the repository, create and activate a virtual env, and install the required packages from pip (you can install from root of course). If you prefer to experiment in an interactive, no-installation-required fashion, try out our `colab notebook <https://colab.research.google.com/drive/1GVsVB1a3H9qbRQvwtb0TBDxq8A5nXc5w>`__.
You can take a quick tour online using our `colab notebook <https://colab.research.google.com/drive/1GVsVB1a3H9qbRQvwtb0TBDxq8A5nXc5w>`__.
If you want to use *RecList* locally, clone the repository, create and activate a virtual env, and install the required packages from pip (you can also install from root of course).

.. code-block:: bash
@@ -114,20 +114,24 @@ If you want to see *RecList* in action, clone the repository, create and activat
python3 -m venv venv
source venv/bin/activate
pip install reclist
python examples/evalrs_2023.py
python examples/dummy.py
The sample script will run a suite of tests on a dummy dataset and model, showcasing a typical workflow with the library. Note the commented arguments in the script, which you can use to customize the behavior of the library
once you are familiar with the basic patterns (e.g. using S3 to store the plots and data, or leveraging a third-party tool like Comet to track experiments).
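
For instance, here is a minimal sketch of how you might set up those optional integrations before running the script — the environment variable names are the ones used in the example scripts, while the values are placeholders:

.. code-block:: bash

    # only needed if you switch the script to LOGGER.COMET and / or an S3 metadata store
    export COMET_KEY="your-comet-api-key"
    export COMET_PROJECT_NAME="your-comet-project"
    export COMET_WORKSPACE="your-comet-workspace"
    export S3_BUCKET="your-s3-bucket"
    python examples/dummy.py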

Once you've run successfully the sample script, take the guided tour below to learn more about the abstractions and the out-of-the-box capabilities of *RecList*.
Once your development setup is working as expected, you can run `python examples/evalrs_2023.py` to explore more realistic tests on the `EvalRS 2023 Dataset <https://github.com/RecList/evalRS-KDD-2023>`__: make sure the `dataset <https://github.com/RecList/evalRS-KDD-2023/blob/c1b42ec8cb81562417bbb3c2713d301dc652141d/evaluation/utils.py#L18C11-L18C11>`__ is available in the `examples` folder before you run the script.
Finally, once you've successfully run the sample scripts, take the guided tour below to learn more about the abstractions and the full capabilities of *RecList*.
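
As a reference for the dataset step above, this is a sketch of where the EvalRS files are expected to live — the file names are taken from `examples/evalrs_2023.py`, so treat it as an illustration rather than an official directory spec:

.. code-block:: text

    examples/
        evalrs_dataset_KDD23/
            evalrs_events.csv
            evalrs_tracks.csv
            evalrs_users.csv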

A Guided Tour
-------------

An instance of `RecList <https://github.com/jacopotagliabue/reclist/blob/main/reclist/reclist.py>`__ represents a suite of tests for recommender systems.

As the sample `examples/evalrs_2023.py` script shows, we leave users quite a large range of options: we provide out of the box standard metrics
As the sample `examples/evalrs_2023.py` script shows, we leave users quite a wide range of options: we provide out of the box standard metrics
in case your dataset is DataFrame-shaped (or you can / wish to turn it into such a shape), but we don't force any pattern on you if you just want to use *RecList*
for the scaffolding it provides.

For example, the following code only assumes you have a dataset with golden labels, predictions, and metadata (e.g. item features) in the form of a DataFrame:
For example, the following code only assumes you have a dataset with golden labels, predictions, and metadata (e.g. item features) in the shape of a DataFrame:

.. code-block:: python
@@ -139,12 +143,8 @@ For example, the following code only assumes you have a dataset with golden labe
logger=LOGGER.NEPTUNE,
metadata_store= METADATA_STORE.LOCAL,
similarity_model=my_sim_model,
bucket=os.environ["S3_BUCKET"],
NEPTUNE_KEY=os.environ["NEPTUNE_KEY"],
NEPTUNE_PROJECT_NAME=os.environ["NEPTUNE_PROJECT_NAME"],
)
# run reclist
cdf(verbose=True)
Our library pre-packages standard recSys KPIs and important behavioral tests, but it is built with extensibility in mind: you can re-use tests in new suites, or you can write new domain-specific suites and tests.
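
As an illustration of what a new suite could look like, here is a minimal sketch using the tests-as-functions interface — the `rec_test` decorator and `CHART_TYPE` enum are assumed from the library, so check the `reclist` source and the example scripts for the exact import paths and signatures:

.. code-block:: python

    from reclist.reclist import RecList, rec_test, CHART_TYPE  # import paths may differ

    class MyCustomRecList(RecList):

        @rec_test(test_type="basic_stats", display_type=CHART_TYPE.SCALAR)
        def basic_stats(self):
            # a trivial scalar test: how many predictions are we evaluating?
            return len(self.predictions)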
@@ -183,7 +183,7 @@ Inheritance is powerful, as we can build new suites by re-using existing ones. H
Any model can be tested, as no assumption is made on the model's structure: all that is required is the availability of *predictions*
and *ground truth*. Once again, while our example leverages a DataFrame-shaped dataset for these entities, you are free to build your own
RecList instance with any shape you prefer, provided you implement the metrics accordingly.
RecList instance with any shape you prefer, provided you implement the metrics accordingly (see the `examples/dummy.py` script for an example with different input types).
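
For example, a minimal sketch of a hit-rate-style metric you could implement yourself when predictions and ground truth are plain Python lists rather than DataFrames (all names here are purely illustrative):

.. code-block:: python

    def hit_rate_at_k(predictions, ground_truth, k=10):
        # fraction of cases where the true item appears in the top-k predictions
        hits = sum(
            truth in preds[:k]
            for preds, truth in zip(predictions, ground_truth)
        )
        return hits / len(ground_truth)

    # e.g. two users, top-3 predictions each: only the first one is a hit
    preds = [["song_a", "song_b", "song_c"], ["song_d", "song_e", "song_f"]]
    truth = ["song_b", "song_z"]
    print(hit_rate_at_k(preds, truth, k=3))  # 0.5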

Once you run a suite of tests, results are dumped automatically and versioned in a folder (local or on S3), structured as follows
(name of the suite, name of the model, run timestamp):
@@ -211,7 +211,7 @@ based on DataFrames to make existing tests and metrics fully re-usable, but we d

* flexible, Python interface to declare tests-as-functions, and annotate them with *display_type* for automated charts;

* pre-built connectors with popular experiment trackers (e.g. Neptune, Comet), and an extensible interface to add your own;
* pre-built connectors with popular experiment trackers (e.g. Neptune, Comet), and an extensible interface to add your own (see the scripts in the `examples` folder for practical examples of using third-party trackers);

* reference implementations based on popular data challenges that used RecList.

18 changes: 14 additions & 4 deletions examples/dummy.py
@@ -1,3 +1,13 @@
"""
Example script to run a RecList over a dummy, free-form dataset.
Note that if you use LOGGER.COMET, you should uncomment the COMET variables so that the script
logs the relevant metrics to your Comet account as they are computed by the rec tests. Of course,
make sure the Comet Python library is installed in your environment.
"""

import numpy as np
from reclist.logs import LOGGER
from reclist.similarity_models import FakeSimilarityModel
@@ -111,13 +121,13 @@ def predict(self):
metadata=metadata,
predictions=predictions,
model_name="myRandomModel",
logger=LOGGER.COMET,
logger=LOGGER.LOCAL,
metadata_store= METADATA_STORE.LOCAL,
similarity_model=my_sim_model,
# bucket=os.environ["S3_BUCKET"], # if METADATA_STORE.LOCAL you don't need this!
COMET_KEY=os.environ["COMET_KEY"],
COMET_PROJECT_NAME=os.environ["COMET_PROJECT_NAME"],
COMET_WORKSPACE=os.environ["COMET_WORKSPACE"],
# COMET_KEY=os.environ["COMET_KEY"], # if LOGGER.COMET, make sure this env variable is set
# COMET_PROJECT_NAME=os.environ["COMET_PROJECT_NAME"], # if LOGGER.COMET, make sure this env variable is set
# COMET_WORKSPACE=os.environ["COMET_WORKSPACE"], # if LOGGER.COMET, make sure this env variable is set
)

# run reclist
114 changes: 68 additions & 46 deletions examples/evalrs_2023.py
@@ -1,3 +1,20 @@
"""
Example script to run a RecList over the EvalRS2023 dataset.
The dataset files should be placed in the folder evalrs_dataset_KDD23.
Download the dataset before running the script:
https://github.com/RecList/evalRS-KDD-2023/blob/main/evaluation/utils.py
Note that if you use LOGGER.NEPTUNE, you should uncomment the NEPTUNE variables so that the script
logs the relevant metrics to your Neptune account as they are computed by the rec tests. Of course,
make sure the Neptune Python library is installed in your environment.
"""

from reclist.logs import LOGGER
from reclist.metadata import METADATA_STORE
import pandas as pd
@@ -87,54 +104,59 @@ def predict(self, user_ids: pd.DataFrame) -> pd.DataFrame:
We now load the dataset
"""

print("\n\n ======> Loading dataset. \n\n")
df_events = pd.read_csv('evalrs_dataset_KDD23/evalrs_events.csv', index_col=0, dtype='int32')
df_tracks = pd.read_csv('evalrs_dataset_KDD23/evalrs_tracks.csv',
if __name__ == '__main__':

print("\n\n ======> Loading the EvalRS2023 dataset from the local folder\n\n")
# make sure files are there
assert os.path.isfile('evalrs_dataset_KDD23/evalrs_events.csv'), "Please download the dataset first!"

df_events = pd.read_csv('evalrs_dataset_KDD23/evalrs_events.csv', index_col=0, dtype='int32')
df_tracks = pd.read_csv('evalrs_dataset_KDD23/evalrs_tracks.csv',
dtype={
'track_id': 'int32',
'artist_id': 'int32'
}).set_index('track_id')

df_users = pd.read_csv('evalrs_dataset_KDD23/evalrs_users.csv',
dtype={
'track_id': 'int32',
'artist_id': 'int32'
}).set_index('track_id')

df_users = pd.read_csv('evalrs_dataset_KDD23/evalrs_users.csv',
dtype={
'user_id': 'int32',
'playcount': 'int32',
'country_id': 'int32',
'timestamp': 'int32',
'age': 'int32',
})
'user_id': 'int32',
'playcount': 'int32',
'country_id': 'int32',
'timestamp': 'int32',
'age': 'int32',
})

"""
Here we would normally train a model, but we just return random predictions.
"""
my_df_model = EvalRSSimpleModel(df_tracks, top_k=10)
df_predictions = my_df_model.predict(df_users)
# build a mock dataset for the golden standard
all_tracks = df_tracks.index.values
"""
Here we would normally train a model, but we just return random predictions.
"""
my_df_model = EvalRSSimpleModel(df_tracks, top_k=10)
df_predictions = my_df_model.predict(df_users)
# build a mock dataset for the golden standard
all_tracks = df_tracks.index.values

df_dataset = pd.DataFrame(
{
'track_id': [choice(all_tracks) for _ in range(len(df_predictions))]
}
)
df_dataset = pd.DataFrame(
{
'track_id': [choice(all_tracks) for _ in range(len(df_predictions))]
}
)

"""
Here we use RecList to run the evaluation.
"""
"""
Here we use RecList to run the evaluation.
"""

# initialize with everything
cdf = DFSessionRecList(
dataset=df_events,
model_name="myDataFrameRandomModel",
predictions=df_predictions,
# I can specify the gold standard here, or doing it in the init of course
y_test=df_dataset,
logger=LOGGER.NEPTUNE,
metadata_store=METADATA_STORE.LOCAL,
bucket=os.environ["S3_BUCKET"],
NEPTUNE_KEY=os.environ["NEPTUNE_KEY"],
NEPTUNE_PROJECT_NAME=os.environ["NEPTUNE_PROJECT_NAME"],
)

# run reclist
cdf(verbose=True)
# initialize with everything
cdf = DFSessionRecList(
dataset=df_events,
model_name="myDataFrameRandomModel",
predictions=df_predictions,
# I can specify the gold standard here, or do it in the init of course
y_test=df_dataset,
logger=LOGGER.LOCAL,
metadata_store=METADATA_STORE.LOCAL,
# bucket=os.environ["S3_BUCKET"], # if METADATA_STORE.LOCAL you don't need this!
#NEPTUNE_KEY=os.environ["NEPTUNE_KEY"],
#NEPTUNE_PROJECT_NAME=os.environ["NEPTUNE_PROJECT_NAME"],
)

# run reclist
cdf(verbose=True)
1 change: 0 additions & 1 deletion requirements.txt
@@ -9,6 +9,5 @@ numpy>=1.19.5
pathos==0.2.8
networkx==2.6.3
python-Levenshtein==0.12.2
pandas
scikit-learn
rich
