Chaos is the accompanying proof of concept of my master's thesis "Engineering a Hybrid Reciprocal Recommender System Specialized in Human-to-Human Implicit Feedback" (FH Aachen, February 2021).
I've recently published my thesis, so anyone can now read it in full and reproduce the results to enter the world of human-to-human recommendations. You're invited to use the thesis as documentation, to understand what Chaos does, what Reciprocal Recommender Systems (RRSs) and their challenges are, or just for your own research.
As in the wider Recommender System (RS) domain, research on RRSs suffers from very limited reproducibility. I've engineered Chaos to tackle this problem (thesis section 5.2):
Chaos aims to build a solid bridge between the research and development departments of RRSs, with the ultimate goal that in the future, improvements are not developed in a decentralized fashion anymore.
I've contributed my work to the public to help accelerate research together. This project can only flourish when other contributors join!
Chaos is currently not meant to be ready for production or commercial applications. Please contact me to discuss potential use cases where Chaos could help you or your business. If applicable, you can also start a public discussion.
This section is a great demonstration of Chaos' capabilities and usage. Work your way through the steps and take your time to experiment. The second experiment is an exciting opportunity to get to know your personal social GitHub universe!
- Clone this repo with all submodules:
git clone --recurse-submodules https://github.com/kdevo/chaos-rrs.git
- Conda is needed as a cross-platform package and environment manager. Refer to the user guide for installation.
- Create the `chaos` environment via `conda env create -f environment.yml` (see also here)
- Wait until the installation is finished and then activate `chaos` by calling `conda activate chaos`
- Trust the notebooks: `jupyter trust *.ipynb`
This optional example functions as a basic introduction to the framework's core components (see thesis section 4.1 onwards).
- Follow Preparation
- Start JupyterLab as follows:
jupyter lab notebooks/learning-group.ipynb
In this advanced scenario (see thesis section 4.4), you learn how to create a tailored RRS based on your own personal GitHub user profile!
- Follow Preparation if not done yet
- Have your GitHub account and a stable internet connection ready
- Start JupyterLab as follows:
jupyter lab notebooks/chaos-github.ipynb
This section provides a brief, non-exhaustive overview of Chaos' features.
⚠️ Please read thesis chapter 4 for the complete description.
- `LFMPredictor`: Latent Factor Model User-to-User RS via LightFM (able to handle cold-start through metadata), wrapped with `ReciprocalWrapper` to fulfill the reciprocity criterion
- `RCFPredictor`: Reciprocal Collaborative Filtering, implemented as a baseline algorithm
- Anything that you provide by implementing the `Predictor` class
- Pandas' DataFrame for `DataModel` and more
- Grapresso with NetworkX backend for representing the interaction graph from the `DataModel`
- LightFM for an LFM based on Factorization Machines to mitigate the cold-start problem
- TensorBoard/Projector to visualize the learned embeddings
- altair-viz for visualizations, e.g. for the `LFMEvaluator` results
- spaCy for NLP feature extraction used in the `process` module
- Optuna for hyperparameter optimization
- JupyterLab for reproducible notebooks
The `DataModel` of Chaos is simple:
- Interactions between users are stored in a graph/network:
  - User `a` is interested in `b` if at least one interaction exists
  - If `a` is interested in `b`, an edge (`a`, `b`) will be created
  - If `b` is interested in `a`, an edge (`b`, `a`) will be created
  - Each interaction linearly increases the strength of an edge
- Metadata of users is stored in a long-format user data frame:
  - Each row represents the metadata of one user
  - Columns can contain variable data in the form of collections (e.g., tags)
  - Stored in a pandas data frame
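The rules above can be illustrated with a minimal, standard-library-only sketch. It is not Chaos' actual `DataModel`: Chaos uses Grapresso/NetworkX for the graph and pandas for the user data, while the class `InteractionGraph`, its methods and the sample users below are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class InteractionGraph:
    """Directed, weighted user-to-user graph (illustrative, not Chaos' API)."""

    def __init__(self):
        # weights[a][b] holds the strength of the edge (a, b)
        self.weights = defaultdict(lambda: defaultdict(int))

    def add_interaction(self, source, target, strength=1):
        # Each interaction linearly increases the strength of the edge
        self.weights[source][target] += strength

    def is_interested(self, source, target):
        # a is interested in b if at least one interaction a -> b exists
        return self.weights.get(source, {}).get(target, 0) > 0

    def is_reciprocal(self, a, b):
        # Both directed edges (a, b) and (b, a) must exist
        return self.is_interested(a, b) and self.is_interested(b, a)


# User metadata: one "row" per user, columns may hold collections such as tags
users = {
    "alice": {"name": "Alice", "tags": ["python", "ml"]},
    "bob": {"name": "Bob", "tags": ["python", "graphs"]},
    "carol": {"name": "Carol", "tags": ["rust"]},
}

graph = InteractionGraph()
graph.add_interaction("alice", "bob")
graph.add_interaction("alice", "bob")    # second interaction strengthens the edge
graph.add_interaction("bob", "alice")
graph.add_interaction("alice", "carol")  # one-sided interest only

print(graph.weights["alice"]["bob"])         # edge strength after two interactions
print(graph.is_reciprocal("alice", "bob"))
print(graph.is_reciprocal("alice", "carol"))
```

Keeping the two directions as separate weighted edges is what makes reciprocity checkable later: mutual interest is simply the presence of both edges.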
The following provides a broad overview of the components:
- `fetch` - Retrieve data from a source
  - Very simple `Source` interface (implement a function that returns Chaos' `DataModel`)
- `process` - Process features, i.e. common Feature Engineering tooling for (R)RS
  - Extract common tags
  - NLP to retrieve textual entities
  - Graph-based feature extraction
- `recommend` - (R)RS implementations and typical workflows
  - `Translator` is used to make Chaos' data model understandable to the `Predictor`
  - `CandidateGenerator`s can be chained to retrieve compatible candidates
  - `Predictor` is used for actual recommendations
  - `Evaluator` for performance evaluation/comparison/optimization (e.g. precision, recall and F1)
  - `ReciprocalWrapper` helper to transform an RS to an RRS and perform reciprocal recommendations
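To make the evaluation metrics mentioned for the `Evaluator` concrete, here is a small sketch of precision, recall and F1 for a top-k recommendation list. It shows the textbook definitions only; the function name `precision_recall_f1` and the sample data are hypothetical and do not reflect how Chaos' `Evaluator` is implemented.

```python
def precision_recall_f1(recommended, relevant, k):
    """Compute precision@k, recall@k and F1@k for one user.

    recommended: ranked list of candidate user ids (best first)
    relevant:    set of user ids the user actually interacted with
    """
    top_k = recommended[:k]
    hits = sum(1 for candidate in top_k if candidate in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical ranking for one user and the held-out ground truth
recommended = ["bob", "carol", "dave", "erin"]
relevant = {"bob", "dave", "frank"}

precision, recall, f1 = precision_recall_f1(recommended, relevant, k=2)
print(f"precision@2={precision:.2f} recall@2={recall:.2f} f1@2={f1:.2f}")
```

With `k=2`, only `bob` is a hit, so precision is 1/2 and recall is 1/3, which gives an F1 of 0.4.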
The `recommend` module is the core. You can skip `fetch` and `process` entirely if you provide a proper `DataModel` on your own. The above image shows the UML of the model.
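The idea of turning a one-directional RS into an RRS can be sketched as follows. This is an assumption-laden illustration, not Chaos' `ReciprocalWrapper`: the classes `OneWayScores` and `ReciprocalSketch`, the `predict`/`recommend` method names and the choice of the harmonic mean as aggregation are all hypothetical. The harmonic mean is one common way to combine both directions because it stays low unless both sides are interested.

```python
class OneWayScores:
    """Toy one-directional predictor backed by a fixed score table."""

    def __init__(self, scores):
        self.scores = scores  # maps (user, candidate) -> preference score

    def predict(self, user, candidate):
        return self.scores.get((user, candidate), 0.0)


def harmonic_mean(x, y):
    # Low if either direction is low; 0 if both scores are 0
    return 2 * x * y / (x + y) if x + y > 0 else 0.0


class ReciprocalSketch:
    """Wraps a one-directional predictor and scores both directions."""

    def __init__(self, predictor, aggregate=harmonic_mean):
        self.predictor = predictor
        self.aggregate = aggregate

    def predict(self, user, candidate):
        forward = self.predictor.predict(user, candidate)
        backward = self.predictor.predict(candidate, user)
        return self.aggregate(forward, backward)

    def recommend(self, user, candidates, n=3):
        others = [c for c in candidates if c != user]
        ranked = sorted(others, key=lambda c: self.predict(user, c), reverse=True)
        return ranked[:n]


# Hypothetical scores: alice likes carol most, but carol barely likes alice
one_way = OneWayScores({
    ("alice", "bob"): 0.9, ("bob", "alice"): 0.8,
    ("alice", "carol"): 0.95, ("carol", "alice"): 0.1,
    ("alice", "dave"): 0.5, ("dave", "alice"): 0.5,
})
rrs = ReciprocalSketch(one_way)

print(rrs.recommend("alice", ["bob", "carol", "dave"]))
```

A purely one-directional ranking would put `carol` first for `alice`; the reciprocal score moves her to the end because the interest is not mutual.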