The repository containing the experiment files for the interpretable cross-lingual document ranking model using multilingual language model and Regularized Earth Mover's Distance.
Before starting the project make sure these requirements are available:
- conda. For setting up your research environment and python dependencies.
- git. For versioning your code.
- dvc. For versioning your data (part of conda environment).
First create the new conda environment:
conda env create -f environment.yml
To activate the newly set environment run:
conda activate lm-emd
Use conda to install the appropriate version of PyTorch. NOTE: Be sure you install the CUDA version that is supported by your machine.
conda install pytorch cudatoolkit=11.1 -c pytorch -c conda-forge
When the environment is not in use anymore deactivate it by running:
conda deactivate
First download the data. Go into the /data
folder and run
sh ./download.sh
This will download the data used in the experiments.
NOTE: Training a single model requires approximate 10 GB of GPU space.
To run the experiments one can manually change the params.yaml
file with
different parameters. Then, simply run the following commands:
# trains the model with the provided parameters
python src/train.py data/sasaki18 data/model.pth data/losses
# evaluates the model
python src/evaluate.py data/sasaki18 data/model.pth data/scores
We use DVC to automatically run experiments with different parameters. The dvc
is installed with conda
. To run multiple experiments we execute the following
command:
# prepare the queue of experiments using pairwise_ranking
dvc exp run --queue -S model.ranking=cls -S model.reg=None -S model.nit=None
dvc exp run --queue -S model.ranking=max -S model.reg=None -S model.nit=None
dvc exp run --queue -S model.ranking=mean -S model.reg=None -S model.nit=None
dvc exp run --queue -S model.ranking=emd -S model.reg=0.1
dvc exp run --queue -S model.ranking=emd -S model.reg=1
dvc exp run --queue -S model.ranking=emd -S model.reg=10
# execute all queued experiments (run 3 jobs in parallel)
dvc exp run --run-all --jobs 3
To train the models using cross entropy:
# to train the models using cross-entropy
dvc exp run --queue -S train.loss=cross_entropy -S model.ranking=cls -S model.reg=None -S model.nit=None
dvc exp run --queue -S train.loss=cross_entropy -S model.ranking=max -S model.reg=None -S model.nit=None
dvc exp run --queue -S train.loss=cross_entropy -S model.ranking=mean -S model.reg=None -S model.nit=None
dvc exp run --queue -S train.loss=cross_entropy -S model.ranking=emd -S model.reg=0.1
dvc exp run --queue -S train.loss=cross_entropy -S model.ranking=emd -S model.reg=1
dvc exp run --queue -S train.loss=cross_entropy -S model.ranking=emd -S model.reg=10
# execute all queued experiments (run 3 jobs in parallel)
dvc exp run --run-all --jobs 3
Afterwards, we can compare the performance of the models by running:
dvc exp show
To save the best performance parameters run:
# [exp-id] is the ID of the experiment that yielded the best performance
dvc exp apply [exp-id]
Model | EN → DE | EN → FR | EN → TL | EN → JA | EN → SW |
---|---|---|---|---|---|
BERT-CLS | .978 / .987 | .978 / .987 | .851 / .912 | .955 / .973 | .913 / .947 |
BERT-MAX | .941 / .964 | .948 / .969 | .798 / .874 | .912 / .946 | .824 / .886 |
BERT-MEAN | .967 / .980 | .958 / .976 | .786 / .874 | .941 / .965 | .835 / .897 |
LM-EMD | .977 / .986 | .974 / .985 | .801 / .874 | .955 / .973 | .890 / .932 |
Table 1. CLIR performance of the models. The scores are formatted as P@1/MAP scores.
Params | Loss | EN → DE | EN → FR | EN → TL | EN → JA | EN → SW |
---|---|---|---|---|---|---|
γ = 0.1 | CE | .876 / .927 | .843 / .909 | .674 / .793 | .846 / .908 | .631 / .754 |
PR | .977 / .986 | .974 / .985 | .801 / .874 | .955 / .973 | .890 / .932 | |
γ = 1 | CE | .876 / .927 | .846 / .910 | .669 / .790 | .846 / .907 | .617 / .747 |
PR | .970 / .982 | .968 / .981 | .809 / .883 | .910 / .946 | .859 / .913 | |
γ = 10 | CE | .878 / .928 | .846 / .911 | .671 / .792 | .848 / .909 | .628 / .753 |
PR | .965 / .979 | .961 / .978 | .805 / .881 | .899 / .941 | .835 / .899 |
Table 2. The performance comparison of the LM-EMD model trained with different regularization factor (γ) values, and using the cross-entropy (CE) and pairwise ranking (PR) loss functions during training. The scores are formatted as P@1/MAP scores.
The LM-EMD has one advantage over the rest of the evaluated models: interpetability. Not only does Earth Mover's Distance return the final relevance score of the document, it also returns the so called transportation matrix which shows which terms in the document match the terms in the query; giving an idea from where the document scores come from.
The transportation matrix is generated using the Sinkhorn algorihm by using the pre-generated cost matrix containing the cosine distances between the query and document terms. The transportation matrix tells us which terms in the query and document are closest, e.g. have the biggest influence on the document score.
Here are the cost and transportation matrices of the interpretability example found in the paper.
Once the model is trained one can modify the batch
values in ./src/interpret.py.
To visualize the interpertation graphs simply run the following command:
python src/interpret.py data/model.pth data/interpretability/{new-image-name}.png
where {new-image-name}
is the name of the image. This will generate an image in the
./data/interpretability folder.
This work is developed by Department of Artificial Intelligence at Jozef Stefan Institute.
The work is supported by the Slovenian Research Agency and the EU Horizon 2020 project Humane AI NET (H2020-ICT-952026).