Official Repository for "Hyper-CL: Conditioning Sentence Representations with Hypernetworks" [Paper (arXiv)]
In this section, we describe how to train a Hyper-CL model using our code. The code is based on C-STS.
Download the C-STS dataset and place it under data/ (see the C-STS repository for details).
The requirements are the same as for C-STS; install them by running:
pip install -r requirements.txt
We provide example training scripts for finetuning and evaluating the models in the paper. Go to C-STS/ and execute the following command:
bash run_sts.sh
In addition to the arguments of C-STS, we explain the extra arguments below:

- `--objective`: Training objective; to train Hyper-CL, use `triplet_cl_mse`.
- `--cl_temp`: Temperature for the contrastive loss.
- `--cl_in_batch_neg`: Add an in-batch negative loss to the main loss.
- `--hypernet_scaler`: Sets the value of K for the low-rank variants of Hyper-CL (i.e., hyper64-cl and hyper85-cl) by specifying the divisor of the embedding size. For instance, in the base model, K=64 for hyper64-cl means the embedding size 768 is divided by 12, so `hypernet_scaler` is set to 12.
- `--hypernet_dual`: Dual encoding that uses two separate encoders: one for sentences 1 and 2, and one for the condition.
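The relationship between K and `--hypernet_scaler` can be sketched as follows; the helper name is hypothetical, for illustration only, and is not part of the repository's API:

```python
def hypernet_scaler_for_rank(embedding_size: int, k: int) -> int:
    """Return the --hypernet_scaler value (the divisor of the embedding
    size) that yields a low-rank dimension of K.
    Hypothetical helper, not the repository's API."""
    if embedding_size % k != 0:
        raise ValueError("K must evenly divide the embedding size")
    return embedding_size // k

# Base model (embedding size 768) with hyper64-cl (K=64):
print(hypernet_scaler_for_rank(768, 64))  # prints 12
```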
We use the following hyperparameters for training Hyper-CL:

| Emb. Model | Learning rate (lr) | Weight decay (wd) | Temperature (temp) |
|---|---|---|---|
| DiffCSE_base+hyper-cl | 3e-5 | 0.1 | 1.5 |
| DiffCSE_base+hyper64-cl | 1e-5 | 0.0 | 1.5 |
| SimCSE_base+hyper-cl | 3e-5 | 0.1 | 1.9 |
| SimCSE_base+hyper64-cl | 2e-5 | 0.1 | 1.7 |
| SimCSE_large+hyper-cl | 2e-5 | 0.1 | 1.5 |
| SimCSE_large+hyper85-cl | 1e-5 | 0.1 | 1.9 |
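The temperature above scales the similarity logits in the contrastive loss. A generic sketch of a temperature-scaled contrastive term (an illustration only, not the repository's exact `triplet_cl_mse` objective):

```python
import numpy as np

def contrastive_loss(anchor, positive, negatives, temp):
    """Cross-entropy over temperature-scaled cosine similarities, with the
    positive pair as the target. Generic illustration, not the exact
    triplet_cl_mse objective used in the paper."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits = sims / temp
    # log-softmax of the positive (index 0), negated
    return float(np.log(np.exp(logits).sum()) - logits[0])

a = np.array([1.0, 0.0])
p = np.array([1.0, 0.1])
n = np.array([0.0, 1.0])
# A higher temperature flattens the distribution; with the positive already
# ranked first, the loss increases:
print(contrastive_loss(a, p, [n], temp=1.5) > contrastive_loss(a, p, [n], temp=1.0))  # prints True
```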
We provide example training scripts for finetuning and evaluating the models in the paper. This code is based on SimKCG. Go to sim-kcg/ and execute the following commands:
bash scripts/preprocess.sh WN18RR
bash scripts/train_wn.sh
We explain the arguments below:

- `--pretrained-model`: Backbone model checkpoint (`bert-base-uncased` or `bert-large-uncased`)
- `--encoding_type`: Encoding type (`bi_encoder` or `tri_encoder`)
- `--triencoder_head`: Tri-encoder head (`concat`, `hadamard`, or `hypernet`)

Refer to `config.py` for the other arguments.
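As a rough sketch of how the three `--triencoder_head` options could combine two entity encodings with a condition encoding; the semantics here are assumptions for illustration, and the real head implementations live in the repository's model code:

```python
import numpy as np

def triencoder_head(h1, h2, cond, head):
    """Illustrative combination of two encodings h1, h2 given a condition
    encoding `cond`. Assumed semantics, not the repository's implementation."""
    if head == "concat":
        return np.concatenate([h1, h2])
    if head == "hadamard":
        return h1 * h2  # elementwise product
    if head == "hypernet":
        # Stand-in hypernetwork: derive a projection matrix from the
        # condition embedding (a learned module in practice).
        W = np.outer(cond, cond) / cond.size
        return W @ (h1 * h2)
    raise ValueError(f"unknown head: {head}")

h1, h2, c = np.ones(4), np.arange(4.0), np.full(4, 0.5)
print(triencoder_head(h1, h2, c, "concat").shape)  # prints (8,)
```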
To evaluate a trained checkpoint, run:
bash scripts/eval.sh ./checkpoint/WN18RR/model_best.mdl WN18RR
Please cite our paper if you use Hyper-CL in your work:
@article{yoo2024hyper,
  title={Hyper-CL: Conditioning Sentence Representations with Hypernetworks},
  author={Yoo, Young Hyun and Cha, Jii and Kim, Changhyeon and Kim, Taeuk},
  journal={arXiv preprint arXiv:2403.09490},
  year={2024}
}