
Hyper-CL: Conditioning Sentence Representations with Hypernetworks

Official repository for "Hyper-CL: Conditioning Sentence Representations with Hypernetworks" [Paper (arXiv): https://arxiv.org/abs/2403.09490]

Young Hyun Yoo, Jii Cha, Changhyeon Kim, and Taeuk Kim. Accepted to ACL 2024 as a long paper.

Table of Contents

  • C-STS
  • SimKGC
  • Citation

C-STS

In this section, we describe how to train a Hyper-CL model using our code. This code is based on C-STS.

Requirements

The requirements are the same as for C-STS. Install them with:

pip install -r requirements.txt

Data

Download the C-STS dataset and place it under data/ (see the C-STS repository for details).
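For reference, the training code reads the dataset from data/. A layout like the following is assumed in this sketch; the exact file names depend on the C-STS release you obtain:

data/
├── csts_train.csv
├── csts_validation.csv
└── csts_test.csv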

Training

We provide example training scripts for fine-tuning and evaluating the models in the paper. Go to C-STS/ and execute the following command:

bash run_sts.sh

Our code follows the arguments of C-STS; the additional arguments are explained below, and a sample invocation follows the list:

  • --objective: Training objective. To train Hyper-CL, use triplet_cl_mse.

  • --cl_temp: Temperature for the contrastive loss.

  • --cl_in_batch_neg: Adds an in-batch negative loss to the main loss.

  • --hypernet_scaler: Divisor of the embedding size that sets the rank K for the low-rank variants of Hyper-CL (i.e., hyper64-cl, hyper85-cl). For instance, in the base model (embedding size 768), hyper64-cl uses K = 768 / 12 = 64, so hypernet_scaler is set to 12.

  • --hypernet_dual: Dual encoding that uses two separate encoders, one for sentences 1 and 2 and one for the condition.
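As a concrete illustration, a Hyper-CL run might be launched as below. This is a minimal sketch, not the exact contents of run_sts.sh: the run_sts.py entry point, the backbone checkpoint, and the standard HuggingFace-style flags (--model_name_or_path, --learning_rate, --weight_decay, --output_dir) are assumptions based on the upstream C-STS code; the Hyper-CL values follow the SimCSE_base+hyper64-cl row of the table below.

# Hypothetical invocation; see run_sts.sh for the authoritative argument list.
# hypernet_scaler 12 gives K = 768 / 12 = 64, i.e. hyper64-cl.
python run_sts.py \
    --model_name_or_path princeton-nlp/sup-simcse-bert-base-uncased \
    --objective triplet_cl_mse \
    --cl_temp 1.7 \
    --cl_in_batch_neg \
    --hypernet_scaler 12 \
    --learning_rate 2e-5 \
    --weight_decay 0.1 \
    --output_dir output/simcse_base_hyper64_cl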

Hyperparameters

We use the following hyperparameters for training Hyper-CL:

Emb. Model               Learning rate (lr)  Weight decay (wd)  Temperature (temp)
DiffCSE_base+hyper-cl    3e-5                0.1                1.5
DiffCSE_base+hyper64-cl  1e-5                0.0                1.5
SimCSE_base+hyper-cl     3e-5                0.1                1.9
SimCSE_base+hyper64-cl   2e-5                0.1                1.7
SimCSE_large+hyper-cl    2e-5                0.1                1.5
SimCSE_large+hyper85-cl  1e-5                0.1                1.9

SimKGC

We provide example training scripts for fine-tuning and evaluating the models in the paper. Go to sim-kcg/ and execute the following commands. This code is based on SimKGC.

Preprocessing the WN18RR dataset

bash scripts/preprocess.sh WN18RR

Training

bash scripts/train_wn.sh

We explain the main arguments below; a sample invocation follows the list:

  • --pretrained-model: Backbone model checkpoint (bert-base-uncased or bert-large-uncased)
  • --encoding_type: Encoding type (bi_encoder or tri_encoder)
  • --triencoder_head: Tri-encoder head (concat, hadamard, or hypernet; Hyper-CL corresponds to hypernet)
  • Refer to config.py for the remaining arguments.
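As an illustration, a Hyper-CL tri-encoder run might set these flags as follows. This is a sketch under the assumption that the upstream SimKGC entry point main.py is used; all remaining arguments (data paths, output directory, and so on) are omitted and should be taken from scripts/train_wn.sh and config.py.

# Hypothetical flag settings for a Hyper-CL (tri-encoder) run on WN18RR;
# the data and output arguments configured in scripts/train_wn.sh are omitted.
python3 -u main.py \
    --pretrained-model bert-base-uncased \
    --encoding_type tri_encoder \
    --triencoder_head hypernet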

Evaluation of Performance and Inference Time

bash scripts/eval.sh ./checkpoint/WN18RR/model_best.mdl WN18RR

Citation

Please cite our paper if you use Hyper-CL in your work:

@article{yoo2024hyper,
  title={Hyper-CL: Conditioning Sentence Representations with Hypernetworks},
  author={Yoo, Young Hyun and Cha, Jii and Kim, Changhyeon and Kim, Taeuk},
  journal={arXiv preprint arXiv:2403.09490},
  year={2024}
}
