RE-Control

🔥Aligning Large Language Models with Representation Editing: A Control Perspective

RE-Control aligns LLMs by introducing external control signals into the hidden states of a pre-trained LLM during test time.
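As a rough illustration of what "introducing a control signal into the hidden states" can look like, the sketch below nudges a hidden state with a few gradient-ascent steps on a value function. It assumes that the control signal comes from gradient ascent on a learned value function; this README does not spell that out. The linear value function, the function names, and the toy vectors are hypothetical stand-ins, not the repository's code.

```python
# Illustrative sketch only: a toy linear value function stands in for a
# trained value model, and plain lists stand in for LLM hidden states.


def value(hidden, weights, bias=0.0):
    """Toy value function V(h) = w . h + b (hypothetical stand-in)."""
    return sum(w * h for w, h in zip(weights, hidden)) + bias


def value_gradient(hidden, weights):
    """For a linear value function, the gradient with respect to h is w."""
    return list(weights)


def edit_hidden_state(hidden, weights, step_size=1.0, num_steps=30):
    """Add a control signal to `hidden` by gradient ascent on the value."""
    edited = list(hidden)
    for _ in range(num_steps):
        grad = value_gradient(edited, weights)
        edited = [h + step_size * g for h, g in zip(edited, grad)]
    return edited


if __name__ == "__main__":
    h = [0.5, -1.0, 2.0]
    w = [0.1, 0.2, -0.3]
    h_edited = edit_hidden_state(h, w, step_size=0.1, num_steps=5)
    print(value(h, w), "->", value(h_edited, w))
```

With a real model, the value function would be a neural network and the gradient would come from automatic differentiation rather than a closed form.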

This project uses two environments: install the requirements in llm.txt for every script except metrics.py, and the requirements in metric.txt for metrics.py.

Installation (RE-Control)

Clone the project and create an environment with conda:

conda create -n recontrol python==3.10
conda activate recontrol

pip install -r llm.txt

Note: you may need to adjust the torch (cuda) version according to your GPU.
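To see which torch build ended up in the environment, a small check like the following may help. The helper name `describe_torch_install` is hypothetical; it only reads standard PyTorch attributes and returns None when torch is not installed.

```python
import importlib.util


def describe_torch_install():
    """Return torch/CUDA version info, or None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_install())
```

If `cuda_available` is False on a GPU machine, the installed torch build probably does not match the local CUDA driver, and the torch version pinned in llm.txt may need changing.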

Training process

First, we need to get the activations from the LLM:

python get_activations_only.py --model_name llama3_8B --dataset_name shp

Then, we need to label the activations with a reward model:

python reward_label.py --model_name llama3_8B --dataset_name shp --reward_model openbmb/UltraRM-13b --mode train

Train a value model:
python train_value_model.py --model_name llama3_8B --dataset_name shp --lr 0.001

Conduct intervened inference:
python inference_intervention.py --model_name llama3_8B --dataset_name shp --use_intervention True --lr 1.0 --epochs 30 --value_lr 0.001
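The four steps above can also be chained from a single driver script. The sketch below is one possible wrapper, not part of the repository: the function names and the `dry_run` switch are hypothetical, and it assumes that `--value_lr` at inference should match the `--lr` used to train the value model, as in the example commands.

```python
import subprocess
import sys


def build_training_commands(model_name="llama3_8B", dataset_name="shp",
                            reward_model="openbmb/UltraRM-13b",
                            value_lr=0.001, lr=1.0, epochs=30):
    """Build the argument lists for the four stages, in order."""
    common = ["--model_name", model_name, "--dataset_name", dataset_name]
    return [
        ["get_activations_only.py", *common],
        ["reward_label.py", *common,
         "--reward_model", reward_model, "--mode", "train"],
        ["train_value_model.py", *common, "--lr", str(value_lr)],
        ["inference_intervention.py", *common, "--use_intervention", "True",
         "--lr", str(lr), "--epochs", str(epochs),
         "--value_lr", str(value_lr)],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for args in commands:
        cmd = [sys.executable, *args]
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing stage


if __name__ == "__main__":
    run_pipeline(build_training_commands(), dry_run=True)
```

Running it with `dry_run=True` only prints the commands, which is a cheap way to confirm the flags before starting the GPU-heavy stages.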

Evaluation process

Evaluate the average reward:
python measure_reward.py --out_file llama3_8B_shp_0.001_30_1.0 --model_name llama3_8B --dataset_name shp --reward_model openbmb/UltraRM-13b
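The `--out_file` value in the example appears to follow the pattern model, dataset, value learning rate, epochs, intervention learning rate. That ordering is inferred from the example commands in this README rather than stated anywhere, so treat the hypothetical helper below as a convenience sketch and check it against the files the inference script actually writes.

```python
def make_run_name(model_name, dataset_name, value_lr, epochs, lr):
    """Compose a run name in the pattern inferred from the README examples.

    Pass the numeric values as strings if you need exact formatting,
    since very small floats are rendered in scientific notation.
    """
    return f"{model_name}_{dataset_name}_{value_lr}_{epochs}_{lr}"


if __name__ == "__main__":
    print(make_run_name("llama3_8B", "shp", 0.001, 30, 1.0))
```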

Evaluate the diversity and coherence:
python metrics.py --run_name llama3_8B_shp_0.001_30_1.0
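This README does not say how metrics.py computes diversity. One common definition in the text-generation literature multiplies the fraction of unique n-grams for n = 2, 3, 4; the sketch below implements that definition as an assumption, with whitespace tokenization and hypothetical function names, and may differ from what metrics.py does.

```python
def ngram_repetition(tokens, n):
    """Fraction of n-grams that are repeats: 1 - unique / total."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # text shorter than n tokens: nothing can repeat
    return 1.0 - len(set(ngrams)) / len(ngrams)


def diversity(text, ns=(2, 3, 4)):
    """Product over n of (1 - repetition rate); 1.0 means no repeats."""
    tokens = text.split()
    score = 1.0
    for n in ns:
        score *= 1.0 - ngram_repetition(tokens, n)
    return score


if __name__ == "__main__":
    print(diversity("the cat sat on the mat"))
    print(diversity("a a a a a a"))
```

Scores close to 1.0 indicate little repetition, while heavily repetitive output drives the product toward 0.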

Evaluate the GPT-4 win rate:
python gpt4_eval.py --run_name_red llama3_8B_shp_0.001_30_1.0 --run_name_blue dataset/dataset_prefer

Pass the preferred responses from the dataset as 'run_name_blue'. We provide an example in dataset_prefer.json.

Citation

If you find our work helpful, please consider citing our paper:

@inproceedings{
kong2024aligning,
title={Aligning Large Language Models with Representation Editing: A Control Perspective},
author={Lingkai Kong and Haorui Wang and Wenhao Mu and Yuanqi Du and Yuchen Zhuang and Yifei Zhou and Yue Song and Rongzhi Zhang and Kai Wang and Chao Zhang},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=yTTomSJsSW}
}
