Official repository for the ACL 2025 paper "Model Extrapolation Expedites Alignment".
If you find this repository useful or our work related to your research, please cite it:
```
@inproceedings{llm-extrapolation,
  title={Model Extrapolation Expedites Alignment},
  author={Chujie Zheng and Ziqi Wang and Heng Ji and Minlie Huang and Nanyun Peng},
  booktitle={The 63rd Annual Meeting of the Association for Computational Linguistics},
  year={2025}
}
```
We have uploaded the trained checkpoints and extrapolated models on 🤗 HuggingFace.
For the extrapolated models applied to open-source models, see this 🤗 HuggingFace collection.
For the zephyr checkpoints trained from zephyr-7b-sft-full in our controlled experiments, see this 🤗 HuggingFace collection.
The implementation of ExPO is extremely simple. You can refer to the code in code/extrapolate.py (setting alpha to 0.3 or 0.5 usually works well).
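ExPO takes an aligned (DPO/RLHF) checkpoint and its initial SFT checkpoint and extrapolates the weights beyond the aligned one: theta_expo = theta_aligned + alpha * (theta_aligned - theta_sft). The following is a minimal sketch of this idea (for illustration only, not the actual code/extrapolate.py; the checkpoint paths and output directory are placeholders):

```python
# Minimal sketch of ExPO weight extrapolation (illustration only; see
# code/extrapolate.py for the actual implementation). The checkpoint paths,
# output directory, and alpha value below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

sft_path = "path/to/initial-sft-checkpoint"    # placeholder
rlhf_path = "path/to/dpo-or-rlhf-checkpoint"   # placeholder
alpha = 0.3                                    # 0.3 or 0.5 is usually a good start

sft = AutoModelForCausalLM.from_pretrained(sft_path, torch_dtype=torch.float32)
rlhf = AutoModelForCausalLM.from_pretrained(rlhf_path, torch_dtype=torch.float32)

sft_state = sft.state_dict()
with torch.no_grad():
    for name, param in rlhf.state_dict().items():
        if not torch.is_floating_point(param):
            continue  # skip non-float buffers (e.g., integer position ids)
        # theta_expo = theta_rlhf + alpha * (theta_rlhf - theta_sft)
        param += alpha * (param - sft_state[name])

rlhf.save_pretrained("expo-extrapolated-model")
AutoTokenizer.from_pretrained(rlhf_path).save_pretrained("expo-extrapolated-model")
```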
You can find the raw outputs on the standardized benchmarks AlpacaEval 2.0 (results_alpaca), MT-Bench (results_mtbench), and Open LLM Leaderboard (results_lmeval). For the Open LLM Leaderboard, the scores of models not included in results_lmeval are available on the official leaderboard.
We have also uploaded the AlpacaEval 2.0 evaluation results to the official leaderboard. You can find the detailed inference hyperparameters in their repository for reproduction.
The inference code includes code/generate_ultrafeedback.py and code/generate_alpaca.py. The script code/scripts/Starling-LM-7B-beta_extra.sh shows how to:
- Do model extrapolation (ExPO) with a DPO/RLHF checkpoint and its initial SFT checkpoint
- Use a HuggingFace model to generate responses on UltraFeedback or AlpacaEval 2.0; the outputs will be saved to outputs_ultrafeedback or outputs_alpaca
- Score the outputs with the reward model; the reward scores will be saved to rewards_ultrafeedback or rewards_alpaca (a rough sketch of these generation and scoring steps follows this list)
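For a rough picture of the last two steps, here is a minimal sketch that generates responses with vLLM and scores them with a scalar reward model. The chat model, reward model, prompts, and output handling are illustrative placeholders rather than the exact setup in code/generate_ultrafeedback.py, code/generate_alpaca.py, and the reward-scoring script:

```python
# Hedged sketch of the generate-then-score loop. Model IDs, prompts, and
# output handling are placeholders, not the repo's exact setup.
import torch
from vllm import LLM, SamplingParams
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# 1) Generate responses with a (possibly extrapolated) chat model via vLLM.
#    In practice, apply the model's chat template to each prompt first.
chat_model = "Nexusflow/Starling-LM-7B-beta"  # placeholder model ID
prompts = [
    "What are the health benefits of regular exercise?",
    "Summarize the idea of model extrapolation in one sentence.",
]
llm = LLM(model=chat_model)
sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=1024)
responses = [out.outputs[0].text for out in llm.generate(prompts, sampling)]

# 2) Score each (prompt, response) pair with a scalar reward model.
rm_name = "OpenAssistant/reward-model-deberta-v3-large-v2"  # placeholder reward model
rm_tokenizer = AutoTokenizer.from_pretrained(rm_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(rm_name).eval()

rewards = []
with torch.no_grad():
    for prompt, response in zip(prompts, responses):
        inputs = rm_tokenizer(prompt, response, return_tensors="pt", truncation=True)
        rewards.append(reward_model(**inputs).logits[0].item())  # higher = better

for prompt, reward in zip(prompts, rewards):
    print(f"{reward:.3f}\t{prompt}")
```

In the repo, the corresponding scripts write the generated responses and reward scores to the directories listed above.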
For evaluation on the standardized benchmarks:
- To run the official AlpacaEval 2.0 evaluation, follow https://github.com/tatsu-lab/alpaca_eval?tab=readme-ov-file#evaluating-a-model
- To run the official MT-Bench evaluation, follow https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge (you can host a local vLLM server to speed up inference)
- To run the official Open LLM Leaderboard evaluation, follow https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard (About -> REPRODUCIBILITY)