
LLM-Extrapolation

Official repository for the ACL 2025 paper "Model Extrapolation Expedites Alignment"

If you find this repository useful or our work related to your research, please cite it:

@inproceedings{llm-extrapolation,
  title={Model Extrapolation Expedites Alignment},
  author={Chujie Zheng and Ziqi Wang and Heng Ji and Minlie Huang and Nanyun Peng},
  booktitle={The 63rd Annual Meeting of the Association for Computational Linguistics},
  year={2025}
}

Models

We have uploaded the trained checkpoints and extrapolated models to 🤗 HuggingFace.

For the extrapolated versions of open-source models, see this 🤗 HuggingFace collection.

For the zephyr checkpoints trained from zephyr-7b-sft-full in our controlled experiments, see this 🤗 HuggingFace collection.

Implementation of ExPO

The implementation of ExPO is extremely simple; see code/extrapolate.py (setting alpha to 0.3 or 0.5 usually works well).
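For reference, ExPO linearly extrapolates from the SFT weights past the aligned weights: theta_expo = theta_rlhf + alpha * (theta_rlhf - theta_sft), applied parameter-wise. Below is a minimal sketch of this update, not the repo's exact extrapolate.py; the checkpoint paths and alpha value are placeholder assumptions:

```python
# Minimal ExPO-style extrapolation sketch (not the repo's exact
# extrapolate.py; paths and alpha below are placeholder assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

sft_path = "HuggingFaceH4/zephyr-7b-sft-full"  # initial SFT checkpoint
rlhf_path = "your-org/your-dpo-checkpoint"     # hypothetical DPO/RLHF checkpoint
alpha = 0.3                                    # extrapolation strength

sft = AutoModelForCausalLM.from_pretrained(sft_path, torch_dtype=torch.bfloat16)
rlhf = AutoModelForCausalLM.from_pretrained(rlhf_path, torch_dtype=torch.bfloat16)

# theta_expo = theta_rlhf + alpha * (theta_rlhf - theta_sft), applied to each
# parameter in place; both models must share the same architecture.
sft_state = sft.state_dict()
with torch.no_grad():
    for name, param in rlhf.state_dict().items():
        param.add_(alpha * (param - sft_state[name]))

rlhf.save_pretrained("expo-model")
AutoTokenizer.from_pretrained(rlhf_path).save_pretrained("expo-model")
```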

Experimental Results

You can find the raw outputs on the standardized benchmarks AlpacaEval 2.0 (results_alpaca), MT-Bench (results_mtbench), and Open LLM Leaderboard (results_lmeval). For the Open LLM Leaderboard, the scores of models not included here are available on the official leaderboard.

We have also uploaded the AlpacaEval 2.0 evaluation results to the official leaderboard. For reproduction, see the detailed inference hyperparameters in their repository.

Inference and Evaluation Code

The inference code includes code/generate_ultrafeedback.py and code/generate_alpaca.py. The script code/scripts/Starling-LM-7B-beta_extra.sh shows how to:

  • Apply model extrapolation (ExPO) to a DPO/RLHF checkpoint and its initial SFT checkpoint
  • Use a HuggingFace model to generate responses on UltraFeedback or AlpacaEval 2.0; the outputs are saved to outputs_ultrafeedback or outputs_alpaca
  • Score the outputs with the reward model; the reward scores are saved to rewards_ultrafeedback or rewards_alpaca (a sketch of the generate-then-score loop follows this list)
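
To make the generate-then-score flow concrete, here is a hedged sketch; the chat model path, prompt, reward model, and sampling settings are assumptions rather than the repo's exact configuration:

```python
# Hedged sketch of the generate-then-score loop (the prompt, the reward
# model, and the sampling settings are assumptions, not the repo's code).
import torch
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Generate a response with the (extrapolated) chat model.
model_path = "expo-model"  # output of the extrapolation step above
tok = AutoTokenizer.from_pretrained(model_path)
lm = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16).to(device)

prompt = [{"role": "user", "content": "What are the benefits of regular exercise?"}]
inputs = tok.apply_chat_template(
    prompt, add_generation_prompt=True, return_tensors="pt").to(device)
out = lm.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
response = tok.decode(out[0, inputs.shape[1]:], skip_special_tokens=True)

# 2) Score the (prompt, response) pair with a sequence-classification reward
#    model; this RM is an example, not necessarily the one used in the paper.
rm_path = "OpenAssistant/reward-model-deberta-v3-large-v2"
rm_tok = AutoTokenizer.from_pretrained(rm_path)
rm = AutoModelForSequenceClassification.from_pretrained(rm_path).to(device)

rm_inputs = rm_tok(prompt[0]["content"], response, return_tensors="pt").to(device)
with torch.no_grad():
    reward = rm(**rm_inputs).logits[0].item()
print(f"reward: {reward:.3f}")
```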

For evaluation on the standardized benchmarks:
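The Open LLM Leaderboard tasks are run with EleutherAI's lm-evaluation-harness. As a hedged sketch (the model path and task choice are assumptions, and this repo's exact evaluation commands may differ), a single 25-shot ARC-Challenge run looks like:

```python
# Hedged sketch of an Open LLM Leaderboard-style run via EleutherAI's
# lm-evaluation-harness (pip install lm-eval); the model path is a placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=expo-model,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=25,  # the leaderboard evaluates ARC-Challenge 25-shot
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```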
