Spherically merge HuggingFace PyTorch-format language models with minimal feature loss from the parent models.
Traditionally, model merging often relies on weight averaging, which, although straightforward, may not capture the intricate features of the models being merged. The SLERP technique used in this script addresses that limitation, producing a blended model with characteristics smoothly interpolated from both parent models, ensuring the result captures the essence of both its parents. Its advantages are listed below, followed by a minimal sketch of the interpolation itself.
- Smooth Transitions: SLERP ensures smoother transitions between model parameters. This is especially significant when interpolating between high-dimensional vectors.
- Better Preservation of Characteristics: Unlike weight averaging, which might dilute distinct features, SLERP preserves the curvature and characteristics of both models in high-dimensional spaces.
- Nuanced Blending: SLERP takes into account the geometric and rotational properties of the models in the vector space, resulting in a blend that is more reflective of both parent models' characteristics.
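For reference, the core of spherical linear interpolation over two matching parent weight tensors looks roughly like the sketch below. This is a minimal illustration, not the exact code in slerpmergelm.py; the function name, epsilon, and colinearity fallback threshold are illustrative choices.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two parent weight tensors.

    t  -- interpolation factor in [0, 1] (0 returns v0, 1 returns v1)
    v0 -- a weight tensor from the first parent model
    v1 -- the matching weight tensor from the second parent model
    """
    # Normalize copies of the tensors to measure the angle between them.
    v0_n = v0 / (np.linalg.norm(v0) + eps)
    v1_n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.sum(v0_n * v1_n), -1.0, 1.0)

    # If the parents are nearly colinear, plain linear interpolation is stable enough.
    if np.abs(dot) > 0.9995:
        return (1.0 - t) * v0 + t * v1

    theta = np.arccos(dot)  # angle between the two parents
    sin_theta = np.sin(theta)
    return (np.sin((1.0 - t) * theta) / sin_theta) * v0 + (np.sin(t * theta) / sin_theta) * v1
```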
- Clone this repository.
git clone https://github.com/Digitous/LLM-SLERP-Merge.git
- Navigate to the cloned directory.
cd LLM-SLERP-Merge
- (Optional) Ensure you have the proper dependencies: numpy, torch, transformers, tkinter, and colorama; you can install them using:
pip install -r requirements.txt
- Run the SLERP script.
python slerpmergelm.py "model1" "model2" "result" "weight like [0.8, 0.2, 0.8]"
- Ensure the parent models share the same architecture and parameter count (for example, two LLaMA 2 13B pretrained language models). The script does the rest: it spherically merges the two parents and saves the offspring model to the selected save directory. For added convenience, it also scans both parent directories for a special_tokens_map.json and copies all relevant tokenizer files from that directory to the child directory (if both or neither parent contains special_tokens_map.json, it still copies the necessary files, so the child model is ready to use as soon as the process completes). NEW: Gradient merging; one possible reading of the weight list shown in the usage line above is sketched after this list.
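The gradient feature is not documented in detail here, so the sketch below is only an assumed illustration: it treats a short weight list such as [0.8, 0.2, 0.8] as anchor blend factors that are expanded to one interpolation factor per transformer layer. The function name expand_gradient and the expansion rule are hypothetical; slerpmergelm.py defines the actual behavior.

```python
import numpy as np

# Hypothetical illustration: expand a short gradient spec such as [0.8, 0.2, 0.8]
# into one blend factor per transformer layer by linear interpolation between the
# listed anchor values. The real mapping is defined by slerpmergelm.py.
def expand_gradient(anchors, num_layers):
    anchor_positions = np.linspace(0, num_layers - 1, num=len(anchors))
    return np.interp(np.arange(num_layers), anchor_positions, anchors)

print(expand_gradient([0.8, 0.2, 0.8], num_layers=8))
# -> approximately [0.80, 0.63, 0.46, 0.29, 0.29, 0.46, 0.63, 0.80]
```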
Some models, even of the same architecture and parameter count, may have a different vocab_size as defined in their config.json. For instance, LLaMA v1 and v2 13B have a standardized vocab size of 32000; however, some pretrained LLaMA 13B models deviate from this standard with a modified vocab of 32001, 32032, and so on, making them incompatible for merging. There are ways to bypass this limitation, and it will be addressed in the next update. In the meantime, a quick compatibility check is sketched below.
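As a stopgap, the vocab sizes can be compared before attempting a merge. The snippet below is a minimal sketch using the transformers AutoConfig API; the model paths are placeholders for your local parent directories.

```python
from transformers import AutoConfig

# Compare the vocab_size declared in each parent's config.json before merging.
# The paths below are placeholders for your local parent model directories.
cfg_a = AutoConfig.from_pretrained("path/to/model1")
cfg_b = AutoConfig.from_pretrained("path/to/model2")

if cfg_a.vocab_size != cfg_b.vocab_size:
    print(f"Incompatible vocab sizes: {cfg_a.vocab_size} vs {cfg_b.vocab_size}")
else:
    print("vocab_size matches; the parents are compatible on this axis.")
```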
This project is unlicensed and unrestricted in how far it may be proliferated, updated, modified, maintained, integrated, and shared. A kind reference to this repo, as well as to dvschultz's script (which inspired this work) at https://gist.github.com/dvschultz/3af50c40df002da3b751efab1daddf2c, would be nice.
- Contributors: Digitous & CalderaAI, for retrofitting the SLERP script for LLM (PyTorch + HF format) merging.
- Original Script: dvschultz, for giving insights on how to approach spherical linear interpolation with their script.
- Special Mention: LostRuins, for the first weight-averaging script for LLMs that we know of; without their work, none of this would have come to fruition.