# ShardMerge
A powerful tool for merging multiple finetuned LLM models by computing and combining their delta weights.
## Features

- Merge multiple finetuned models with their base model
- Efficient shard-based processing with a minimal memory and disk footprint
- Concurrent downloads with progress tracking
- HuggingFace Hub integration
- SafeTensors format support
## Installation

```bash
poetry install --no-root
```
## Usage

Create a YAML configuration file:
```yaml
output_base_model: "unsloth/Meta-Llama-3.1-70B-Instruct"
output_dtype: "bfloat16"
finetune_merge:
  - { "model": "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF", "base": "unsloth/Meta-Llama-3.1-70B-Instruct", "alpha": 0.3, "is_input": true }
  - { "model": "another/finetuned-model", "base": "unsloth/Meta-Llama-3.1-70B-Instruct", "alpha": 0.5, "is_output": true }
output_dir: "output_model"
device: "cpu"
clean_cache: false
cache_dir: "cache"
storage_dir: "storage"
```
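Each entry's `alpha` presumably scales that model's delta against its base before the deltas are summed onto the output base. Here is a minimal per-tensor sketch of that arithmetic (illustrative only; `merge_tensor` is a hypothetical helper, not this tool's API, and the roles of `is_input`/`is_output` are not modeled):

```python
import torch

def merge_tensor(base: torch.Tensor, deltas: list[tuple[torch.Tensor, float]]) -> torch.Tensor:
    """base + sum(alpha_i * (finetune_i - base)), assuming alpha linearly scales each delta."""
    merged = base.clone()
    for finetuned, alpha in deltas:
        merged += alpha * (finetuned - base)
    return merged

# With the config above, each output tensor would be roughly:
#   base + 0.3 * (nemotron - base) + 0.5 * (other_finetune - base)
```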
Run the merge:

```bash
python -m shard merge config.yaml
```
Optional arguments:

- `--verbose`: Enable detailed logging
## How It Works

- Downloads model shards concurrently
- Computes delta weights between the base and finetuned models
- Combines the deltas efficiently using tensor operations
- Writes the output in SafeTensors format, compatible with standard loaders (the per-shard loop is sketched below)
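As a rough illustration of the per-shard merge step, here is a minimal sketch, not the project's actual code. It assumes every model shares the same shard layout and tensor names, which the real tool has to reconcile via each model's weight index:

```python
import torch
from safetensors.torch import safe_open, save_file

def merge_shard(base_shard: str, finetunes: list[tuple[str, float]], out_shard: str) -> None:
    """Stream one SafeTensors shard: read each base tensor, fold in alpha-scaled
    deltas from the finetuned models, and write the merged shard to disk."""
    merged = {}
    with safe_open(base_shard, framework="pt", device="cpu") as base:
        for name in base.keys():
            base_t = base.get_tensor(name)
            out_t = base_t.clone()
            for ft_shard, alpha in finetunes:
                with safe_open(ft_shard, framework="pt", device="cpu") as ft:
                    out_t += alpha * (ft.get_tensor(name) - base_t)
            merged[name] = out_t.to(torch.bfloat16)  # matches output_dtype above
    save_file(merged, out_shard)
```

Streaming one shard at a time is presumably what keeps peak memory bounded by the largest shard rather than the full model.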
## License

LGPL-3.0 - See LICENSE.txt for details.
## Contributing

Contributions welcome! Please feel free to submit pull requests.