- Fork the model's Hugging Face repo (adapted from the documentation):
git lfs install --skip-smudge --local &&
git remote add upstream git@hf.co:ibm-granite/granite-7b-instruct &&
git fetch upstream &&
git lfs fetch --all upstream
- If you want to completely overwrite the fork history (which should only have an initial commit), run:
git reset --hard upstream/main &&
git lfs pull upstream
- If you want to rebase instead of overwriting, run the following command and resolve any conflicts:
git rebase upstream/main &&
git lfs pull upstream
- Compress all the model weights. First, download the scripts for compressing/decompressing AI models:
wget -i https://raw.githubusercontent.com/zipnn/zipnn/main/scripts/scripts.txt &&
rm scripts.txt
python3 zipnn_compress_path.py safetensors --path .
- Add the compressed weights to git-lfs tracking and correct the index JSON (a Python equivalent of the sed fix is sketched after this list):
git lfs track "*.znn" &&
sed -i 's/.safetensors/.safetensors.znn/g' model.safetensors.index.json &&
git add *.znn .gitattributes model.safetensors.index.json &&
git rm *.safetensors
- Done! Now push the changes as per the documentation:
git lfs install --force --local && # this reinstalls the LFS hooks
huggingface-cli lfs-enable-largefiles . && # needed if some files are bigger than 5GB
git push --force origin main
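If sed is unavailable on your system, the index fix from the git-lfs tracking step above can be done with a short Python snippet instead; this is a minimal sketch that performs the same textual substitution on model.safetensors.index.json:
# Minimal sketch: same substitution as the sed command above, for systems without sed.
with open("model.safetensors.index.json") as f:
    text = f.read()
with open("model.safetensors.index.json", "w") as f:
    f.write(text.replace(".safetensors", ".safetensors.znn"))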
To use the model, simply run our ZipNN Hugging Face method before proceeding as normal:
from zipnn import zipnn_hf
zipnn_hf()
# Load the model from your compressed Hugging Face model card as you normally would
...
In this example, we show how to use the compressed ibm-granite/granite-7b-instruct model hosted on Hugging Face.
First, make sure you have ZipNN installed:
pip install zipnn
To run the model, simply add zipnn_hf() at the beginning of the file, and it will take care of decompression for you. By default, the model remains compressed in your local storage, decompressing quickly on the CPU only during loading.
from transformers import AutoTokenizer, AutoModelForCausalLM
from zipnn import zipnn_hf
zipnn_hf()
tokenizer = AutoTokenizer.from_pretrained("royleibov/granite-7b-instruct-ZipNN-Compressed")
model = AutoModelForCausalLM.from_pretrained("royleibov/granite-7b-instruct-ZipNN-Compressed")
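From here the model behaves like any other transformers model. As a quick sanity check, a minimal generation call might look like the following (the prompt text is just illustrative):
# Minimal sketch: generate a short completion to confirm the compressed model loads and runs.
inputs = tokenizer("Write a short poem about data compression.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))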
Alternatively, you can save the model uncompressed on your local storage. This way, future loads won’t require a decompression phase.
zipnn_hf(replace_local_file=True)
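For completeness, the same loading snippet with the flag set looks like this sketch (only the zipnn_hf() call changes; on the first load the weights are written back uncompressed, so later loads skip decompression):
from transformers import AutoTokenizer, AutoModelForCausalLM
from zipnn import zipnn_hf
# Decompress once and store the weights uncompressed locally for faster future loads.
zipnn_hf(replace_local_file=True)
tokenizer = AutoTokenizer.from_pretrained("royleibov/granite-7b-instruct-ZipNN-Compressed")
model = AutoModelForCausalLM.from_pretrained("royleibov/granite-7b-instruct-ZipNN-Compressed")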
To compress and decompress manually, simply run:
python zipnn_compress_path.py safetensors --model royleibov/granite-7b-instruct-ZipNN-Compressed --hf_cache
python zipnn_decompress_path.py --model royleibov/granite-7b-instruct-ZipNN-Compressed --hf_cache
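If you would rather stay inside Python than call the scripts, ZipNN also exposes a byte-level compress/decompress API; the sketch below assumes that interface (the bytearray_dtype parameter name and the shard filename are illustrative assumptions):
from zipnn import ZipNN

# Rough sketch of an in-memory round trip with ZipNN's byte API (interface assumed).
zpn = ZipNN(bytearray_dtype="bfloat16")  # dtype hint for the weights; parameter name assumed

with open("model-00001-of-00003.safetensors", "rb") as f:  # illustrative shard filename
    original = f.read()

compressed = zpn.compress(original)
restored = zpn.decompress(compressed)
assert bytes(restored) == original  # ZipNN is lossless, so the round trip is exact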
You can try other state-of-the-art compressed models from the regularly updated list below:
You can also try one of these Python notebooks hosted on Kaggle: granite 3b, Llama 3.2, phi 3.5.
Click here to explore other examples of compressed models hosted on Hugging Face