
Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4)

This repository is the official implementation of Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations.

Installation

To install requirements and the bof4 package:

pip install -r requirements.txt
pip install flash-attn==2.7.3 --no-build-isolation
pip install -e .
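
After installation, a quick smoke test can confirm the package is importable. This is a minimal sketch assuming the editable install registers a top-level bof4 module, as pip install -e . of the bof4 package suggests:

# Minimal import smoke test; assumes `pip install -e .` above exposes a
# top-level `bof4` package (an assumption, not documented behavior).
import bof4
print(bof4.__file__)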

Fine-Tuning

To run QLoRA fine-tuning with configurable quantizers, use scripts/finetune.py. The files config/finetune_code.yaml and config/finetune_instruct.yaml contain the configurations for reproducing the fine-tuning experiments from the paper. All utilized quantizer codebooks can be found in the codebooks directory. For example, to fine-tune with BOF4-S quantization and block size 64, run:

python scripts/finetune.py --config config/finetune_code.yaml --quantizer codebooks/bof4/bof4-s_mse_64.yaml
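
The quantizer YAML files under codebooks define the 16 reconstruction levels that a 4-bit block-wise quantizer maps each weight to. As orientation, the following is a minimal sketch of the generic procedure such a codebook plugs into: per-block absmax scaling followed by nearest-level lookup. It illustrates the general technique (as in NF4-style quantizers), not this repository's implementation, and the placeholder codebook below is not a BOF4 codebook:

import torch

def quantize_blockwise(weights, codebook, block_size=64):
    # Split the flat weight tensor into blocks (assumes numel divides evenly).
    blocks = weights.reshape(-1, block_size)
    # Per-block absmax scale, as in standard block-wise 4-bit schemes.
    scales = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    normalized = blocks / scales
    # Nearest-neighbor assignment of each value to one of the 16 levels.
    indices = (normalized.unsqueeze(-1) - codebook).abs().argmin(dim=-1)
    return indices.to(torch.uint8), scales

def dequantize_blockwise(indices, scales, codebook, shape):
    # Look up the level for each index and undo the per-block scaling.
    return (codebook[indices.long()] * scales).reshape(shape)

codebook = torch.linspace(-1.0, 1.0, 16)  # placeholder levels, not BOF4's
w = torch.randn(4096)
idx, s = quantize_blockwise(w, codebook)
w_hat = dequantize_blockwise(idx, s, codebook, w.shape)

The codebook filenames above appear to encode the variant (bof4-s), the optimization criterion (mse), and the block size (64); BOF4 replaces fixed levels with levels optimized against the distribution of normalized weights.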

Evaluation

To evaluate a model with quantization on the set of benchmarks used in the paper, run:

python scripts/eval.py -m meta-llama/Llama-3.2-3B -q codebooks/bof4/bof4-s_mse_64.yaml

For a full list of options, run:

python scripts/eval.py -h
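
To sweep several quantizer configurations, scripts/eval.py can be driven from a small wrapper. Below is a sketch that reuses only the -m and -q options shown above; filenames other than bof4-s_mse_64.yaml are assumptions about the naming pattern:

import subprocess

# Hypothetical sweep over block sizes; only bof4-s_mse_64.yaml is confirmed
# above, the other filenames follow the same assumed naming pattern.
for block_size in (64, 128, 256):
    subprocess.run(
        [
            "python", "scripts/eval.py",
            "-m", "meta-llama/Llama-3.2-3B",
            "-q", f"codebooks/bof4/bof4-s_mse_{block_size}.yaml",
        ],
        check=True,
    )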
