This repository is the official implementation of "Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations".
To install requirements and the bof4 package:
pip install -r requirements.txt
pip install flash-attn==2.7.3 --no-build-isolation
pip install -e .
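Since flash-attn is pinned to a specific version above, it can be useful to confirm the installed versions match before running anything. The following is a small sanity-check sketch (not part of the repository) using only the standard library:

```python
from importlib import metadata

def check_pins(pins):
    """For each package, return (installed_version, matches_pin).

    installed_version is None if the package is not installed.
    """
    results = {}
    for pkg, expected in pins.items():
        try:
            installed = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            installed = None
        results[pkg] = (installed, installed == expected)
    return results

# flash-attn is the version pinned in the install step above.
print(check_pins({"flash-attn": "2.7.3"}))
```

If a pin does not match, re-run the corresponding `pip install` command.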
To run QLoRA fine-tuning with configurable quantizers, use scripts/finetune.py.
The files config/finetune_code.yaml and config/finetune_instruct.yaml
contain the configurations for reproducing the fine-tuning experiments from the paper.
All quantizer codebooks used in the paper can be found in the codebooks directory.
For example, to fine-tune with BOF4-S quantization at block size 64, run:
python scripts/finetune.py --config config/finetune_code.yaml --quantizer codebooks/bof4/bof4-s_mse_64.yaml
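The quantizers configured here follow the block-wise codebook scheme that BOF4 builds on: weights are split into blocks (e.g. of size 64), each block is normalized by a per-block scale, and each normalized value is mapped to its nearest codebook entry. The following is an illustrative sketch of that general scheme, assuming absmax scaling as in NF4; it is not the repository's implementation, and the toy codebook below is a placeholder for the actual BOF4 codebooks stored in the YAML files:

```python
def blockwise_quantize(weights, codebook, block_size=64):
    """Dequantized result of block-wise codebook quantization.

    Each block is scaled by its absolute maximum, every normalized
    value is snapped to the nearest codebook entry, and the result
    is rescaled back. Returns the dequantized weights.
    """
    dequantized = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        # Per-block absmax scale; guard against an all-zero block.
        scale = max(abs(w) for w in block) or 1.0
        for w in block:
            # Nearest-neighbor lookup in the normalized codebook.
            q = min(codebook, key=lambda c: abs(w / scale - c))
            dequantized.append(q * scale)
    return dequantized

# Toy 5-entry codebook standing in for a real 16-entry (4-bit) one.
toy_codebook = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(blockwise_quantize([0.9, -0.1, 0.45, 2.0], toy_codebook, block_size=2))
```

A real 4-bit quantizer uses a 16-entry codebook; BOF4 optimizes those entries (e.g. for MSE, as the `mse` in the codebook filename suggests) rather than using the fixed NF4 values.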
To evaluate a model with quantization on the set of benchmarks used in the paper, run:
python scripts/eval.py -m meta-llama/Llama-3.2-3B -q codebooks/bof4/bof4-s_mse_64.yaml
For a full list of options run:
python scripts/eval.py -h
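To compare several quantizer configurations, the eval command above can be generated per (model, codebook) pair. The following is a hypothetical sweep helper (not part of the repository) that only builds the command strings shown above:

```python
import shlex

def eval_command(model, codebook_yaml):
    """Build the scripts/eval.py invocation for one (model, codebook) pair.

    Hypothetical helper: it assembles the same command line shown above,
    with shell-safe quoting via shlex.join.
    """
    args = ["python", "scripts/eval.py", "-m", model, "-q", codebook_yaml]
    return shlex.join(args)

# Example: sweep one model over a list of codebook configs.
for cb in ["codebooks/bof4/bof4-s_mse_64.yaml"]:
    print(eval_command("meta-llama/Llama-3.2-3B", cb))
```

Each printed line can be run directly in the shell, or passed to a job scheduler.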