This repository contains an unofficial PyTorch implementation of BitNet a4.8: 4-bit Activations for 1-bit LLMs (Wang et al., 2024).
BitNet a4.8 is a groundbreaking approach that enables 4-bit activations for 1-bit Large Language Models (LLMs). The method employs a hybrid quantization and sparsification strategy to mitigate quantization errors from outlier channels while maintaining model performance.
Key features:
- 4-bit quantization for attention and FFN inputs
- 8-bit quantization with sparsification for intermediate states (see the sketch after this list)
- Only 55% of parameters activated during inference
- Support for 3-bit KV cache
- Comparable performance to BitNet b1.58 with better inference efficiency
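To make the hybrid scheme concrete, here is a minimal PyTorch sketch of the two activation paths. The function names, the per-token absmax scaling, and the keep-ratio parameter are illustrative assumptions and do not necessarily match this repository's internal API:

```python
import torch

def quantize_a4(x: torch.Tensor) -> torch.Tensor:
    # Per-token absmax quantization to signed 4-bit (range [-8, 7]),
    # with a straight-through estimator so training stays differentiable.
    scale = 7.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    q = (x * scale).round().clamp(-8, 7) / scale
    return x + (q - x).detach()

def sparsify_quantize_a8(x: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    # TopK sparsification: keep only the largest-magnitude entries per token,
    # then quantize the survivors to signed 8-bit (range [-128, 127]).
    k = max(1, int(x.shape[-1] * keep_ratio))
    indices = x.abs().topk(k, dim=-1).indices
    mask = torch.zeros_like(x).scatter_(-1, indices, 1.0)
    xs = x * mask
    scale = 127.0 / xs.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    q = (xs * scale).round().clamp(-128, 127) / scale
    return xs + (q - xs).detach()
```

In the paper, the 4-bit path is applied to the attention and FFN inputs, while the sparsified 8-bit path handles intermediate states whose outlier channels would otherwise dominate the 4-bit range.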
This implementation provides a `create_model` factory for building BitNet a4.8 models:

```python
from bitnet_a48 import create_model

# Create a BitNet a4.8 model
model = create_model(
    hidden_size=4096,
    intermediate_size=11008,
    num_hidden_layers=32,
    num_attention_heads=32,
)
```
Key components:
- RMSNorm for layer normalization
- 4-bit and 8-bit quantizers
- TopK sparsification
- BitLinear (1.58-bit weights)
- Hybrid attention mechanism
- Gated FFN with ReLU² (BitLinear and this FFN are sketched below)
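As an illustration of the last two components, here is a minimal sketch, assuming absmean ternary weight quantization as in BitNet b1.58 and a squared-ReLU gated FFN; the module names and initialization are illustrative rather than this repository's exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Module):
    """Linear layer with 1.58-bit (ternary) weights via absmean quantization."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)
        w_q = (w / scale).round().clamp(-1, 1) * scale  # ternary {-1, 0, +1} times a scale
        w_q = w + (w_q - w).detach()                    # straight-through estimator
        return F.linear(x, w_q)

class ReLU2GLU(nn.Module):
    """Gated FFN with a squared-ReLU activation on the gate branch."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = BitLinear(hidden_size, intermediate_size)
        self.up_proj = BitLinear(hidden_size, intermediate_size)
        self.down_proj = BitLinear(intermediate_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = F.relu(self.gate_proj(x)) ** 2  # ReLU² introduces activation sparsity
        return self.down_proj(gate * self.up_proj(x))
```

At inference time the ternary weights can be stored as low-bit values with a single scale, so the matrix multiplication reduces largely to additions and subtractions.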
```bash
git clone https://github.com/yourusername/bitnet-a48
cd bitnet-a48
pip install -r requirements.txt
```
This implementation is part of the Agora initiative, where researchers and developers collaborate to implement cutting-edge ML papers. By joining Agora, you can:
- Collaborate with others on paper implementations
- Get early access to new research implementations
- Share your expertise and learn from others
- Contribute to open-source ML research
The implementation achieves performance comparable to BitNet b1.58 while enabling:
- 4-bit activation compression
- 45% parameter sparsity
- Reduced inference costs
- 3-bit KV cache support
```python
import torch

from bitnet_a48 import create_model

# Initialize model
model = create_model(
    hidden_size=4096,
    intermediate_size=11008,
    num_hidden_layers=32,
    num_attention_heads=32,
)

# Dummy inputs for illustration (batch of 1, sequence length 128;
# the vocabulary size of 32000 is an assumption)
input_ids = torch.randint(0, 32000, (1, 128))
attention_mask = torch.ones_like(input_ids)

# Forward pass
outputs = model(input_ids, attention_mask)
```
The model uses a two-stage training recipe (sketched below):
1. Train with 8-bit activations and the ReLU²GLU FFN
2. Fine-tune with hybrid quantization and sparsification
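A minimal sketch of how such a schedule could be wired up, assuming the activation quantizers expose a toggleable bit-width attribute (an assumption for illustration, not a documented part of this repository):

```python
import torch.nn as nn

def set_activation_bits(model: nn.Module, bits: int) -> None:
    # Walk the model and retag any quantizer module that advertises an
    # activation bit-width (hypothetical attribute name, for illustration only).
    for module in model.modules():
        if hasattr(module, "activation_bits"):
            module.activation_bits = bits

# Stage 1: train with 8-bit activations and the ReLU²GLU FFN.
set_activation_bits(model, 8)
# ... run the usual pre-training loop ...

# Stage 2: fine-tune with 4-bit inputs plus sparsified 8-bit intermediate states.
set_activation_bits(model, 4)
# ... continue training for a short schedule ...
```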
We welcome contributions! Please:
- Fork the repository
- Create a feature branch
- Submit a pull request
Join the discussion on the Agora Discord!
This project is licensed under the MIT License - see the LICENSE file for details.
- Original paper authors: Hongyu Wang, Shuming Ma, Furu Wei
- The Agora community
- PyTorch team
- Open-source ML community
```bibtex
@article{wang2024bitnet,
  title={BitNet a4.8: 4-bit Activations for 1-bit LLMs},
  author={Wang, Hongyu and Ma, Shuming and Wei, Furu},
  journal={arXiv preprint arXiv:2411.04965},
  year={2024}
}
```
Join us in implementing more cutting-edge ML research at Agora!