This repository contains an unofficial PyTorch implementation of BitNet a4.8: 4-bit Activations for 1-bit LLMs (Wang et al., 2024).
BitNet a4.8 is a groundbreaking approach that enables 4-bit activations for 1-bit Large Language Models (LLMs). The method employs a hybrid quantization and sparsification strategy to mitigate quantization errors from outlier channels while maintaining model performance.
Key features:
- 4-bit quantization for attention and FFN inputs
- 8-bit quantization with sparsification for intermediate states (see the sketch after this list)
- Only 55% of parameters activated during inference
- Support for 3-bit KV cache
- Comparable performance to BitNet b1.58 with better inference efficiency
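To make the hybrid scheme concrete, here is a minimal PyTorch sketch of the two activation paths. The function names, the per-token absmax scaling, and the keep-ratio parameter are illustrative assumptions and do not necessarily match this repository's internal API:

```python
import torch

def quantize_a4(x: torch.Tensor) -> torch.Tensor:
    # Per-token absmax quantization to signed 4-bit (range [-8, 7]),
    # with a straight-through estimator so training stays differentiable.
    scale = 7.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    q = (x * scale).round().clamp(-8, 7) / scale
    return x + (q - x).detach()

def sparsify_quantize_a8(x: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    # TopK sparsification: keep only the largest-magnitude entries per token,
    # then quantize the survivors to signed 8-bit (range [-128, 127]).
    k = max(1, int(x.shape[-1] * keep_ratio))
    indices = x.abs().topk(k, dim=-1).indices
    mask = torch.zeros_like(x).scatter_(-1, indices, 1.0)
    xs = x * mask
    scale = 127.0 / xs.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    q = (xs * scale).round().clamp(-128, 127) / scale
    return xs + (q - xs).detach()
```

In the paper, the 4-bit path is applied to the attention and FFN inputs, while the sparsified 8-bit path handles intermediate states whose outlier channels would otherwise dominate the 4-bit range.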
This implementation provides a `create_model` factory for building BitNet a4.8 models:

```python
from bitnet_a48 import create_model

# Create a BitNet a4.8 model
model = create_model(
    hidden_size=4096,
    intermediate_size=11008,
    num_hidden_layers=32,
    num_attention_heads=32,
)
```
Key components:
- RMSNorm for layer normalization
- 4-bit and 8-bit quantizers
- TopK sparsification
- BitLinear (1.58-bit weights)
- Hybrid attention mechanism
- Gated FFN with ReLU² (BitLinear and this FFN are sketched below)
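As an illustration of the last two components, here is a minimal sketch, assuming absmean ternary weight quantization as in BitNet b1.58 and a squared-ReLU gated FFN; the module names and initialization are illustrative rather than this repository's exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Module):
    """Linear layer with 1.58-bit (ternary) weights via absmean quantization."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)
        w_q = (w / scale).round().clamp(-1, 1) * scale  # ternary {-1, 0, +1} times a scale
        w_q = w + (w_q - w).detach()                    # straight-through estimator
        return F.linear(x, w_q)

class ReLU2GLU(nn.Module):
    """Gated FFN with a squared-ReLU activation on the gate branch."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = BitLinear(hidden_size, intermediate_size)
        self.up_proj = BitLinear(hidden_size, intermediate_size)
        self.down_proj = BitLinear(intermediate_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = F.relu(self.gate_proj(x)) ** 2  # ReLU² introduces activation sparsity
        return self.down_proj(gate * self.up_proj(x))
```

At inference time the ternary weights can be stored as low-bit values with a single scale, so the matrix multiplication reduces largely to additions and subtractions.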
```bash
git clone https://github.com/yourusername/bitnet-a48
cd bitnet-a48
pip install -r requirements.txt
```
This implementation is part of the Agora initiative, where researchers and developers collaborate to implement cutting-edge ML papers. By joining Agora, you can:
- Collaborate with others on paper implementations
- Get early access to new research implementations
- Share your expertise and learn from others
- Contribute to open-source ML research
The implementation achieves performance comparable to BitNet b1.58 while enabling:
- 4-bit activation compression
- 45% parameter sparsity
- Reduced inference costs
- 3-bit KV cache support
```python
import torch

from bitnet_a48 import create_model

# Initialize model
model = create_model(
    hidden_size=4096,
    intermediate_size=11008,
    num_hidden_layers=32,
    num_attention_heads=32,
)

# Dummy inputs for illustration (batch of 1, sequence length 128;
# the vocabulary size of 32000 is an assumption)
input_ids = torch.randint(0, 32000, (1, 128))
attention_mask = torch.ones_like(input_ids)

# Forward pass
outputs = model(input_ids, attention_mask)
```
The model uses a two-stage training recipe (sketched below):
1. Train with 8-bit activations and the ReLU²GLU FFN
2. Fine-tune with hybrid quantization and sparsification
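A minimal sketch of how such a schedule could be wired up, assuming the activation quantizers expose a toggleable bit-width attribute (an assumption for illustration, not a documented part of this repository):

```python
import torch.nn as nn

def set_activation_bits(model: nn.Module, bits: int) -> None:
    # Walk the model and retag any quantizer module that advertises an
    # activation bit-width (hypothetical attribute name, for illustration only).
    for module in model.modules():
        if hasattr(module, "activation_bits"):
            module.activation_bits = bits

# Stage 1: train with 8-bit activations and the ReLU²GLU FFN.
set_activation_bits(model, 8)
# ... run the usual pre-training loop ...

# Stage 2: fine-tune with 4-bit inputs plus sparsified 8-bit intermediate states.
set_activation_bits(model, 4)
# ... continue training for a short schedule ...
```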
We welcome contributions! Please:
- Fork the repository
- Create a feature branch
- Submit a pull request
Join the discussion on the Agora Discord!
This project is licensed under the MIT License - see the LICENSE file for details.
- Original paper authors: Hongyu Wang, Shuming Ma, Furu Wei
- The Agora community
- PyTorch team
- Open-source ML community
```bibtex
@article{wang2024bitnet,
  title={BitNet a4.8: 4-bit Activations for 1-bit LLMs},
  author={Wang, Hongyu and Ma, Shuming and Wei, Furu},
  journal={arXiv preprint arXiv:2411.04965},
  year={2024}
}
```
Join us in implementing more cutting-edge ML research at Agora!