
BitNet a4.8: 4-bit Activations for 1-bit LLMs


This repository contains an unofficial PyTorch implementation of BitNet a4.8: 4-bit Activations for 1-bit LLMs (Wang et al., 2024).

📑 Paper Summary

BitNet a4.8 enables 4-bit activations for 1-bit Large Language Models (LLMs). It uses a hybrid quantization and sparsification strategy to mitigate the quantization error introduced by outlier channels while maintaining model performance.

Key features:

  • 4-bit quantization for attention and FFN inputs (a quantizer sketch follows this list)
  • 8-bit quantization with sparsification for intermediate states
  • Only 55% of parameters activated during inference
  • Support for 3-bit KV cache
  • Comparable performance to BitNet b1.58 with better inference efficiency
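
As an illustration of the 4-bit activation path mentioned above, the sketch below shows a symmetric per-token absmax fake-quantizer to signed 4-bit levels. The function name and rounding details are illustrative assumptions, not this repository's API.

import torch

def quantize_activations_4bit(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Illustrative per-token absmax fake-quantization to signed 4-bit levels [-8, 7]."""
    # Per-token scale: largest magnitude along the hidden dimension.
    scale = x.abs().max(dim=-1, keepdim=True).values.clamp(min=eps)
    # Round onto the 4-bit integer grid, then dequantize back to floats.
    q = (x / scale * 8).round().clamp(-8, 7)
    return q / 8 * scale

In quantization-aware training, a function like this is usually wrapped in a straight-through estimator so gradients bypass the rounding step.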

🚀 Implementation

The entire implementation lives in a single PyTorch file; a model is constructed with create_model:

# Create a BitNet a4.8 model (LLaMA-7B-like dimensions)
from bitnet_a48 import create_model

model = create_model(
    hidden_size=4096,
    intermediate_size=11008,
    num_hidden_layers=32,
    num_attention_heads=32
)

Key components:

  • RMSNorm for layer normalization
  • 4-bit and 8-bit quantizers
  • TopK sparsification
  • BitLinear (1.58-bit ternary weights; sketched after this list)
  • Hybrid attention mechanism
  • Gated FFN with ReLU²
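
For the 1.58-bit weight path listed above, the sketch below shows the ternary absmean weight quantization used throughout the BitNet family, wrapped in a minimal BitLinear-style layer with a straight-through estimator. It is a simplified sketch of the weight path only, not necessarily identical to the layer in this repository.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TernaryLinearSketch(nn.Module):
    """Minimal BitLinear-style layer: ternary {-1, 0, +1} weights via absmean scaling."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Absmean scale, then round onto the ternary grid {-1, 0, +1}.
        scale = w.abs().mean().clamp(min=1e-5)
        w_q = (w / scale).round().clamp(-1, 1) * scale
        # Straight-through estimator: the forward pass uses quantized weights,
        # while gradients flow into the latent full-precision weights.
        w_ste = w + (w_q - w).detach()
        return F.linear(x, w_ste)

In BitNet-style models this weight quantization is typically combined with input normalization and activation quantization inside the same layer.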

📦 Installation

git clone https://github.com/Agora-Lab-AI/BitNet-a4.8
cd BitNet-a4.8
pip install -r requirements.txt

🤝 Join the Agora Community

This implementation is part of the Agora initiative, where researchers and developers collaborate to implement cutting-edge ML papers. By joining Agora, you can:

  • Collaborate with others on paper implementations
  • Get early access to new research implementations
  • Share your expertise and learn from others
  • Contribute to open-source ML research

Join Agora Today

📊 Results

Following the paper, BitNet a4.8 achieves performance comparable to BitNet b1.58 while enabling:

  • 4-bit activation compression
  • 45% parameter sparsity (see the TopK sketch below)
  • Reduced inference costs
  • 3-bit KV cache support
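
The sparsity figure above corresponds to keeping only the largest-magnitude entries of the intermediate states. The snippet below sketches a simple per-token TopK mask that keeps roughly 55% of entries; it illustrates the idea and is not code taken from this repository.

import torch

def topk_sparsify(x: torch.Tensor, keep_ratio: float = 0.55) -> torch.Tensor:
    """Zero all but the top `keep_ratio` fraction of entries (by magnitude) per token."""
    k = max(1, int(x.shape[-1] * keep_ratio))
    # Indices of the k largest-magnitude activations along the hidden dimension.
    idx = x.abs().topk(k, dim=-1).indices
    mask = torch.zeros_like(x, dtype=torch.bool).scatter_(-1, idx, True)
    return x * mask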

🛠️ Usage

import torch
from bitnet_a48 import create_model

# Initialize model
model = create_model(
    hidden_size=4096,
    intermediate_size=11008,
    num_hidden_layers=32,
    num_attention_heads=32
)

# Toy inputs (batch_size=1, seq_len=128); replace with real tokenized IDs
# and adjust the upper bound to the model's vocabulary size.
input_ids = torch.randint(0, 32000, (1, 128))
attention_mask = torch.ones_like(input_ids)

# Forward pass
outputs = model(input_ids, attention_mask)

📈 Training

The model follows the paper's two-stage training recipe (a configuration sketch follows the list):

  1. Train with 8-bit activations and a ReLU² GLU
  2. Fine-tune with hybrid quantization and sparsification
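
A minimal sketch of how the two stages could be wired together is shown below. The activation_bits and use_sparsification keyword arguments are hypothetical placeholders for whatever stage-specific switches the model exposes; only the four constructor arguments shown earlier are documented here.

from bitnet_a48 import create_model

# Stage 1: train with 8-bit activations and the ReLU² GLU.
# NOTE: stage-specific flags are hypothetical and shown only as comments.
model = create_model(
    hidden_size=4096,
    intermediate_size=11008,
    num_hidden_layers=32,
    num_attention_heads=32,
    # activation_bits=8, use_sparsification=False,   # hypothetical stage-1 settings
)
# ... run the main training loop ...

# Stage 2: reuse the stage-1 weights and fine-tune with hybrid 4-bit
# quantization and sparsification enabled, e.g.:
# model = create_model(..., activation_bits=4, use_sparsification=True)  # hypothetical
# model.load_state_dict(stage1_state_dict)
# ... continue training for a short fine-tuning phase ...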

🤝 Contributing

We welcome contributions! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

Join the discussion on the Agora Discord!

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgements

  • Original paper authors: Hongyu Wang, Shuming Ma, Furu Wei
  • The Agora community
  • PyTorch team
  • Open-source ML community

📚 Citation

@article{wang2024bitnet,
  title={BitNet a4.8: 4-bit Activations for 1-bit LLMs},
  author={Wang, Hongyu and Ma, Shuming and Wei, Furu},
  journal={arXiv preprint arXiv:2411.04965},
  year={2024}
}

🔗 Links

Join us in implementing more cutting-edge ML research at Agora!
