This repository implements BanditSpec, a speculative decoding framework that adaptively balances exploration and exploitation using bandit algorithms to accelerate autoregressive generation in large language models (LLMs). The framework is compatible with both LLaMA and Qwen2 architectures.
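The core idea, treating each candidate speculation length `gamma` as a bandit arm and trading off exploration against exploitation, can be sketched with a simple UCB rule. The candidate `gamma` values, the reward definition (e.g. accepted draft tokens per step), and the function name below are illustrative assumptions, not the repository's actual algorithm:

```python
import math

def ucb_select(counts, rewards, t, c=2.0):
    """Pick the arm (candidate gamma) with the highest UCB score.

    counts[i]  - times arm i was played so far
    rewards[i] - cumulative reward for arm i (e.g. accepted tokens per step)
    t          - current round (1-indexed)
    """
    # Play each arm once before applying the UCB formula.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [
        rewards[i] / counts[i] + math.sqrt(c * math.log(t) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)

# Illustrative candidate speculation lengths (arms).
gammas = [1, 2, 4, 8]
counts = [0] * len(gammas)
rewards = [0.0] * len(gammas)
```

After each decoding step the chosen arm's count and reward are updated, so arms that yield more accepted draft tokens get pulled more often while under-explored arms retain a bonus.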
- `eagle_llama.py`: Defines the Eagle (Li Y. et al., 2024) draft model based on LLaMA.
- `eagle_qwen.py`: Defines the Eagle (Li Y. et al., 2024) draft model based on Qwen2.
- `llama.py`, `qwen.py`: Customized versions of the LLaMA and Qwen2 architectures.
- `generate_utils.py`: Implements the core decoding strategies, including BanditSpec.
- `inference_length.py`: Main script for throughput benchmarking across different batch sizes and strategies.
- `llama_long.png`: Visualization of throughput improvement comparisons.
```shell
pip install torch transformers fairscale flash-attn tqdm
```
⚠️ Make sure `flash-attn` is compiled for your CUDA and PyTorch versions.
Download the EAGLE draft models from the official repository: https://github.com/SafeAILab/EAGLE
project/
├── inference_length.py
├── eagle_llama.py
├── eagle_qwen.py
├── llama.py
├── qwen.py
├── generate_utils.py
├── llama_long.png
├── llama_model/ # contains config.json and pytorch_model.bin for LLaMA
└── eagle_model/ # contains config.json and pytorch_model.bin for Eagle
Modify `inference_length.py` to set:

```python
target_path = "llama_model"
eagle_path = "eagle_model"
```

Then run:

```shell
python inference_length.py
```

This will run decoding experiments across:
- Different batch sizes
- Various `gamma` values
- Baselines such as `Best Arm`, `Worst Arm`, and fixed `gamma`
```
bsz  spec_quota  gamma       throughput
10   256         BanditSpec  1.43
20   256         gamma=1     1.61
...
```
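To post-process such logs, a small parser can turn the whitespace-delimited rows into dictionaries. The parser is an illustrative sketch, not shipped with the repository; column names follow the sample output above:

```python
def parse_results(text):
    """Parse the whitespace-delimited benchmark log into a list of dicts."""
    lines = [ln for ln in text.strip().splitlines()
             if ln.strip() and ln.strip() != "..."]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        row["bsz"] = int(row["bsz"])
        row["throughput"] = float(row["throughput"])
        rows.append(row)
    return rows

sample = """\
bsz spec_quota gamma throughput
10 256 BanditSpec 1.43
20 256 gamma=1 1.61
...
"""
results = parse_results(sample)
```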
Li Y., Wei F., Zhang C., et al. EAGLE: Speculative sampling requires rethinking feature uncertainty. arXiv preprint arXiv:2401.15077, 2024.