Auto-Tune vLLM

Python 3.10+ · License: Apache 2.0

A hyperparameter optimization framework for vLLM serving, built on Optuna, with local execution by default and an optional Ray backend for distributed runs.

Note: This is a maintained fork

This repository is a fork of the openshift-psap/auto-tuning-vllm project. We are grateful to the original authors for providing the foundation this fork builds upon. This fork was created to address specific needs in our environment that differ from the original project's scope.

Why this fork?

We adapted the original framework to meet the following requirements:

  • Simpler deployment for single-node scenarios - Ray is optional, not required
  • Testing infrastructure to support safe evolution of the codebase
  • Active maintenance for our production use cases (dependency updates, bug fixes)
  • Feature expansion as needed for our workloads (additional inference engines, benchmark tools)

Features

  • 🎯 Flexible Backends: Run locally (default) or optionally on Ray clusters
  • 📊 Benchmarking: Built-in GuideLLM support
  • 🗄️ Flexible Storage: SQLite for local use, PostgreSQL for production (optional)
  • ⚙️ Easy Configuration: YAML-based study and parameter configuration
  • 📈 Multi-Objective: Support for throughput vs latency trade-offs

Quick Start (5 minutes)

For a detailed starter guide, see the Quick Start Guide.

Installation

Install the base package for local execution. Add the optional ray extra only if you want distributed execution.

# Clone the maintained fork
git clone https://github.com/InseeFrLab/auto-tuning-vllm.git
cd auto-tuning-vllm

# Basic installation (local execution only)
pip install -e .

# Optional: Install with Ray support for distributed execution
pip install -e ".[ray]"

# Optional: Install with PostgreSQL support
pip install -e ".[postgresql]"

Basic Usage

# Run optimization study locally (default backend)
auto-tune-vllm optimize --config config.yaml --max-concurrent-trials 2

# Run optimization study on Ray
auto-tune-vllm optimize --config config.yaml --backend ray --venv-path ./venv --max-concurrent-trials 2

# Resume interrupted study
auto-tune-vllm resume --study-name study_35884

# Stream live logs
auto-tune-vllm logs --study-name study_35884
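
All of the commands above read the study definition from a YAML file (config.yaml in these examples). The sketch below is purely illustrative: the section and field names (study, optimization, parameters, benchmark) are assumptions, not the project's actual schema, so consult the Quick Start Guide for the real format.

# Hypothetical config.yaml sketch -- key names are illustrative assumptions,
# not the actual schema; see the Quick Start Guide for the real format.
study:
  name: study_35884                  # identifier reused by the resume/logs commands
  storage: sqlite:///studies.db      # SQLite locally; a PostgreSQL URL for production
optimization:
  n_trials: 50
  objectives: [throughput, latency]  # multi-objective: throughput vs latency trade-off
parameters:                          # vLLM serving arguments to search over
  gpu_memory_utilization: {type: float, low: 0.80, high: 0.95}
  max_num_seqs: {type: int, low: 64, high: 512}
benchmark:
  provider: guidellm                 # built-in GuideLLM benchmarking

gpu_memory_utilization and max_num_seqs are genuine vLLM serving arguments; everything else in the sketch only illustrates the kind of information such a study needs (a storage backend, objectives, a search space, and a benchmark provider).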

Requirements

  • Python 3.10+
  • NVIDIA GPU with CUDA support (for running vLLM)
  • SQLite (included) or PostgreSQL (optional)

Core dependencies are installed by the basic pip install -e . command shown above; Ray is optional and available via pip install -e ".[ray]".

Roadmap

This fork is actively being improved. Current work in progress:

Immediate priorities

  • Add comprehensive test suite
  • Expand CI/CD to run tests, not just linting
  • Dependency hygiene - pin versions, reduce heavy core dependencies
  • Improve CLI error messages and validation

Future work

  • Support for speculative decoding parameters
  • Additional benchmark providers beyond GuideLLM
  • Support for alternative inference engines (e.g., SGLang)
  • Better parameter validation against vLLM CLI args

Contributing

This fork welcomes contributions. Priority areas:

  1. Testing - Adding tests for existing functionality
  2. Documentation - Improving guides and examples
  3. Core stability - Bug fixes and edge case handling

License

Apache License 2.0 - see LICENSE file for details.
