Dynamic Routing for vLLM #800

terrykong started this conversation in Ideas

terrykong
Mar 31, 2025
Maintainer

We need a router sidecar that queries telemetry from vLLM, determines which vLLM instance has the least load, and sends more prompts there.
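A rough sketch of the least-load selection, to make the idea concrete. It assumes each vLLM server exposes Prometheus-format telemetry at `/metrics` and that the gauges `vllm:num_requests_running` and `vllm:num_requests_waiting` are a reasonable proxy for load. The helper names (`fetch_metrics`, `parse_load`, `pick_least_loaded`) are hypothetical, and the load definition (running plus waiting) is only one possible choice.

```python
from typing import Dict
from urllib.request import urlopen

# Assumed metric names from vLLM's Prometheus endpoint; adjust if they differ.
LOAD_METRICS = ("vllm:num_requests_running", "vllm:num_requests_waiting")


def fetch_metrics(base_url: str, timeout: float = 2.0) -> str:
    """Fetch the raw Prometheus text from one instance (assumes a /metrics path)."""
    with urlopen(f"{base_url}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_load(metrics_text: str) -> float:
    """Sum the running and waiting request gauges found in the metrics text."""
    load = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name = line.split("{", 1)[0].split()[0]
        if name not in LOAD_METRICS:
            continue
        # The sample value is the first token after the label set (if any).
        rest = line.rsplit("}", 1)[-1] if "}" in line else line[len(name):]
        load += float(rest.split()[0])
    return load


def pick_least_loaded(metrics_by_url: Dict[str, str]) -> str:
    """Return the instance URL whose summed load is smallest."""
    return min(metrics_by_url, key=lambda url: parse_load(metrics_by_url[url]))
```

A sidecar would call `fetch_metrics` for each instance on a short interval, then forward the next prompt to whatever `pick_least_loaded` returns. This ignores KV-cache locality, which the Dynamo router takes into account.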

Replies: 2 comments

terrykong
Mar 31, 2025
Maintainer Author

Example router from Dynamo: https://github.com/ai-dynamo/dynamo/blob/main/examples/llm/components/kv_router.py


dchichkov
Apr 1, 2025

There's this repo: https://github.com/VectorInstitute/vector-inference

This repository provides an easy-to-use solution to run inference servers on Slurm-managed computing clusters using vLLM.

Note also:
https://docs.vllm.ai/projects/production-stack/en/latest/user_manual/router/cmd.html

Note that Slurm autoscaling, with an option to backfill Slurm capacity, is the next obvious feature.

