Replies: 2 comments
-
Example router from Dynamo: https://github.com/ai-dynamo/dynamo/blob/main/examples/llm/components/kv_router.py
-
There's this repo: https://github.com/VectorInstitute/vector-inference
Note also that Slurm autoscaling, with an option to backfill Slurm capacity, is the next obvious feature.
-
We need a router sidecar that queries the telemetry from vLLM, determines which vLLM instance has the least load, and sends more prompts there.
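
To make the idea concrete, here is a rough sketch of the selection logic, not a finished sidecar. It assumes each vLLM instance serves Prometheus-style text at `/metrics` and that the gauges `vllm:num_requests_running` and `vllm:num_requests_waiting` are a reasonable proxy for load; check both against the vLLM version you run. The function names (`parse_load`, `fetch_metrics`, `choose_instance`) and the "running + waiting" load definition are illustrative choices of mine, not anything from vLLM itself.

```python
import urllib.request

# Assumed metric names; verify against your vLLM version's /metrics output.
RUNNING = "vllm:num_requests_running"
WAITING = "vllm:num_requests_waiting"


def parse_load(metrics_text: str) -> float:
    """Sum the running and waiting request gauges from Prometheus text output."""
    load = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumes "name{labels} value" with no trailing timestamp.
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        if name in (RUNNING, WAITING):
            load += float(value)
    return load


def fetch_metrics(base_url: str, timeout: float = 2.0) -> str:
    """Fetch the raw metrics text from one instance."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def pick_least_loaded(loads: dict[str, float]) -> str:
    """Return the instance URL with the smallest load value."""
    if not loads:
        raise ValueError("no instances available")
    return min(loads, key=loads.get)


def choose_instance(instance_urls, fetch=fetch_metrics) -> str:
    """Poll every instance and return the least loaded reachable one."""
    loads = {}
    for url in instance_urls:
        try:
            loads[url] = parse_load(fetch(url))
        except OSError:
            continue  # unreachable instance: leave it out of this round
    return pick_least_loaded(loads)
```

A sidecar would call `choose_instance(["http://host-a:8000", "http://host-b:8000"])` (placeholder URLs) before forwarding each prompt, or poll on a timer and cache the result. Polling per request adds latency, so a cached value with a short refresh interval is probably the better starting point.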