v0.3.0 roadmap #698
Comments
I'm wondering whether we can deliver a stable version at some point; stable here means a workable state: fewer bugs, relatively complete documentation, and good test coverage. We could make that a baseline and add more features on top of it, with feature gates or flags to enable/disable them. I ask only because I see we have a lot of inspiring features waiting to be merged, and I have no idea what the long-term plan is for evolving with them.
@kerthcet I agree that after v0.2.0 we will have a solid baseline of features, and ensuring production-grade quality should be our top priority. We can discuss this further and align on the next steps as you suggested. The future roadmap should balance new feature development with production readiness, maintaining stability while continuing to evolve. We also have some internal adoptions, and we will try to surface the bugs and tricks from those at the same time.
Are you planning to support [Feature]: Support Ray-free multi-node distributed inference on resource managers like Kubernetes, to simplify the deployment of multi-node inference? I recently discussed this with youkaichao@, and he believes it could be possible by implementing a new executor. Some references: vllm-project/vllm#11400
Congrats on the launch guys! |
@Jeffwan Awesome! Congrats on the open-source! |
We do see that many users dislike Ray for distributed serving because of its overhead and poor debuggability. Supporting a cloud-native way to run vLLM across multiple nodes would be beneficial; I think users should be given options. We created vllm-project/vllm#3902 earlier but didn't get a chance to work on it; if people like it and no one is working on it yet, we will put some effort into it and also make changes to the orchestration layer. BTW, orchestration for the P&D (prefill/decode disaggregation) case will introduce an application router or cluster-level scheduler (CLS in the Splitwise paper), which is not exactly the same as the current multi-node setup. If that paradigm can be finalized, the cloud-native approach sounds like a plan. If not, I think it remains a potential problem, because every time the paradigm changes, the cloud-native path needs additional changes.
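As a rough illustration of what a Ray-free, cloud-native bootstrap could look like, here is a minimal sketch. It is hypothetical and not vLLM's actual executor API: it assumes nodes run as a Kubernetes StatefulSet, so each pod can derive its rank from its hostname ordinal and reach the head node through a stable Service name instead of relying on Ray's cluster discovery. The names `rank_from_hostname`, `build_dist_env`, and the `vllm-head` service are illustrative.

```python
import re


def rank_from_hostname(hostname: str) -> int:
    """Derive a node rank from a StatefulSet pod name like 'vllm-worker-3'.

    StatefulSet pods get stable ordinal suffixes, so the ordinal can stand
    in for the distributed rank without any external coordinator.
    """
    match = re.search(r"-(\d+)$", hostname)
    if match is None:
        raise ValueError(f"hostname {hostname!r} has no ordinal suffix")
    return int(match.group(1))


def build_dist_env(hostname: str, world_size: int,
                   head_service: str = "vllm-head", port: int = 6379) -> dict:
    """Assemble the torch.distributed-style environment for one node.

    Instead of a Ray head node, MASTER_ADDR points at a stable Kubernetes
    Service fronting pod 0, which k8s DNS resolves inside the cluster.
    """
    return {
        "RANK": str(rank_from_hostname(hostname)),
        "WORLD_SIZE": str(world_size),
        "MASTER_ADDR": head_service,
        "MASTER_PORT": str(port),
    }


if __name__ == "__main__":
    # Pod 'vllm-worker-2' in a 4-node StatefulSet joins as rank 2.
    print(build_dist_env("vllm-worker-2", world_size=4))
```

The point of the sketch is that all the information Ray would otherwise provide (rank, world size, rendezvous address) is already derivable from Kubernetes primitives, which is what makes a new executor without Ray plausible.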
We had some discussions about this in production-stack too: vllm-project/production-stack#7 (comment). /cc @KuntaiDu
🚀 Feature Description and Motivation
I created this issue to track the v0.3.0 items we'd like to work on. We do have a milestone, https://github.com/aibrix/aibrix/milestone/9, that tracks all the issues, but it contains so many that users who don't work on this project might feel overwhelmed.
Let's create a list of the items users are interested in.
Use Case
Track the v0.3.0 release items
Proposed Solution
No response