v0.3.0 roadmap #698
Comments
I'm wondering whether we can deliver a stable version at some point; stable here means a workable state: fewer bugs, relatively complete documentation, and good test coverage. We could make that a baseline and add more features on top of it, with feature gates or flags to enable/disable them. I ask only because I see we have a lot of inspiring features waiting to be merged, and I have no idea what the long-term plan is for evolving with them.
@kerthcet I agree that after v0.2.0 we will have a solid baseline of features, and ensuring production-grade quality should be our top priority. We can discuss this further and align on the next steps as you suggested. The future roadmap should balance new feature development with production readiness, maintaining stability while continuing to evolve. We also have some internal adoptions, and we will try to surface the bugs and tricks from those at the same time.
Are you planning to support [Feature]: Support Ray-free multi-node distributed inference on resource managers like Kubernetes, to simplify the deployment of multi-node inference? I recently discussed this with youkaichao@, and he believes it could be possible by implementing a new executor. Some references: vllm-project/vllm#11400
Congrats on the launch guys! |
@Jeffwan Awesome! Congrats on the open-source! |
We do see that many users dislike Ray for distributed serving because of its overhead and poor debuggability. Supporting a cloud-native way to run vLLM across multiple nodes would be beneficial; I think users should be given options. We created vllm-project/vllm#3902 earlier but didn't get a chance to work on it; if people like it and no one is working on it yet, we will put some effort into it and also make changes to the orchestration layer. BTW, orchestration for the P&D (prefill/decode disaggregation) case will introduce an application router or cluster-level scheduler (CLS in the Splitwise paper), which is not exactly the same as the current multi-node setup. If that paradigm can be finalized, the cloud-native approach sounds like a plan. If not, I think it remains a potential problem, because every time the paradigm changes, the cloud-native path needs additional changes.
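As a rough illustration of what a Ray-free, cloud-native bootstrap could look like, here is a minimal sketch. It is hypothetical and not vLLM's actual executor API: it assumes nodes run as a Kubernetes StatefulSet, so each pod can derive its rank from its hostname ordinal and reach the head node through a stable Service name instead of relying on Ray's cluster discovery. The names `rank_from_hostname`, `build_dist_env`, and the `vllm-head` service are illustrative.

```python
import re


def rank_from_hostname(hostname: str) -> int:
    """Derive a node rank from a StatefulSet pod name like 'vllm-worker-3'.

    StatefulSet pods get stable ordinal suffixes, so the ordinal can stand
    in for the distributed rank without any external coordinator.
    """
    match = re.search(r"-(\d+)$", hostname)
    if match is None:
        raise ValueError(f"hostname {hostname!r} has no ordinal suffix")
    return int(match.group(1))


def build_dist_env(hostname: str, world_size: int,
                   head_service: str = "vllm-head", port: int = 6379) -> dict:
    """Assemble the torch.distributed-style environment for one node.

    Instead of a Ray head node, MASTER_ADDR points at a stable Kubernetes
    Service fronting pod 0, which k8s DNS resolves inside the cluster.
    """
    return {
        "RANK": str(rank_from_hostname(hostname)),
        "WORLD_SIZE": str(world_size),
        "MASTER_ADDR": head_service,
        "MASTER_PORT": str(port),
    }


if __name__ == "__main__":
    # Pod 'vllm-worker-2' in a 4-node StatefulSet joins as rank 2.
    print(build_dist_env("vllm-worker-2", world_size=4))
```

The point of the sketch is that all the information Ray would otherwise provide (rank, world size, rendezvous address) is already derivable from Kubernetes primitives, which is what makes a new executor without Ray plausible.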
We had some discussions about this in production-stack too: vllm-project/production-stack#7 (comment). /cc @KuntaiDu
🚀 Feature Description and Motivation
I created this issue to track the v0.3.0 items we'd like to work on. We do have a milestone, https://github.com/aibrix/aibrix/milestone/9, that tracks all the issues, but it contains so many that users who don't work on this project might feel overwhelmed.
Let's create a list of the items users are interested in.
Use Case
Track the v0.3.0 release items
Proposed Solution
No response