[router] Document supported APIs #732
Comments
Do you mean the generation/embedding/tokenization APIs supported in vLLM (https://github.com/vllm-project/vllm/tree/main/vllm/entrypoints/openai)? The current gateway design is more like a proxy than an additional API layer. Technically, it supports any protocol the engine supports; the gateway plugin only validates model existence based on the registration information. Currently, the gateway configuration doesn't set any restrictions yet. In the future, this might change for stability reasons. I agree that embedding and other API compatibility should be documented.
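A minimal sketch of the pass-through behavior described above: the plugin checks only that the requested model is registered, then proxies the request body untouched, so any engine-supported protocol flows through. The names (`REGISTERED_MODELS`, `forward_to_engine`) are illustrative, not actual AIBrix APIs:

```python
import json

# Hypothetical registry populated from model registration info.
REGISTERED_MODELS = {"llama-3-8b": "http://engine-svc:8000"}

def handle_request(path: str, body: bytes) -> tuple[int, bytes]:
    payload = json.loads(body)
    model = payload.get("model")
    if model not in REGISTERED_MODELS:
        return 404, b'{"error": "model not found"}'
    # No per-endpoint validation here: /v1/completions, /v1/embeddings,
    # etc. are all forwarded as-is to the engine serving the model.
    return forward_to_engine(REGISTERED_MODELS[model], path, body)

def forward_to_engine(upstream: str, path: str, body: bytes) -> tuple[int, bytes]:
    # Placeholder for the actual proxying (e.g. via an Envoy upstream).
    raise NotImplementedError
```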
Yes. Since vLLM doesn't support the batch API, it makes sense that AIBrix shouldn't mark it as supported either. As a user, I'm just curious to see from the docs which APIs are actually supported; it'd be super helpful to have that clarity! 😊
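Until such documentation exists, one rough way a user could discover what a deployment supports is to probe the OpenAI-compatible endpoints and see which return 404. The base URL, model name, and the assumption that unsupported paths return 404 are all placeholders, not documented AIBrix behavior:

```python
import requests

BASE = "http://localhost:8888"  # hypothetical gateway address
probes = {
    "/v1/chat/completions": {"model": "llama-3-8b",
                             "messages": [{"role": "user", "content": "hi"}]},
    "/v1/embeddings": {"model": "llama-3-8b", "input": "hi"},
    "/v1/batches": {},  # expected to 404 if batch is unimplemented
}
for path, payload in probes.items():
    r = requests.post(BASE + path, json=payload, timeout=10)
    status = "supported" if r.status_code != 404 else "not found"
    print(f"{path}: {status} ({r.status_code})")
```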
Got your point; that totally makes sense. I think it should support something similar to Kubernetes extension API services. The Batch API is a good example: currently, there doesn't seem to be a standardized engine implementation for it. If users implement it in a third-party manner, we should aggregate it at the gateway layer while allowing different services/components to provide it.
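To make the aggregation idea concrete, here is a rough sketch: batch-related paths route to a third-party implementation while everything else goes to the engines, similar in spirit to Kubernetes extension API servers. The service names and routes are invented for illustration:

```python
# Path-prefix routing table: (prefix, upstream). Most specific prefix wins.
ROUTES = [
    ("/v1/batches", "http://batch-svc:9000"),      # third-party batch component
    ("/v1/files",   "http://batch-svc:9000"),      # batch inputs/outputs
    ("/",           "http://engine-router:8000"),  # default: inference engines
]

def resolve_upstream(path: str) -> str:
    # Longest-prefix match picks the most specific provider for a path.
    for prefix, upstream in sorted(ROUTES, key=lambda r: -len(r[0])):
        if path.startswith(prefix):
            return upstream
    raise ValueError(f"no route for {path}")
```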
I’m currently working on implementing the batch API with support for object storage and local files in our production stack’s router. I’m not entirely sure yet, but I’m wondering if it’s possible to integrate this production stack router as a component of a vLLM deployment. If so, the gateway could potentially aggregate and collaborate with the production stack router to make this functionality work.
Adding the router might introduce a bit of latency, somewhere around 1 to 10 milliseconds. But honestly, I think it's unavoidable if we're planning to implement batching outside of vLLM. It's just one of those trade-offs we'll have to consider.
@gaocegege I see. Technically, I think it's possible; the P&D case requires such a router as well. At the same time, AIBrix has a batch RFC #182, but due to limited resources, we haven't made enough progress on it. Compared to implementing the routing and batch API layers together in the router, I am thinking that in AIBrix:
In this case, the flow would be:
I think this is an alternative way.
The Batch API needs user management to support List Batches, which means the gateway needs access to a metadata database. I'm a bit unsure whether it's ideal for the gateway to handle business logic, but overall, LGTM.
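A minimal illustration of why List Batches pulls user management into the picture: batch records must be persisted and filtered per user, so whichever component serves the endpoint needs a metadata store. The in-memory dict below stands in for that database; none of these types are AIBrix code:

```python
from dataclasses import dataclass, field

@dataclass
class BatchRecord:
    batch_id: str
    user_id: str
    status: str = "in_progress"

@dataclass
class BatchMetadataStore:
    _records: dict[str, BatchRecord] = field(default_factory=dict)

    def create(self, record: BatchRecord) -> None:
        self._records[record.batch_id] = record

    def list_batches(self, user_id: str) -> list[BatchRecord]:
        # The per-user filter is what forces authn/authz into this layer.
        return [r for r in self._records.values() if r.user_id == user_id]
```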
🚀 Feature Description and Motivation
We should document the supported APIs. Besides this, I want to ask whether embedding APIs are supported.
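For context, this is what an embeddings call against an OpenAI-compatible endpoint looks like; whether the AIBrix gateway passes it through presumably depends on the deployed engine serving an embedding model. The base URL and model name are placeholders:

```python
from openai import OpenAI

# Point the standard OpenAI client at the hypothetical gateway address.
client = OpenAI(base_url="http://localhost:8888/v1", api_key="unused")
resp = client.embeddings.create(model="my-embedding-model", input="hello world")
print(len(resp.data[0].embedding))  # embedding vector dimensionality
```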
Use Case
N/A
Proposed Solution
No response