[router] Document supported APIs #732

Open
gaocegege opened this issue Feb 23, 2025 · 6 comments
Labels: area/website, kind/documentation
Milestone: v0.3.0

Comments
@gaocegege
Collaborator

🚀 Feature Description and Motivation

We should document the supported APIs. Besides this, I want to ask whether embedding APIs are supported.

Use Case

N/A

Proposed Solution

No response

gaocegege added the kind/documentation and area/website labels on Feb 23, 2025
@Jeffwan
Collaborator

Jeffwan commented Feb 24, 2025

Do you mean the generation/embedding/tokenization APIs supported in vLLM (https://github.com/vllm-project/vllm/tree/main/vllm/entrypoints/openai)? The current gateway design is more like a proxy than an additional API layer. Technically, it supports any protocol the engine supports. The gateway plugin only validates model existence based on the registration information.
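For illustration, since the gateway just proxies OpenAI-compatible traffic, an embeddings call looks like any other client request pointed at the gateway. A minimal sketch, assuming a placeholder gateway address and a registered embedding model (both names below are made up, not real AIBrix defaults):

```python
# Hypothetical sketch: calling the embeddings API through the gateway proxy.
# The base_url and model name are placeholders, not real AIBrix defaults.
from openai import OpenAI

client = OpenAI(
    base_url="http://aibrix-gateway.example.com/v1",  # placeholder gateway address
    api_key="sk-placeholder",
)

# The gateway only checks that the model is registered, then forwards the
# request; whether /v1/embeddings works depends on the backend engine.
resp = client.embeddings.create(
    model="my-embedding-model",  # placeholder registered model name
    input="hello world",
)
print(len(resp.data[0].embedding))
```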

Currently, the gateway configuration doesn't set any restrictions yet. In the future, this might change for stability reasons:
https://github.com/vllm-project/aibrix/blob/6feec99d77c84e371da9c535054c2b8aa8912704/config/gateway/gateway.yaml

I agree that embedding or other API compatibility should be documented.

Jeffwan added this to the v0.3.0 milestone on Feb 24, 2025
@gaocegege
Collaborator Author

> Do you mean the generation/embedding/tokenization APIs supported in vLLM (https://github.com/vllm-project/vllm/tree/main/vllm/entrypoints/openai)?

Yes. Since vLLM doesn't support the batching API, it makes sense that AIBrix shouldn't mark it as supported either. As a user, I'm just curious to see from the docs which APIs are actually supported; it'd be super helpful to have that clarity! 😊

@Jeffwan
Collaborator

Jeffwan commented Feb 24, 2025

Got your point; that totally makes sense. I think it should support something similar to Kubernetes extension API services. The Batch API is a good example: currently, there doesn't seem to be a standardized engine for it. If users implement it in a third-party manner, we should aggregate it at the gateway layer while allowing different services/components to provide it.

@gaocegege
Collaborator Author

I’m currently working on implementing the batch API with support for object storage and local files in our production stack’s router. I’m not entirely sure yet, but I’m wondering if it’s possible to integrate this production stack router as a component of a vLLM deployment. If so, the gateway could potentially aggregate and collaborate with the production stack router to make this functionality work.

Gateway -> Router deployment -> vLLM deployment

Adding the router might introduce a bit of latency, somewhere around 1 to 10 milliseconds. But honestly, I think that's unavoidable if we plan to implement batching outside of vLLM. It's just one of those trade-offs we'll have to consider.
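For what it's worth, the object-storage/local-file part could look roughly like the sketch below: a single loader that accepts either an s3:// URI or a local path for the JSONL batch input. This is illustrative only; the URI-scheme handling and the boto3 dependency are assumptions, not the actual production stack code.

```python
# Illustrative sketch: reading OpenAI-style batch input (JSONL) from either
# object storage (s3://) or a local file. Names and paths are placeholders.
import json
from urllib.parse import urlparse

def load_batch_input(uri: str) -> list[dict]:
    """Return the parsed request bodies from a batch input file."""
    parsed = urlparse(uri)
    if parsed.scheme == "s3":
        import boto3  # assumed dependency for the object-storage case
        obj = boto3.client("s3").get_object(
            Bucket=parsed.netloc, Key=parsed.path.lstrip("/")
        )
        raw = obj["Body"].read().decode("utf-8")
    else:
        with open(uri, "r", encoding="utf-8") as f:
            raw = f.read()
    return [json.loads(line) for line in raw.splitlines() if line.strip()]
```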

@Jeffwan
Collaborator

Jeffwan commented Feb 24, 2025

@gaocegege I see. Technically, I think it's possible. The P&D case requires such a router as well.

At the same time, AIBrix has a batch RFC #182 as well, but due to limited resources we haven't made enough progress. Compared to implementing the routing and batch API layers together in the router, I am thinking that in AIBrix:

  1. we could have an extended server that just provides the batch API service, a request orchestration service (congestion control, backpressure, etc.), and object management; it plays the role of a client and sends requests to the backend vLLM service.
  2. the gateway could add the necessary routing strategy support for batch requests (this also depends on how batching is implemented).

In this case, the flow would be

Gateway (async Batch API) -> Batch API Service -> Gateway (mostly sync API) -> vLLM deployment

I think this is an alternative approach.
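To make the middle hop concrete, here is a minimal sketch of a batch service replaying requests through the gateway's sync API with simple backpressure. The endpoint URL and concurrency cap are illustrative assumptions, not the AIBrix design:

```python
# Hypothetical sketch of: Batch API Service -> Gateway (sync API) -> vLLM.
import asyncio
import httpx

# Placeholder sync endpoint exposed by the gateway; not a real default.
GATEWAY_SYNC_URL = "http://aibrix-gateway.example.com/v1/chat/completions"
MAX_IN_FLIGHT = 8  # crude congestion control: cap concurrent in-flight requests

async def run_batch(bodies: list[dict]) -> list[dict]:
    """Replay OpenAI-style request bodies through the gateway's sync API."""
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    async with httpx.AsyncClient(timeout=120.0) as client:
        async def one(body: dict) -> dict:
            async with sem:  # backpressure: wait for a free slot
                resp = await client.post(GATEWAY_SYNC_URL, json=body)
                resp.raise_for_status()
                return resp.json()
        return list(await asyncio.gather(*(one(b) for b in bodies)))
```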

@gaocegege
Collaborator Author

gaocegege commented Feb 24, 2025

The Batch API needs user management to support List Batches, which means the gateway needs access to a metadata database.

I'm a bit unsure whether it's ideal for the gateway to handle business logic, but overall, LGTM.
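To make the metadata point concrete, here is a hypothetical sketch of the per-user records that List Batches would need; the schema loosely follows the OpenAI Batch object, and every field here is an assumption:

```python
# Hypothetical sketch: the per-user metadata a List Batches endpoint needs.
import sqlite3

conn = sqlite3.connect("batches.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS batches (
        id          TEXT PRIMARY KEY,
        user_id     TEXT NOT NULL,      -- this is why user management is needed
        status      TEXT NOT NULL,      -- e.g. validating | in_progress | completed
        input_file  TEXT NOT NULL,      -- object-storage or local-file reference
        created_at  INTEGER NOT NULL
    )
""")

def list_batches(user_id: str, limit: int = 20) -> list[tuple]:
    """List Batches must be scoped to the calling user, hence the metadata store."""
    cur = conn.execute(
        "SELECT id, status, created_at FROM batches "
        "WHERE user_id = ? ORDER BY created_at DESC LIMIT ?",
        (user_id, limit),
    )
    return cur.fetchall()
```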
