[router] Document supported APIs #732
Comments
Do you mean the generation/embedding/tokenization APIs supported in vLLM (https://github.com/vllm-project/vllm/tree/main/vllm/entrypoints/openai)? The current gateway design is more like a proxy than an additional API layer. Technically, it supports any protocol the engine supports; the gateway plugin only validates model existence based on the registration information. Currently, the gateway configuration doesn't set any restrictions yet. In the future, this might change for stability reasons. I agree that embedding and other API compatibility should be documented.
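A minimal sketch of the pass-through behavior described above: the plugin checks only that the requested model is registered, then proxies the request body untouched, so any engine-supported protocol flows through. The names (`REGISTERED_MODELS`, `forward_to_engine`) are illustrative, not actual AIBrix APIs:

```python
import json

# Hypothetical registry populated from model registration info.
REGISTERED_MODELS = {"llama-3-8b": "http://engine-svc:8000"}

def handle_request(path: str, body: bytes) -> tuple[int, bytes]:
    payload = json.loads(body)
    model = payload.get("model")
    if model not in REGISTERED_MODELS:
        return 404, b'{"error": "model not found"}'
    # No per-endpoint validation here: /v1/completions, /v1/embeddings,
    # etc. are all forwarded as-is to the engine serving the model.
    return forward_to_engine(REGISTERED_MODELS[model], path, body)

def forward_to_engine(upstream: str, path: str, body: bytes) -> tuple[int, bytes]:
    # Placeholder for the actual proxying (e.g. via an Envoy upstream).
    raise NotImplementedError
```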
Yes. Since vLLM doesn't support the batch API, it makes sense that AIBrix shouldn't mark it as supported either. As a user, I'm just curious to see from the docs which APIs are actually supported; it'd be super helpful to have that clarity! 😊
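Until such documentation exists, one rough way a user could discover what a deployment supports is to probe the OpenAI-compatible endpoints and see which return 404. The base URL, model name, and the assumption that unsupported paths return 404 are all placeholders, not documented AIBrix behavior:

```python
import requests

BASE = "http://localhost:8888"  # hypothetical gateway address
probes = {
    "/v1/chat/completions": {"model": "llama-3-8b",
                             "messages": [{"role": "user", "content": "hi"}]},
    "/v1/embeddings": {"model": "llama-3-8b", "input": "hi"},
    "/v1/batches": {},  # expected to 404 if batch is unimplemented
}
for path, payload in probes.items():
    r = requests.post(BASE + path, json=payload, timeout=10)
    status = "supported" if r.status_code != 404 else "not found"
    print(f"{path}: {status} ({r.status_code})")
```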
Got your point; that totally makes sense. I think it should support something similar to Kubernetes extension API services. The Batch API is a good example: currently, there doesn't seem to be a standardized engine implementation for it. If users implement it in a third-party manner, we should aggregate it at the gateway layer while allowing different services/components to provide it.
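To make the aggregation idea concrete, here is a rough sketch: batch-related paths route to a third-party implementation while everything else goes to the engines, similar in spirit to Kubernetes extension API servers. The service names and routes are invented for illustration:

```python
# Path-prefix routing table: (prefix, upstream). Most specific prefix wins.
ROUTES = [
    ("/v1/batches", "http://batch-svc:9000"),      # third-party batch component
    ("/v1/files",   "http://batch-svc:9000"),      # batch inputs/outputs
    ("/",           "http://engine-router:8000"),  # default: inference engines
]

def resolve_upstream(path: str) -> str:
    # Longest-prefix match picks the most specific provider for a path.
    for prefix, upstream in sorted(ROUTES, key=lambda r: -len(r[0])):
        if path.startswith(prefix):
            return upstream
    raise ValueError(f"no route for {path}")
```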
I’m currently working on implementing the batch API with support for object storage and local files in our production stack’s router. I’m not entirely sure yet, but I’m wondering if it’s possible to integrate this production stack router as a component of a vLLM deployment. If so, the gateway could potentially aggregate and collaborate with the production stack router to make this functionality work.
Adding the router might introduce a bit of latency, somewhere around 1 to 10 milliseconds. But honestly, I think it's unavoidable if we're planning to implement batching outside of vLLM. It's just one of those trade-offs we'll have to consider.
@gaocegege I see. Technically, I think it's possible; the P&D case requires such a router as well. At the same time, AIBrix has a batch RFC #182, but due to limited resources, we haven't made enough progress on it. Compared to implementing the routing and batch API layers together in the router, I am thinking that in AIBrix:
In this case, the flow would be:
I think this is an alternative way.
The Batch API needs user management to support List Batches, which means the gateway needs access to a metadata database. I'm a bit unsure whether it's ideal for the gateway to handle business logic, but overall, LGTM.
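A minimal illustration of why List Batches pulls user management into the picture: batch records must be persisted and filtered per user, so whichever component serves the endpoint needs a metadata store. The in-memory dict below stands in for that database; none of these types are AIBrix code:

```python
from dataclasses import dataclass, field

@dataclass
class BatchRecord:
    batch_id: str
    user_id: str
    status: str = "in_progress"

@dataclass
class BatchMetadataStore:
    _records: dict[str, BatchRecord] = field(default_factory=dict)

    def create(self, record: BatchRecord) -> None:
        self._records[record.batch_id] = record

    def list_batches(self, user_id: str) -> list[BatchRecord]:
        # The per-user filter is what forces authn/authz into this layer.
        return [r for r in self._records.values() if r.user_id == user_id]
```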
🚀 Feature Description and Motivation
We should document the supported APIs. Besides this, I want to ask whether embedding APIs are supported.
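For context, this is what an embeddings call against an OpenAI-compatible endpoint looks like; whether the AIBrix gateway passes it through presumably depends on the deployed engine serving an embedding model. The base URL and model name are placeholders:

```python
from openai import OpenAI

# Point the standard OpenAI client at the hypothetical gateway address.
client = OpenAI(base_url="http://localhost:8888/v1", api_key="unused")
resp = client.embeddings.create(model="my-embedding-model", input="hello world")
print(len(resp.data[0].embedding))  # embedding vector dimensionality
```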
Use Case
N/A
Proposed Solution
No response