[router] Use string instead of token ids #673
Labels
area/gateway
kind/enhancement
New feature or request
priority/critical-urgent
Highest priority. Must be actively worked on as someone's top priority right now.
Ref #641 (comment)
Currently, we use token IDs to support prefix cache-aware routing, which requires encoding first. This introduces several microseconds of latency to the requests, and the benefits aren't substantial.
In Q&A scenarios, the input string could be equivalent to token IDs. For RAG scenarios, we might benefit from something like CacheBlend for better performance rather than solely relying on token IDs.
Therefore, I propose using strings in the router.
The text was updated successfully, but these errors were encountered: