Prefix cache and load aware routing policy #677
Labels: area/gateway, area/kv-cache, area/performance, area/scheduling, kind/enhancement, kind/feature
🚀 Feature Description and Motivation
Currently, AiBrix supports only simple prefix-aware routing. From a data-structure perspective, it uses a hash table over fixed-size blocks, where a block represents a fixed number of consecutive tokens.
More sophisticated prefix-aware routing would be useful, for example the approaches used by Preble, SGLang, and D^2LPM.
As an initial prototype, I plan to implement Preble scheduling in AiBrix quickly, on a best-effort basis.
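For readers unfamiliar with the block-based approach, here is a minimal sketch of a fixed-size-block hash table. It is not the AiBrix implementation: the names `PrefixHashTable`, `block_hashes`, and `BLOCK_SIZE`, the block size of 4, and the use of SHA-256 with chained hashes are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4  # tokens per block; illustrative value, not an AiBrix default


def block_hashes(tokens, block_size=BLOCK_SIZE):
    """Return one chained hash per full block, so each hash identifies a whole prefix."""
    hashes = []
    prev = b""
    full = len(tokens) - len(tokens) % block_size  # trailing partial block is ignored
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        data = prev + b"|" + ",".join(map(str, block)).encode()
        prev = hashlib.sha256(data).digest()
        hashes.append(prev)
    return hashes


class PrefixHashTable:
    """Hypothetical index: block hash -> set of pods that have seen that prefix."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.table = defaultdict(set)

    def add(self, tokens, pod):
        """Record that `pod` has processed (and presumably cached) this token sequence."""
        for h in block_hashes(tokens, self.block_size):
            self.table[h].add(pod)

    def match(self, tokens):
        """Return {pod: number of leading blocks matched} for pods with a cached prefix."""
        matched = {}
        candidates = None
        for depth, h in enumerate(block_hashes(tokens, self.block_size), start=1):
            pods = self.table.get(h, set())
            candidates = set(pods) if candidates is None else candidates & pods
            if not candidates:
                break
            for pod in candidates:
                matched[pod] = depth
        return matched
```

With two pods that share only the first block, a request that extends the first pod's sequence matches two blocks on that pod and one on the other; tokens past the last full block are not considered. That coarse granularity is one limitation the more sophisticated schemes aim to address.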
Work items
aibrix/pkg/plugins/gateway/prefixcacheindexer
aibrix/pkg/plugins/gateway/algorithms
One fragile aspect of Preble's scheduling logic is that it depends on magic numbers.
It uses linear regression, and the coefficient and intercept are hardcoded in the Preble code.
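One possible way to keep such constants out of the routing logic is to pass them in as configuration. The sketch below is a simplified stand-in, not Preble's actual cost function: `LinearCostModel`, `pick_pod`, the treatment of zero tokens as zero cost, and the numbers in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCostModel:
    """Hypothetical linear model: estimated time = coefficient * tokens + intercept."""

    coefficient: float  # would be fitted per model/hardware rather than hardcoded
    intercept: float

    def predict(self, num_tokens: int) -> float:
        # Assumption: no tokens means no work, so the intercept is not charged.
        if num_tokens <= 0:
            return 0.0
        return self.coefficient * num_tokens + self.intercept


def pick_pod(prompt_tokens, matched_tokens, queued_tokens, model):
    """Pick the pod with the lowest estimated cost.

    Cost here is the estimated time for the pod's queued tokens plus the
    estimated time to prefill the prompt tokens that are not already cached.
    """

    def cost(pod):
        missing = max(prompt_tokens - matched_tokens.get(pod, 0), 0)
        return model.predict(queued_tokens[pod]) + model.predict(missing)

    # Sort pod names first so ties are broken deterministically.
    return min(sorted(queued_tokens), key=cost)
```

For example, with `LinearCostModel(coefficient=0.5, intercept=2.0)`, a 100-token prompt, 80 tokens cached on `pod-a`, and queues of 50 and 20 tokens on `pod-a` and `pod-b`, `pick_pod` selects `pod-a`. If `pod-a`'s queue grows to 500 tokens, it selects `pod-b` despite the cache miss.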
Use Case
Better routing for better performance
Proposed Solution
No response