v0.2.0
What's Changed
- Introduced vLLM-Native KV-Events processing and new indexing backends (see the index sketch after this list)
  - In-Memory index (default): KV-Events are digested and stored in memory
  - Redis index
- Added observability via real-time Prometheus metrics (see the metrics sketch after this list)
  - Tracks KV-Block admissions, evictions, lookups, and hit rates
- Enhanced configurability
- Updated integration in llm-d-inference-scheduler (accurate prefix-cache-aware scorer)
- Initial support for OpenAI-compatible Chat Completions templating (library)
- Enhanced user examples and end-to-end (vLLM <-> indexer) deployment setup
- General documentation improvements
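
For context on the new indexing backends, here is a minimal, hypothetical sketch of what digesting KV-Events into an in-memory index can look like. All names (`KVEvent`, `InMemoryIndex`, `Digest`, `Lookup`) are illustrative stand-ins, not this library's actual API; the real indexer additionally handles event ordering, TTLs, and the Redis backend.

```go
// Illustrative only: hypothetical types, not this project's actual API.
package main

import (
	"fmt"
	"sync"
)

// KVEvent is a simplified stand-in for a vLLM KV-cache event: a pod
// reports that a KV block (identified by hash) was stored or removed.
type KVEvent struct {
	BlockHash uint64
	Pod       string
	Admitted  bool // true = block stored, false = block removed/evicted
}

// InMemoryIndex maps block hashes to the set of pods holding that block,
// mirroring the idea of the default in-memory backend.
type InMemoryIndex struct {
	mu     sync.RWMutex
	blocks map[uint64]map[string]struct{}
}

func NewInMemoryIndex() *InMemoryIndex {
	return &InMemoryIndex{blocks: make(map[uint64]map[string]struct{})}
}

// Digest applies a single KV-event to the index.
func (ix *InMemoryIndex) Digest(ev KVEvent) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	if ev.Admitted {
		if ix.blocks[ev.BlockHash] == nil {
			ix.blocks[ev.BlockHash] = make(map[string]struct{})
		}
		ix.blocks[ev.BlockHash][ev.Pod] = struct{}{}
		return
	}
	delete(ix.blocks[ev.BlockHash], ev.Pod)
	if len(ix.blocks[ev.BlockHash]) == 0 {
		delete(ix.blocks, ev.BlockHash)
	}
}

// Lookup returns the pods that currently hold the given block.
func (ix *InMemoryIndex) Lookup(blockHash uint64) []string {
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	pods := make([]string, 0, len(ix.blocks[blockHash]))
	for pod := range ix.blocks[blockHash] {
		pods = append(pods, pod)
	}
	return pods
}

func main() {
	ix := NewInMemoryIndex()
	ix.Digest(KVEvent{BlockHash: 0xabc, Pod: "vllm-0", Admitted: true})
	ix.Digest(KVEvent{BlockHash: 0xabc, Pod: "vllm-1", Admitted: true})
	ix.Digest(KVEvent{BlockHash: 0xabc, Pod: "vllm-0", Admitted: false})
	fmt.Println(ix.Lookup(0xabc)) // [vllm-1]
}
```

A Redis backend follows the same shape, with the block-to-pods mapping held in Redis instead of process memory so it can be shared and survive restarts.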
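And here is a hedged sketch of the kind of Prometheus instrumentation the observability item refers to, using the standard `prometheus/client_golang` package. The metric names (`kvblock_index_*`) are hypothetical, not the indexer's actual series names.

```go
// Illustrative only: metric names are hypothetical, not the
// indexer's actual Prometheus series.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	admissions = promauto.NewCounter(prometheus.CounterOpts{
		Name: "kvblock_index_admissions_total",
		Help: "KV-block admissions digested from KV-events.",
	})
	evictions = promauto.NewCounter(prometheus.CounterOpts{
		Name: "kvblock_index_evictions_total",
		Help: "KV-block evictions digested from KV-events.",
	})
	lookups = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "kvblock_index_lookups_total",
		Help: "Index lookups by outcome; hit-rate = hit / (hit + miss).",
	}, []string{"outcome"}) // "hit" or "miss"
)

func main() {
	// Simulate some index activity.
	admissions.Inc()
	lookups.WithLabelValues("hit").Inc()
	lookups.WithLabelValues("miss").Inc()
	evictions.Inc()

	// Expose /metrics for Prometheus to scrape.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```

Counters like these are enough to derive hit-rates in PromQL, e.g. `rate(kvblock_index_lookups_total{outcome="hit"}[5m]) / rate(kvblock_index_lookups_total[5m])`.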
PRs
- (chore): typo in tokenizer file by @buraksekili in #39
- [KV-Events] Introduce KV-Block Indexing Backends - Part 1 of 3 by @vMaroon in #40
- fix: replace llm-d tag to 0.0.8 by @kfirtoledo in #42
- docs: Add a setup documentation about examples/kv-cache-index by @buraksekili in #38
- [KV-Events] KV-Events Processing - Part 3 of 3 by @vMaroon in #44
- Matched Default TokenProcessorConfig.BlockSize with vLLM's by @vMaroon in #52
- [KVBlock.Index] Prometheus Metrics & Logging by @vMaroon in #53
- Enhance Configurability by @vMaroon in #55
- Update configuration.md by @vMaroon in #56
- Implement Metrics Logging Configuration in Indexer by @vMaroon in #57
- Completions-Support (#50) Extension by @guygir in #58
New Contributors
- @buraksekili made their first contribution in #39
- @guygir made their first contribution in #58
Full Changelog: v0.1.1...v0.2.0