v0.2.0
What's Changed
- Introduced vLLM-Native KV-Events processing and new indexing backends (see the index sketch after this list)
  - In-Memory index (default): KV-Events are digested and stored in memory
  - Redis index
- Added observability via real-time Prometheus metrics (see the metrics sketch after this list)
  - Tracks KV-Block admissions, evictions, lookups, and hit rates
- Enhanced configurability
- Updated integration in llm-d-inference-scheduler (accurate prefix-cache-aware scorer)
- Initial support for OpenAI-compatible Chat Completions templating (library)
- Enhanced user examples and end-to-end (vLLM <-> indexer) deployment setup
- General documentation improvements
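
For context on the new indexing backends, here is a minimal, hypothetical sketch of what digesting KV-Events into an in-memory index can look like. All names (`KVEvent`, `InMemoryIndex`, `Digest`, `Lookup`) are illustrative stand-ins, not this library's actual API; the real indexer additionally handles event ordering, TTLs, and the Redis backend.

```go
// Illustrative only: hypothetical types, not this project's actual API.
package main

import (
	"fmt"
	"sync"
)

// KVEvent is a simplified stand-in for a vLLM KV-cache event: a pod
// reports that a KV block (identified by hash) was stored or removed.
type KVEvent struct {
	BlockHash uint64
	Pod       string
	Admitted  bool // true = block stored, false = block removed/evicted
}

// InMemoryIndex maps block hashes to the set of pods holding that block,
// mirroring the idea of the default in-memory backend.
type InMemoryIndex struct {
	mu     sync.RWMutex
	blocks map[uint64]map[string]struct{}
}

func NewInMemoryIndex() *InMemoryIndex {
	return &InMemoryIndex{blocks: make(map[uint64]map[string]struct{})}
}

// Digest applies a single KV-event to the index.
func (ix *InMemoryIndex) Digest(ev KVEvent) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	if ev.Admitted {
		if ix.blocks[ev.BlockHash] == nil {
			ix.blocks[ev.BlockHash] = make(map[string]struct{})
		}
		ix.blocks[ev.BlockHash][ev.Pod] = struct{}{}
		return
	}
	delete(ix.blocks[ev.BlockHash], ev.Pod)
	if len(ix.blocks[ev.BlockHash]) == 0 {
		delete(ix.blocks, ev.BlockHash)
	}
}

// Lookup returns the pods that currently hold the given block.
func (ix *InMemoryIndex) Lookup(blockHash uint64) []string {
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	pods := make([]string, 0, len(ix.blocks[blockHash]))
	for pod := range ix.blocks[blockHash] {
		pods = append(pods, pod)
	}
	return pods
}

func main() {
	ix := NewInMemoryIndex()
	ix.Digest(KVEvent{BlockHash: 0xabc, Pod: "vllm-0", Admitted: true})
	ix.Digest(KVEvent{BlockHash: 0xabc, Pod: "vllm-1", Admitted: true})
	ix.Digest(KVEvent{BlockHash: 0xabc, Pod: "vllm-0", Admitted: false})
	fmt.Println(ix.Lookup(0xabc)) // [vllm-1]
}
```

A Redis backend follows the same shape, with the block-to-pods mapping held in Redis instead of process memory so it can be shared and survive restarts.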
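And here is a hedged sketch of the kind of Prometheus instrumentation the observability item refers to, using the standard `prometheus/client_golang` package. The metric names (`kvblock_index_*`) are hypothetical, not the indexer's actual series names.

```go
// Illustrative only: metric names are hypothetical, not the
// indexer's actual Prometheus series.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	admissions = promauto.NewCounter(prometheus.CounterOpts{
		Name: "kvblock_index_admissions_total",
		Help: "KV-block admissions digested from KV-events.",
	})
	evictions = promauto.NewCounter(prometheus.CounterOpts{
		Name: "kvblock_index_evictions_total",
		Help: "KV-block evictions digested from KV-events.",
	})
	lookups = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "kvblock_index_lookups_total",
		Help: "Index lookups by outcome; hit-rate = hit / (hit + miss).",
	}, []string{"outcome"}) // "hit" or "miss"
)

func main() {
	// Simulate some index activity.
	admissions.Inc()
	lookups.WithLabelValues("hit").Inc()
	lookups.WithLabelValues("miss").Inc()
	evictions.Inc()

	// Expose /metrics for Prometheus to scrape.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```

Counters like these are enough to derive hit-rates in PromQL, e.g. `rate(kvblock_index_lookups_total{outcome="hit"}[5m]) / rate(kvblock_index_lookups_total[5m])`.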
PRs
- (chore): typo in tokenizer file by @buraksekili in #39
- [KV-Events] Introduce KV-Block Indexing Backends - Part 1 of 3 by @vMaroon in #40
- fix: replace llm-d tag to 0.0.8 by @kfirtoledo in #42
- docs: Add a setup documentation about examples/kv-cache-index by @buraksekili in #38
- [KV-Events] KV-Events Processing - Part 3 of 3 by @vMaroon in #44
- Matched Default TokenProcessorConfig.BlockSize with vLLM's by @vMaroon in #52
- [KVBlock.Index] Prometheus Metrics & Logging by @vMaroon in #53
- Enhance Configurability by @vMaroon in #55
- Update configuration.md by @vMaroon in #56
- Implement Metrics Logging Configuration in Indexer by @vMaroon in #57
- Completions-Support (#50) Extension by @guygir in #58
New Contributors
- @buraksekili made their first contribution in #39
- @guygir made their first contribution in #58
Full Changelog: v0.1.1...v0.2.0