v0.2.0

@vMaroon vMaroon released this 19 Jul 21:54
· 67 commits to main since this release
8a60b22

What's Changed

  • Introduced vLLM-Native KV-Events processing and new indexing backends
    • In-Memory index (default): KV-Events are digested and stored in memory
    • Redis index
  • Added observability and real-time Prometheus metrics
    • Tracks KV-Block admissions, evictions, lookups, and hit-rates
  • Enhanced configurability
  • Updated integration in llm-d-inference-scheduler (accurate prefix-cache-aware scorer)
  • Initial support for OpenAI-compatible Chat Completions templating (library)
  • Enhanced user examples and end-to-end (vLLM <-> indexer) deployment setup
  • General documentation improvements
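As context for the KV-Events and indexing items above, here is a minimal, self-contained Go sketch of the core idea: a block-hash → pod index that digests admission/eviction events and answers lookups, as the in-memory backend does conceptually. All type and method names below are illustrative assumptions, not the library's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// Index abstracts a KV-block indexing backend; the release ships an
// in-memory default and a Redis-backed variant (interface is illustrative).
type Index interface {
	Admit(blockHash, pod string)  // KV-event: block admitted on a pod
	Evict(blockHash, pod string)  // KV-event: block evicted from a pod
	Lookup(blockHash string) []string // pods currently holding the block
}

// memIndex is a minimal in-memory backend: block hash -> set of pods.
type memIndex struct {
	mu   sync.RWMutex
	pods map[string]map[string]struct{}
}

func newMemIndex() *memIndex {
	return &memIndex{pods: map[string]map[string]struct{}{}}
}

func (m *memIndex) Admit(blockHash, pod string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.pods[blockHash] == nil {
		m.pods[blockHash] = map[string]struct{}{}
	}
	m.pods[blockHash][pod] = struct{}{}
}

func (m *memIndex) Evict(blockHash, pod string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.pods[blockHash], pod)
	if len(m.pods[blockHash]) == 0 {
		delete(m.pods, blockHash)
	}
}

func (m *memIndex) Lookup(blockHash string) []string {
	m.mu.RLock()
	defer m.mu.RUnlock()
	out := make([]string, 0, len(m.pods[blockHash]))
	for p := range m.pods[blockHash] {
		out = append(out, p)
	}
	return out
}

func main() {
	var idx Index = newMemIndex()
	idx.Admit("block-a1", "vllm-pod-0")
	idx.Admit("block-a1", "vllm-pod-1")
	idx.Evict("block-a1", "vllm-pod-0")
	fmt.Println(idx.Lookup("block-a1")) // prints [vllm-pod-1]
}
```

A prefix-cache-aware scorer, as referenced above, would consult such an index per candidate pod to estimate how many of a request's prompt blocks are already cached there.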

PRs

  • (chore): typo in tokenizer file by @buraksekili in #39
  • [KV-Events] Introduce KV-Block Indexing Backends - Part 1 of 3 by @vMaroon in #40
  • fix: replace llm-d tag to 0.0.8 by @kfirtoledo in #42
  • docs: Add a setup documentation about examples/kv-cache-index by @buraksekili in #38
  • [KV-Events] KV-Events Processing - Part 3 of 3 by @vMaroon in #44
  • Matched Default TokenProcessorConfig.BlockSize with vLLM's by @vMaroon in #52
  • [KVBlock.Index] Prometheus Metrics & Logging by @vMaroon in #53
  • Enhance Configurability by @vMaroon in #55
  • Update configuration.md by @vMaroon in #56
  • Implement Metrics Logging Configuration in Indexer by @vMaroon in #57
  • Completions-Support (#50) Extension by @guygir in #58

New Contributors

Full Changelog: v0.1.1...v0.2.0-RC1