v2.4.0

@OlivierDehaene OlivierDehaene released this 25 Oct 21:14
0a655a0

Notable changes

  • Experimental prefill chunking (PREFILL_CHUNKING=1)
  • Experimental FP8 KV cache support
  • Greatly decreased latency for large batches (> 128 requests)
  • Faster MoE kernels and support for GPTQ-quantized MoE
  • Faster implementation of MLLama
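
As a rough illustration of the first item, the sketch below prepares a server launch with the experimental prefill chunking switch turned on. Only the PREFILL_CHUNKING=1 variable comes from these notes. The helper name build_launch, the placeholder model id, and the use of the text-generation-launcher command with --model-id are assumptions for illustration, so adapt them to your own deployment.

```python
import os
import shutil
import subprocess


def build_launch(model_id, base_env=None):
    """Return (command, environment) for a launch with prefill chunking enabled.

    Hypothetical helper, not part of the project itself.
    """
    # Copy the environment so the caller's mapping is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    # The switch named in the release notes.
    env["PREFILL_CHUNKING"] = "1"
    # Assumption: the server is started via the text-generation-launcher CLI.
    cmd = ["text-generation-launcher", "--model-id", model_id]
    return cmd, env


if __name__ == "__main__":
    cmd, env = build_launch("your-org/your-model")  # placeholder model id
    if shutil.which(cmd[0]) is None:
        # Launcher not installed here: only show what would be run.
        print("would run:", " ".join(cmd), "with PREFILL_CHUNKING=" + env["PREFILL_CHUNKING"])
    else:
        subprocess.run(cmd, env=env, check=True)
```

Because the feature is marked experimental, setting the variable per process (as here) rather than globally makes it easy to compare behaviour with and without it.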

What's Changed

New Contributors

Full Changelog: v2.3.0...v2.4