v2.4.1

Latest
OlivierDehaene released this 22 Nov 17:35
d2ed52f

Notable changes

  • Choose input/total tokens automatically based on available VRAM
  • Support Qwen2 VL
  • Decrease latency of very large batches (> 128)

What's Changed

New Contributors

Full Changelog: v2.3.0...v2.4.1