v0.9.1

@OlivierDehaene OlivierDehaene released this 06 Jul 14:09
31b36cc

Highlights

  • server: support MPT without flash attention (non-flash MPT)
  • server: decrease memory fragmentation

Features

  • server: use latest flash attention
  • router: add argument for hostname in router
  • docs: add help text for the options in text-generation-benchmark

Fixes

  • makefile: Update server/Makefile to include Makefile-vllm
  • server: Handle loading from local files for MPT
  • server: avoid errors for very small top_p values

Full Changelog: v0.9.0...v0.9.1