v0.9.1
Highlights
- server: Non-flash MPT support
- server: decrease memory fragmentation
Features
- server: use latest flash attention
- router: add a hostname argument to the router
- docs: add help text for the options in text-generation-benchmark
Fix
- makefile: Update server/Makefile to include Makefile-vllm
- server: Handle loading from local files for MPT
- server: avoid errors for very small top_p values
Full Changelog: v0.9.0...v0.9.1