v2.1.0

@Narsil released this 28 Jun 06:26
192d49a

Notable changes

  • New models: gemma2

  • Multi-LoRA adapters: you can now run multiple LoRAs on the same TGI deployment (#2010)

  • Faster GPTQ inference and Marlin support (up to 2x speedup).

  • Reworked the entire scheduling logic (better block allocation, and room for further speedups in future releases).

  • Lots of ROCm support and bug fixes.

  • Lots of new contributors! Thanks a lot for these contributions.
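
For the multi-LoRA item above, here is a minimal sketch of what a client call could look like. It assumes the adapter is selected per request through an `adapter_id` field under `parameters` on the `/generate` route, and that the adapters were already loaded when the server was launched; the base URL and adapter names are hypothetical placeholders, so check the TGI documentation for the exact options in your version.

```python
import json
import urllib.request


def build_generate_request(base_url, prompt, adapter_id, max_new_tokens=40):
    """Build a POST request for the /generate route that names one LoRA adapter."""
    payload = {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            # Assumption: the adapter is chosen per request via `adapter_id`.
            "adapter_id": adapter_id,
        },
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, prompt, adapter_id):
    """Send the request to a running server and return the decoded JSON reply."""
    request = build_generate_request(base_url, prompt, adapter_id)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `generate("http://127.0.0.1:8080", "Hello, who are you?", "my-org/adapter-a")` and then the same prompt with `"my-org/adapter-b"` would route two requests to the same deployment with different adapters, provided both (hypothetical) adapters are loaded on the server.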

What's Changed

New Contributors

Full Changelog: v2.0.3...v2.1.0