v2.1.1

@Narsil released this 04 Jul 10:43
4dfdb48

Main changes

  • Bugfixes
  • Added FlashDecoding support (beta): set FLASH_DECODING=1 to run TGI with flash decoding (large speedups on long queries). #1940
  • Use Marlin over GPTQ kernels for faster GPTQ inference #2111
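
As a rough illustration of the FLASH_DECODING=1 switch, the sketch below builds a `docker run` command line that passes the variable into the container. The image name, the port mapping, the `--shm-size` value, and the `your-org/your-model` model ID are assumptions for illustration and are not taken from these notes; the helper `tgi_docker_command` is hypothetical.

```python
import shlex


def tgi_docker_command(model_id, image_tag="2.1.1", flash_decoding=True, port=8080):
    """Build a `docker run` argv for TGI (hypothetical helper, not part of TGI)."""
    cmd = ["docker", "run", "--gpus", "all", "--shm-size", "1g", "-p", f"{port}:80"]
    if flash_decoding:
        # Beta feature from this release: enabled through an environment variable.
        cmd += ["-e", "FLASH_DECODING=1"]
    # Assumed image location; adjust to wherever you pull TGI from.
    cmd += [
        f"ghcr.io/huggingface/text-generation-inference:{image_tag}",
        "--model-id",
        model_id,
    ]
    return cmd


if __name__ == "__main__":
    # Placeholder model ID; replace with a model you actually serve.
    print(shlex.join(tgi_docker_command("your-org/your-model")))
```

Since the feature is marked beta, keeping the variable behind a flag like `flash_decoding` makes it easy to compare behaviour with and without it.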

What's Changed

  • Fixing the CI to also run in release when it's a tag ? by @Narsil in #2138
  • fix microsoft/Phi-3-mini-4k-instruct crash in batch.slots[batch.slot_… by @sywangyi in #2148
  • Fixing clippy. by @Narsil in #2149
  • fix: use weights from base_layer by @drbh in #2141
  • feat: download lora adapter weights from launcher by @drbh in #2140
  • Use GPTQ-Marlin for supported GPTQ configurations by @danieldk in #2111
  • fix AttributeError: 'MixtralLayer' object has no attribute 'mlp' by @icyxp in #2123
  • refine get xpu free memory/enable Qwen2/gemma2/gemma/phi in intel platform by @sywangyi in #2132
  • fix: prefer serde structs over custom functions by @drbh in #2127
  • Fixing test. by @Narsil in #2152
  • GH router. by @Narsil in #2153
  • Fixing baichuan override. by @Narsil in #2158
  • [Major Change][Undecided yet] Move to FlashDecoding instead of PagedAttention kernel. by @Narsil in #1940
  • Fixing graph capture for flash decoding. by @Narsil in #2163
  • fix FlashDecoding change's regression in intel platform by @sywangyi in #2161
  • fix: use the base layers weight in mistral rocm by @drbh in #2155
  • Fixing rocm. by @Narsil in #2164
  • Ci test by @glegendre01 in #2124
  • Hotfixing qwen2 and starcoder2 (which also get clamping). by @Narsil in #2167
  • feat: improve update_docs for openapi schema by @drbh in #2169
  • Fixing the dockerfile warnings. by @Narsil in #2173
  • Fixing missing object field for regular completions. by @Narsil in #2175

Full Changelog: v2.1.0...v2.1.1