v2.0.2
Tl;dr
- New models (idefics2, phi3)
- Cleaner VLM support in the openai layer
- Upgraded to pytorch 2.3.0
What's Changed
- Make
--cuda-graphs 0
work as expected (bis) by @fxmarty in #1768 - fix typos in docs and add small clarifications by @MoritzLaurer in #1790
- Add attribute descriptions for
GenerateParameters
by @Wauplin in #1798 - feat: allow null eos and bos tokens in config by @drbh in #1791
- Phi3 support by @Narsil in #1797
- Idefics2. by @Narsil in #1756
- fix: avoid frequency and repetition penalty on padding tokens by @drbh in #1765
- Adding support for
HF_HUB_OFFLINE
support in the router. by @Narsil in #1789 - feat: improve temperature logic in chat by @drbh in #1749
- Updating the benchmarks so everyone uses openai compat layer. by @Narsil in #1800
- Update guidance docs to reflect grammar support in API by @dr3s in #1775
- Use the generation config. by @Narsil in #1808
- 2nd round of benchmark modifications (tiny adjustements to avoid overloading the host). by @Narsil in #1816
- Adding new env variables for TPU backends. by @Narsil in #1755
- add intel xpu support for TGI by @sywangyi in #1475
- Blunder by @Narsil in #1815
- Fixing qwen2. by @Narsil in #1818
- Dummy CI run. by @Narsil in #1817
- Changing the waiting_served_ratio default (stack more aggressively by default). by @Narsil in #1820
- Better graceful shutdown. by @Narsil in #1827
- Add the missing
tool_prompt
parameter to Python client by @maziyarpanahi in #1825 - Small CI cleanup. by @Narsil in #1801
- Add reference to TPU support by @brandonroyal in #1760
- fix: use get_speculate to the number of layers by @OlivierDehaene in #1737
- feat: add how it works section by @drbh in #1773
- Fixing frequency penalty by @martinigoyanes in #1811
- feat: add vlm docs and simple examples by @drbh in #1812
- Handle images in chat api by @drbh in #1828
- chore: update torch by @OlivierDehaene in #1730
- (chore): torch 2.3.0 by @Narsil in #1833
New Contributors
- @MoritzLaurer made their first contribution in #1790
- @dr3s made their first contribution in #1775
- @maziyarpanahi made their first contribution in #1825
- @brandonroyal made their first contribution in #1760
- @martinigoyanes made their first contribution in #1811
Full Changelog: v2.0.1...v2.0.2