v2.0.0
TGI is back to Apache 2.0!
Highlights
- License was reverted to Apache 2.0
- CUDA graphs are now used by default. They improve latency substantially on high-end nodes.
- Llava-next was added. It is the second multimodal model available on TGI after Idefics.
- Cohere Command R+ support. TGI is the fastest open-source backend for Command R+.
- FP8 support.
- We now share the vocabulary across all Medusa heads, greatly improving latency and memory use.
Try out Command R+ with Medusa heads on 4xA100s with:
model=text-generation-inference/commandrplus-medusa
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:2.0 --model-id $model --speculate 3 --num-shard 4
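Once the container reports that it is ready, the server can be queried over HTTP. The following is a minimal sketch, assuming the port mapping from the command above (host port 8080) and TGI's `/generate_stream` endpoint; the prompt, the `max_new_tokens` value, and the helper name `stream_tgi` are illustrative choices, not part of this release.

```shell
# Assumption: the container started above is listening on localhost:8080.
endpoint="http://127.0.0.1:8080/generate_stream"

# Illustrative request body: a prompt plus a cap on the number of generated tokens.
payload='{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}'

# Hypothetical helper: POST the payload and print tokens as they are streamed back.
# -N disables curl's output buffering so each server-sent event shows up immediately.
stream_tgi() {
  curl -N "$endpoint" \
    -X POST \
    -H 'Content-Type: application/json' \
    -d "$payload"
}
```

Calling `stream_tgi` after the model has finished loading should print one event per generated token. Loading a sharded model can take several minutes, so a refused connection usually means the server is not up yet.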
What's Changed
- Add cuda graphs sizes and make it default. by @Narsil in #1703
- Pickle conversion now requires --trust-remote-code. by @Narsil in #1704
- Push users to streaming in the readme. by @Narsil in #1698
- Fixing cohere tokenizer. by @Narsil in #1697
- Force weights_only (before fully breaking pickle files anyway). by @Narsil in #1710
- Regenerate ld.so.cache by @oOraph in #1708
- Revert license to Apache 2.0 by @OlivierDehaene in #1714
- Automatic quantization config. by @Narsil in #1719
- Adding Llava-Next (Llava 1.6) with full support. by @Narsil in #1709
- fix: fix CohereForAI/c4ai-command-r-plus by @OlivierDehaene in #1707
- Update libraries by @abhishekkrthakur in #1713
- Dev/mask ldconfig output v2 by @oOraph in #1716
- Fp8 Support by @Narsil in #1726
- Upgrade EETQ (Fixes the cuda graphs). by @Narsil in #1729
- fix(router): fix a possible deadlock in next_batch by @OlivierDehaene in #1731
- chore(cargo-toml): apply lto fat and codegen-units of one by @somehowchris in #1651
- Improve the defaults for the launcher by @Narsil in #1727
- feat: medusa shared by @OlivierDehaene in #1734
- Fix typo in guidance.md by @eltociear in #1735
New Contributors
- @somehowchris made their first contribution in #1651
Full Changelog: v1.4.5...v2.0.0