# v1.4.0

## Highlights
- OpenAI-compatible chat completions API #1427 (see the sketch after this list)
- ExLlamaV2 tensor parallelism #1490
- GPTQ support for AMD GPUs (ROCm) #1489
- Phi model support #1442
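
A minimal sketch of calling the new OpenAI-compatible endpoint with the `openai` Python client. The base URL, port, and `model` value are assumptions about a local deployment, not part of the release itself:

```python
from openai import OpenAI

# Assumes a TGI server is already running on localhost:8080
# (URL, port, and model name are placeholders for this sketch).
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="-",  # TGI does not validate the key by default
)

chat_completion = client.chat.completions.create(
    model="tgi",  # the server answers for whichever model it was launched with
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is deep learning?"},
    ],
    temperature=0.7,  # sampling params are forwarded from the API per #1470
    stream=True,
)

# Stream the generated tokens as they arrive.
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")
```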
## What's Changed
- fix: fix local loading for .bin models by @OlivierDehaene in #1419
- Fix missing make target platform for local install: 'install-flash-attention-v2' by @deepily in #1414
- fix: follow base model for tokenizer in router by @OlivierDehaene in #1424
- Fix local load for Medusa by @PYNing in #1420
- Return prompt vs generated tokens. by @Narsil in #1436
- feat: supports openai chat completions API by @drbh in #1427
- feat: support raise_exception, bos and eos tokens by @drbh in #1450
- chore: bump rust version and annotate/fix all clippy warnings by @drbh in #1455
- feat: conditionally toggle chat on invocations route by @drbh in #1454
- Disable `decoder_input_details` on OpenAI-compatible chat streaming, pass temp and top-k from API by @EndlessReform in #1470
- Fixing non divisible embeddings. by @Narsil in #1476
- Add messages api compatibility docs by @drbh in #1478
- Add a new `/tokenize` route to get the tokenized input by @Narsil in #1471 (see the sketch after this list)
- feat: adds phi model by @drbh in #1442
- fix: read stderr in download by @OlivierDehaene in #1486
- fix: show warning with tokenizer config parsing error by @drbh in #1488
- fix: launcher doc typos by @Narsil in #1473
- Reinstate exl2 with tp by @Narsil in #1490
- Add sealion mpt support by @Narsil in #1477
- Trying to fix that flaky test. by @Narsil in #1491
- fix: launcher doc typos by @thelinuxkid in #1462
- Update the docs to include newer models. by @Narsil in #1492
- GPTQ support on ROCm by @fxmarty in #1489
- feat: add tokenizer-config-path to launcher args by @drbh in #1495
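
A quick sketch of the new `/tokenize` route (#1471) using `requests`. The host, port, and the `inputs` payload field are assumptions for this example, mirroring the shape of the generation endpoints:

```python
import requests

# Assumes a TGI server on localhost:8080; the request body reuses the
# same "inputs" field as the generation routes (an assumption here).
resp = requests.post(
    "http://localhost:8080/tokenize",
    json={"inputs": "What is deep learning?"},
)
resp.raise_for_status()

# The response describes each token of the tokenized input.
for token in resp.json():
    print(token)
```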
## New Contributors
- @deepily made their first contribution in #1414
- @PYNing made their first contribution in #1420
- @drbh made their first contribution in #1427
- @EndlessReform made their first contribution in #1470
- @thelinuxkid made their first contribution in #1462
**Full Changelog**: v1.3.4...v1.4.0