# v0.1.16
## Highlights
- Support more models: DBRX, Command-R, Gemma (a quick-start sketch follows this list)
- Support llava-video (#423, https://llava-vl.github.io/blog/2024-04-30-llava-next-video/)
- Cache performance improvements (#418, #364)
- Marlin quantization kernels
- Many bug fixes
- Update dependencies to be compatible with their latest versions
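
For anyone trying the newly supported models, here is a minimal sketch using sglang's Python `Runtime` API; the Gemma model ID, prompt, and sampling settings are illustrative assumptions, and any of the newly supported checkpoints should work the same way:

```python
# Minimal sketch: run a newly supported model with the local Runtime API.
# Assumptions: the Hugging Face model ID and sampling settings are examples only.
import sglang as sgl

@sgl.function
def answer(s, question):
    s += "Q: " + question + "\n"
    s += "A: " + sgl.gen("answer", max_tokens=64)

runtime = sgl.Runtime(model_path="google/gemma-7b-it")  # or a DBRX / Command-R checkpoint
sgl.set_default_backend(runtime)

state = answer.run(question="What is new in this release?")
print(state["answer"])
runtime.shutdown()
```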
## What's Changed
- Fix Runtime missing some ServerArgs options by @Qubitium in #281
- Add a minimal Triton Docker build example by @amirarsalan90 in #242
- Fix flashinfer >= 0.0.3 compat by @Qubitium in #282
- Fix Incorrect CURL Request Example in README by @amirarsalan90 in #287
- Enable Marlin kernels by @qeternity in #286
- Fix env (docker) compat due to file usage by @Qubitium in #288
- Fix marlin model loading compat with autogptq by @Liurl21 in #290
- Fix outlines-0.0.35 incompatibility by @ZhouGongZaiShi in #291
- [Fix/Potential Bugs] Cannot correctly import models in `python/sglang/srt/models` by @Luodian in #311
- Use Anthropic messages API by @janimo in #304
- Add StableLM model by @janimo in #301
- Support oai in benchmark/mmlu by @merrymercy in #323
- Update version to v0.1.14 by @merrymercy in #324
- Cleanup codebase: removed unnecessary code/logic by @Qubitium in #298
- Update dependencies by @janimo in #326
- Openrouter usage example by @janimo in #327
- `model_rpc` style improvement by @hnyls2002 in #293
- `model_runner` simplify by @hnyls2002 in #329
- Logprobs refactor by @hnyls2002 in #331
- `DBRX` support by @hnyls2002 in #337
- Add support for new autogptq `quant_config.checkpoint_format` by @Qubitium in #332
- Fix llava parallelism/fork bug by @lockon-n in #315
- Eliminate 2 gpu ops during sampling when logit_bias is zero by @hnyls2002 in #343
- Revert "Eliminate 2 gpu ops during sampling when logit_bias is zero" by @hnyls2002 in #345
- Eliminate 2 gpu ops during sampling when logit_bias is zero by @Qubitium in #338
- Add timeout to get_meta_info by @SimoneRaponi in #346
- Fix typos in infer_batch.py by @tom-doerr in #354
- Time cost utils by @hnyls2002 in #355
- Update README.md by @eltociear in #358
- Support `command-r` by @ZhouXingg in #369
- Fix issue #367: system message not supported for Anthropic (anthropic.BadRequestError) by @fronx in #368
- Update model support in readme by @Ying1123 in #370
- Optimize radix tree matching by @ispobock in #364
- Reduce overhead when `fork(1)` by @hnyls2002 in #375
- Llama 3 instruct template by @qeternity in #372
- Add `.isort.cfg` by @hnyls2002 in #378
- Revert removing the unused imports by @hnyls2002 in #385
- Benchmark Updates by @hnyls2002 in #382
- Improve performance when running with full parallel by @hnyls2002 in #394
- Minor: style improvement of radix_cache and memory_pool by @hnyls2002 in #395
- Format Benchmark Code by @hnyls2002 in #399
- Fix chatml template by @merrymercy in #406
- Adding RAG tracing & eval cookbook using Parea by @joschkabraun in #390
- Add `spaces_between_special_tokens` argument to `SamplingParams` by @ZhouXingg in #392
- Organize Benchmark by @hnyls2002 in #381
- Add Cohere Command R chat template by @noah-kim-theori in #411
- Fix `sync()` when `fork(1)` by @hnyls2002 in #412
- Include finish reason in meta info response by @qeternity in #415
- Make public APIs more standard by @hnyls2002 in #416
- Compat with latest vLLM 0.4.2 main + `fork.number` rename + Flashinfer 0.0.4 by @Qubitium in #380
- Optimize the memory usage of logits processor by @merrymercy in #420
- Clean up by @merrymercy in #422
- Fix logit processor bugs by @merrymercy in #427
- Minor fix for the import path by @merrymercy in #428
- Move openai api server into a separate file by @merrymercy in #429
- Fix flashinfer by @merrymercy in #430
- Update version to 0.1.15 by @merrymercy in #431
- Misc fixes by @merrymercy in #432
- Allow `input_ids` in the input of the `/generate` endpoint by @lolipopshock in #363 (see the sketch after this list)
- Improve error handling by @merrymercy in #433
- Cache optimizations by @hnyls2002 in #418
- Update readme by @merrymercy in #434
- Raise errors for prompts that are too long by @merrymercy in #436
- Support llava-video by @ZhangYuanhan-AI in #426
- Fix streaming by @merrymercy in #437
- Update version to 0.1.16 by @merrymercy in #438
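
Two of the API-facing changes above pair naturally: #363 lets clients send pre-tokenized prompts to `/generate`, and #392 adds `spaces_between_special_tokens` to `SamplingParams`. Here is a rough sketch of exercising both against a running server; the model ID, port, and prompt are assumptions, not part of the release:

```python
# Rough sketch: call /generate with token IDs instead of raw text (#363)
# and set the new spaces_between_special_tokens argument (#392).
# Assumes a server was started with: python -m sglang.launch_server --port 30000
import requests
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")  # example model
input_ids = tokenizer.encode("The capital of France is")

resp = requests.post(
    "http://localhost:30000/generate",
    json={
        "input_ids": input_ids,  # pre-tokenized prompt, instead of a "text" field
        "sampling_params": {
            "max_new_tokens": 32,
            "temperature": 0,
            "spaces_between_special_tokens": True,
        },
    },
)
print(resp.json()["text"])
```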
## New Contributors
- @Qubitium made their first contribution in #281
- @amirarsalan90 made their first contribution in #242
- @Liurl21 made their first contribution in #290
- @ZhouGongZaiShi made their first contribution in #291
- @Luodian made their first contribution in #311
- @janimo made their first contribution in #304
- @lockon-n made their first contribution in #315
- @SimoneRaponi made their first contribution in #346
- @tom-doerr made their first contribution in #354
- @ZhouXingg made their first contribution in #369
- @fronx made their first contribution in #368
- @ispobock made their first contribution in #364
- @joschkabraun made their first contribution in #390
- @noah-kim-theori made their first contribution in #411
- @lolipopshock made their first contribution in #363
- @ZhangYuanhan-AI made their first contribution in #426
**Full Changelog**: v0.1.13...v0.1.16