Release v0.1.18 · sgl-project/sglang

Highlights

2x performance improvement for large-batch prefill with the new flashinfer kernels #579
Multi-node tensor parallelism #550
New model support: ChatGLM #516

What's Changed

Fix missing numpy dependency in pyproject.toml by @fpreiss in #524
Fix RAG notebook, parea setup (parea -> parea-ai) by @fpreiss in #525
[Minor] Correct Optional type hints in api by @fpreiss in #526
Add ChatGLM Model Support by @Qubitium in #516
Fix Regression: Disable p2p for 4090 by @ZX-ModelCloud in #531
Decode Incrementally by @hnyls2002 in #517
Fix dependency by @merrymercy in #538
Fix dependency & crash issues by @Ying1123 in #539
Give higher priority to user-provided max_prefill_tokens & format by @Ying1123 in #540
Add disk cache for loading the ShareGPT dataset by @hnyls2002 in #542
Fix tp worker only checking req[0] for stream by @Qubitium in #546
Fix the Jump-Forward with Chinese by @hnyls2002 in #551
Update fused_moe by @merrymercy in #553
Multi-node Tensor Parallelism by @Ying1123 in #550
Update flashinfer to 0.0.5 by @merrymercy in #554
Follow-up fixes for flashinfer 0.0.5 by @merrymercy in #556
Fix latency benchmark by @hnyls2002 in #557
Clean up logits processor by @merrymercy in #558
Update test_flashinfer by @hnyls2002 in #560
Allow running with vllm==0.4.3 by @merrymercy in #561
Add a new argument log_level_http to control HTTP logging by @merrymercy in #563
Add sglang.bench_latency for offline benchmark by @merrymercy in #564
Warmup cublas by @merrymercy in #566
Increase the thread limit for tp worker managers by @merrymercy in #567
Update readme by @merrymercy in #568
Expose dtype argument by @merrymercy in #569
Update benchmark script by @Ying1123 in #571
Minor fix in compiler & format by @ZackZeng999 in #545
Update run_batch interface and max_prefill_tokens by @Ying1123 in #574
Fix flashinfer version by @PanJason in #576
[BugFix] Fix "lm_head.weight" key error when loading Gemma weights by @dhgarcia in #577
Turn on flashinfer by default by @Ying1123 in #578
Fix the broken server args by @hnyls2002 in #585
2x performance improvement for large prefill & Fix workspace conflicts by @Ying1123 in #579

New Contributors

@fpreiss made their first contribution in #524
@ZackZeng999 made their first contribution in #545
@PanJason made their first contribution in #576
@dhgarcia made their first contribution in #577

Full Changelog: v0.1.17...v0.1.18
