Releases · InternLM/lmdeploy

09 Dec 12:08

lvhan028

v0.6.4

14b64c7

LMDeploy Release v0.6.4 (Latest)

What's Changed

🚀 Features

feature: support qwen2.5 function_call by @akai-shuuichi in #2737
[Feature] support minicpm-v_2_6 for pytorch engine. by @Reinerzhou in #2767
Support qwen2-vl AWQ quantization by @AllentDan in #2787
Add DeepSeek-V2 support by @lzhangzz in #2763
[ascend]feat: support kv int8 by @yao-fengchen in #2736

💥 Improvements

Optimize update_step_ctx on Ascend by @jinminxi104 in #2804
Add Ascend installation adapter by @zhabuye in #2817
Refactor turbomind (2/N) by @lzhangzz in #2818
add openssh-server installation in dockerfile by @lvhan028 in #2830
Add version restrictions in runtime_ascend.txt to ensure functionality by @zhabuye in #2836
better kv allocate by @grimoire in #2814
Update internvl chat template by @AllentDan in #2832
profile throughput without new threads by @grimoire in #2826
[dlinfer] change dlinfer kv_cache layout and adjust paged_prefill_attention api. by @Reinerzhou in #2847
[maca] add env to support different mm layout on maca. by @Reinerzhou in #2835
Supports W8A8 quantization for more models by @AllentDan in #2850

🐞 Bug fixes

disable prefix-caching for vl model by @grimoire in #2825
Fix gemma2 accuracy through the correct softcapping logic by @AllentDan in #2842
fix accessing before initialization by @lvhan028 in #2845
fix the logic to verify whether AutoAWQ has been successfully installed by @grimoire in #2844
check whether backend_config is None or not before accessing its attr by @lvhan028 in #2848
[ascend] convert kv cache to nd format in ascend graph mode by @tangzhiyi11 in #2853
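For reference on the Gemma 2 fix above: logit soft-capping squashes scores with cap * tanh(x / cap), so small values pass through almost unchanged and large ones saturate smoothly toward ±cap instead of being hard-clipped. The release note does not say what the incorrect logic was; the plain-Python sketch below (hypothetical helper `softcap`) only illustrates the formula and is not lmdeploy's kernel. The cap values in the comment are the ones published in the Hugging Face Gemma 2 configs.

```python
import math
from typing import List


def softcap(logits: List[float], cap: float) -> List[float]:
    """Squash each value into [-cap, cap] via cap * tanh(x / cap)."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    return [cap * math.tanh(x / cap) for x in logits]


# Gemma 2 configs on Hugging Face use attn_logit_softcapping=50.0 (attention scores)
# and final_logit_softcapping=30.0 (output logits).
capped = softcap([0.0, 1.0, 100.0], cap=30.0)
```

In Gemma 2 the transform is applied in two places: to attention scores before the softmax and to the final output logits.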

📚 Documentations

Update supported models & Ascend doc by @jinminxi104 in #2765
update supported models by @lvhan028 in #2849

🌐 Other

[CI] Split vl testcases into turbomind and pytorch backend by @zhulinJulia24 in #2751
[dlinfer] Fix qwenvl rope error for dlinfer backend by @JackWeiw in #2795
[CI] add more testcase for mllm models by @zhulinJulia24 in #2791
Update dlinfer-ascend version in runtime_ascend.txt by @jinminxi104 in #2865
bump version to v0.6.4 by @lvhan028 in #2864

New Contributors

@akai-shuuichi made their first contribution in #2737
@JackWeiw made their first contribution in #2795
@zhabuye made their first contribution in #2817

Full Changelog: v0.6.3...v0.6.4

Contributors

grimoire, lvhan028, and 10 other contributors

16 Nov 04:31

lvhan028

v0.6.3

0c80baa

LMDeploy Release V0.6.3

What's Changed

🚀 Features

support yarn in turbomind backend by @irexyc in #2519
add linear op on dlinfer platform by @yao-fengchen in #2627
support turbomind head_dim 64 by @irexyc in #2715
[Feature]: support LlavaForConditionalGeneration with turbomind inference by @deepindeed2022 in #2710
Support Mono-InternVL with PyTorch backend by @wzk1015 in #2727
Support Qwen2-MoE models by @lzhangzz in #2723
Support mixtral moe AWQ quantization. by @AllentDan in #2725
Support chemvlm by @RunningLeon in #2738
Support molmo in turbomind by @lvhan028 in #2716

💥 Improvements

Call cuda empty_cache to prevent OOM when quantizing model by @AllentDan in #2671
feat: support dynamic/llama3 rotary embedding in ascend graph mode by @tangzhiyi11 in #2670
Add ensure_ascii = False for json.dumps by @AllentDan in #2707
Flatten cache and add flashattention by @grimoire in #2676
Support ep, column major moe kernel. by @grimoire in #2690
Remove one of the duplicate bos tokens by @AllentDan in #2708
Check server input by @irexyc in #2719
optimize dlinfer moe by @tangzhiyi11 in #2741

🐞 Bug fixes

Support min_tokens, min_p parameters for api_server by @AllentDan in #2681
fix index error when computing ppl on long-text prompt by @lvhan028 in #2697
Better tp exit log. by @grimoire in #2677
miss to read moe_ffn weights from converted tm model by @lvhan028 in #2698
Fix turbomind TP by @lzhangzz in #2706
fix decoding kernel for deepseekv2 by @grimoire in #2688
fix tp exit code for pytorch engine by @RunningLeon in #2718
fix assert pad >= 0 failed when inter_size is not a multiple of group… by @Vinkle-hzt in #2740
fix issue that mono-internvl failed to fallback pytorch engine by @lvhan028 in #2744
Remove use_fast=True when loading tokenizer for lite auto_awq by @AllentDan in #2758
set wrong head_dim for mistral-nemo by @lvhan028 in #2761
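The `min_tokens` parameter added to api_server (#2681) is conventionally understood as "do not stop before N tokens have been generated". A common way to implement that is a logits processor that masks stop tokens until the threshold is reached. The sketch below assumes that semantics; the helper names `apply_min_tokens` and `greedy` and their arguments are hypothetical, not lmdeploy's API.

```python
import math
from typing import List, Sequence


def apply_min_tokens(logits: Sequence[float], num_generated: int,
                     min_tokens: int, stop_token_ids: Sequence[int]) -> List[float]:
    """Return a copy of `logits` with stop tokens masked to -inf while
    fewer than `min_tokens` tokens have been generated."""
    out = list(logits)
    if num_generated < min_tokens:
        for tid in stop_token_ids:
            out[tid] = -math.inf  # a masked token gets zero probability after softmax
    return out


def greedy(logits: Sequence[float]) -> int:
    """Index of the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Token 1 plays the role of EOS in this toy vocabulary.
logits = [1.0, 5.0, 2.0]
early = greedy(apply_min_tokens(logits, num_generated=0, min_tokens=2, stop_token_ids=[1]))
later = greedy(apply_min_tokens(logits, num_generated=2, min_tokens=2, stop_token_ids=[1]))
```

Returning a copy keeps the sketch free of side effects on the caller's logits.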

📚 Documentations

Update ascend readme by @jinminxi104 in #2756
fix ascend get_started.md link by @CyCle1024 in #2696
Fix llama3.2 VL vision in "Supported Models" documents by @blankanswer in #2703

🌐 Other

[ci] support v100 dailytest by @zhulinJulia24 in #2665
[ci] add more testcase into evaluation and daily test by @zhulinJulia24 in #2721
feat: support multi cards in ascend graph mode by @tangzhiyi11 in #2755
bump version to v0.6.3 by @lvhan028 in #2754

New Contributors

@blankanswer made their first contribution in #2703
@tangzhiyi11 made their first contribution in #2670
@wzk1015 made their first contribution in #2727
@Vinkle-hzt made their first contribution in #2740

Full Changelog: v0.6.2...v0.6.3

Contributors

grimoire, lvhan028, and 13 other contributors

07 Nov 07:41

lvhan028

v0.6.2.post1

4fc9479

LMDeploy Release v0.6.2.post1

What's Changed

Bugs

Fix llama3.2 VL vision in "Supported Models" documents by @blankanswer in #2703
miss to read moe_ffn weights from converted tm model by @lvhan028 in #2698
better tp exit log by @grimoire in #2677
fix index error when computing ppl on long-text prompt by @lvhan028 in #2697
Support min_tokens, min_p parameters for api_server by @AllentDan in #2681
fix ascend get_started.md link by @CyCle1024 in #2696
Call cuda empty_cache to prevent OOM when quantizing model by @AllentDan in #2671
Fix turbomind TP for v0.6.2 by @lzhangzz in #2713

🌐 Other

[ci] support v100 dailytest by @zhulinJulia24 in #2665
bump version to 0.6.2.post1 by @lvhan028 in #2717

Full Changelog: v0.6.2...v0.6.2.post1

Contributors

grimoire, lvhan028, and 4 other contributors

29 Oct 06:42

lvhan028

v0.6.2

522108c

LMDeploy Release v0.6.2

Highlights

PyTorch engine supports graph mode on ascend platform, doubling the inference speed
Support llama3.2-vision models in PyTorch engine
Support Mixtral in TurboMind engine, achieving 20+ RPS on the ShareGPT dataset with 2 A100-80G GPUs

What's Changed

🚀 Features

support downloading models from openmind_hub by @cookieyyds in #2563
Support pytorch engine kv int4/int8 quantization by @AllentDan in #2438
feat(ascend): support w4a16 by @yao-fengchen in #2587
[maca] add maca backend support. by @Reinerzhou in #2636
Support mllama for pytorch engine by @AllentDan in #2605
add --eager-mode to cli by @RunningLeon in #2645
[ascend] add ascend graph mode by @CyCle1024 in #2647
MoE support for turbomind by @lzhangzz in #2621

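Online KV cache quantization (#2438) stores keys and values in 8 or 4 bits plus a scale and a zero point. Assuming an asymmetric scheme applied per head and per token, which is how lmdeploy's KV quantization guide describes it, one head vector of one token can be quantized and restored as in the plain-Python sketch below. The helper names are hypothetical; real kernels pack 4-bit codes and operate on GPU tensors.

```python
from typing import List, Tuple


def quantize_asym(x: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Asymmetric quantization of one vector to unsigned `bits`-bit codes.

    Returns (codes, scale, zero_point) such that x[i] ~= codes[i] * scale + zero_point.
    """
    qmax = (1 << bits) - 1  # 255 for 8 bits, 15 for 4 bits
    lo, hi = min(x), max(x)
    scale = (hi - lo) / qmax if hi > lo else 1.0  # avoid division by zero for constant vectors
    zero_point = lo
    codes = [min(qmax, max(0, round((v - zero_point) / scale))) for v in x]
    return codes, scale, zero_point


def dequantize_asym(codes: List[int], scale: float, zero_point: float) -> List[float]:
    """Map codes back to approximate floating-point values."""
    return [c * scale + zero_point for c in codes]


# One head of one token's key vector (toy values).
k_head = [-1.0, 0.0, 0.5, 2.0]
for bits in (8, 4):
    codes, scale, zp = quantize_asym(k_head, bits)
    restored = dequantize_asym(codes, scale, zp)
    max_err = max(abs(a - b) for a, b in zip(k_head, restored))
```

The reconstruction error per element is bounded by half a quantization step (scale / 2), which is why 4-bit storage is noticeably coarser than 8-bit for the same value range.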
💥 Improvements

[Feature] Add argument to disable FastAPI docs by @mouweng in #2540
add check for device with cap 7.x by @grimoire in #2535
Add tool role for langchain usage by @AllentDan in #2558
Fix llama3.2-1b inference error by handling tie_word_embedding by @grimoire in #2568
Add a workaround for saving internvl2 with latest transformers by @AllentDan in #2583
optimize paged attention on triton3 by @grimoire in #2553
refactor for multi backends in dlinfer by @CyCle1024 in #2619
Copy sglang/bench_serving.py to lmdeploy as serving benchmark script by @lvhan028 in #2620
Add barrier to prevent TP nccl kernel waiting. by @grimoire in #2607
[ascend] refactor fused_moe on ascend platform by @yao-fengchen in #2613
[ascend] support paged_prefill_attn when batch > 1 by @yao-fengchen in #2612
Raise an error for the wrong chat template by @AllentDan in #2618
refine pre-post-process by @jinminxi104 in #2632
small block_m for sm7.x by @grimoire in #2626
update check for triton by @grimoire in #2641
Support llama3.2 LLM models in turbomind engine by @lvhan028 in #2596
Check whether device support bfloat16 by @lvhan028 in #2653
Add warning message about do_sample to alert BC by @lvhan028 in #2654
update ascend dockerfile by @CyCle1024 in #2661
fix supported model list in ascend graph mode by @jinminxi104 in #2669
remove dlinfer version by @CyCle1024 in #2672

🐞 Bug fixes

set outlines<0.1.0 by @AllentDan in #2559
fix: make exit_flag verification for ascend more general by @CyCle1024 in #2588
set capture mode thread_local by @grimoire in #2560
Add distributed context in pytorch engine to support torchrun by @grimoire in #2615
Fix error in python3.8. by @Reinerzhou in #2646
Align UT with triton fill_kv_cache_quant kernel by @AllentDan in #2644
miss device_type when checking is_bf16_supported on ascend platform by @lvhan028 in #2663
fix syntax in Dockerfile_aarch64_ascend by @CyCle1024 in #2664
Set history_cross_kv_seqlens to 0 by default by @AllentDan in #2666
fix build error in ascend dockerfile by @CyCle1024 in #2667
bugfix: llava-hf/llava-interleave-qwen-7b-hf (#2497) by @deepindeed2022 in #2657
fix inference mode error for qwen2-vl by @irexyc in #2668

📚 Documentations

Add instruction for downloading models from openmind hub by @cookieyyds in #2577
Fix spacing in ascend user guide by @Superskyyy in #2601
Update get_started tutorial about deploying on ascend platform by @jinminxi104 in #2655
Update ascend get_started tutorial about installing nnal by @jinminxi104 in #2662

🌐 Other

[ci] add oc infer test in stable test by @zhulinJulia24 in #2523
update copyright by @lvhan028 in #2579
[Doc]: Lock sphinx version by @RunningLeon in #2594
[ci] use local requirements for test workflow by @zhulinJulia24 in #2569
[ci] add pytorch kvint testcase into function regression by @zhulinJulia24 in #2584
[ci] React dailytest workflow by @zhulinJulia24 in #2617
[ci] fix restful script by @zhulinJulia24 in #2635
[ci] add internlm2_5_7b_batch_1 into evaluation testcase by @zhulinJulia24 in #2631
match torch and torch_vision version by @grimoire in #2649
Bump version to v0.6.2 by @lvhan028 in #2659

New Contributors

@mouweng made their first contribution in #2540
@cookieyyds made their first contribution in #2563
@Superskyyy made their first contribution in #2601
@Reinerzhou made their first contribution in #2636
@deepindeed2022 made their first contribution in #2657

Full Changelog: v0.6.1...v0.6.2

Contributors

grimoire, lvhan028, and 13 other contributors

28 Sep 11:34

lvhan028

v0.6.1

2e49fc3

LMDeploy Release V0.6.1

What's Changed

🚀 Features

Support user-specified data type by @lvhan028 in #2473
Support minicpm3-4b by @AllentDan in #2465
support Qwen2-VL with pytorch backend by @irexyc in #2449

💥 Improvements

Add silu mul kernel by @grimoire in #2469
adjust schedule to improve TTFT in pytorch engine by @grimoire in #2477
Add max_log_len option to control length of printed log by @lvhan028 in #2478
set served model name being repo_id from hub before it is downloaded by @lvhan028 in #2494
Improve proxy server usage by @AllentDan in #2488
CudaGraph mixin by @grimoire in #2485
pytorch engine add get_logits by @grimoire in #2487
Refactor lora by @grimoire in #2466
support noaligned silu_and_mul by @grimoire in #2506
optimize performance of ascend backend's update_step_context() by calculating kv_start_indices in a new way by @jiajie-yang in #2521
Fix chatglm tokenizer failed when transformers>=4.45.0 by @AllentDan in #2520

🐞 Bug fixes

Fix "TypeError: Got unsupported ScalarType BFloat16" by @SeitaroShinagawa in #2472
fix ascend atten_mask by @yao-fengchen in #2483
Catch exceptions thrown by turbomind inference thread by @lvhan028 in #2502
The get_ppl missed the last token of each iteration during multi-iter prefill by @lvhan028 in #2499
fix vl gradio by @irexyc in #2527

🌐 Other

[ci] regular update by @zhulinJulia24 in #2431
[CI] add base model evaluation by @zhulinJulia24 in #2490
bump version to v0.6.1 by @lvhan028 in #2513

New Contributors

@SeitaroShinagawa made their first contribution in #2472

Full Changelog: v0.6.0...v0.6.1

Contributors

grimoire, lvhan028, and 6 other contributors

13 Sep 03:12

lvhan028

v0.6.0

e2aa4bd

LMDeploy Release v0.6.0

Highlight

Optimize W4A16 quantized model inference by implementing GEMM in TurboMind Engine
- Add GPTQ-INT4 inference
- Support CUDA architectures SM70 and above (V100 and newer GPUs)
Refactor PytorchEngine
- Employ CUDA graphs to boost inference performance by 30%
- Support more models in Huawei Ascend platform
Upgrade GenerationConfig
- Support min_p sampling
- Add do_sample=False as the default option
- Remove EngineGenerationConfig and merge it into GenerationConfig
Support guided decoding
Distinguish between the name of the deployed model and the name of the model's chat template
Before:

lmdeploy serve api_server /the/path/of/your/awesome/model \
    --model-name customized_chat_template.json

After:

lmdeploy serve api_server /the/path/of/your/awesome/model \
    --model-name "the served model name" \
    --chat-template customized_chat_template.json

Breaking Changes

TurboMind model converter. Please re-convert your models if you use this feature
EngineGenerationConfig is removed. Please use GenerationConfig instead
Chat template. Please use --chat-template to specify it
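v0.6.0 adds `min_p` to GenerationConfig and makes greedy decoding (`do_sample=False`) the default. Under min-p sampling, a token survives only if its probability is at least `min_p` times that of the most likely token. The plain-Python sketch below illustrates both behaviors; `min_p_filter` and `next_token` are hypothetical helpers, and the order in which lmdeploy combines min_p with top-k, top-p and temperature is not specified in these notes.

```python
import math
import random
from typing import List, Optional


def min_p_filter(logits: List[float], min_p: float) -> List[float]:
    """Mask tokens whose probability is below min_p * (max probability).

    Since p_i / p_max = exp(logit_i - logit_max), the test can be done on
    logits directly without computing the full softmax.
    """
    if min_p <= 0.0:
        return list(logits)
    threshold = math.log(min_p)
    top = max(logits)
    return [x if x - top >= threshold else -math.inf for x in logits]


def next_token(logits: List[float], do_sample: bool = False, temperature: float = 1.0,
               min_p: float = 0.0, rng: Optional[random.Random] = None) -> int:
    """Greedy decoding unless do_sample=True; otherwise temperature + min_p sampling."""
    if not do_sample:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = min_p_filter([x / temperature for x in logits], min_p)
    top = max(scaled)
    weights = [math.exp(x - top) for x in scaled]  # masked tokens get exp(-inf) == 0.0
    rng = rng or random.Random()
    return rng.choices(range(len(scaled)), weights=weights, k=1)[0]


logits = [2.0, 1.0, -3.0]
filtered = min_p_filter(logits, 0.1)
```

With `min_p=0.1` in this toy example, token 2 (probability about exp(-5) ≈ 0.007 of the top token's) is removed, while token 1 (about exp(-1) ≈ 0.37 of the top token's) is kept.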

What's Changed

🚀 Features

support vlm custom image process parameters in openai input format by @irexyc in #2245
New GEMM kernels for weight-only quantization by @lzhangzz in #2090
Fix hidden size and support mistral nemo by @AllentDan in #2215
Support custom logits processors by @AllentDan in #2329
support openbmb/MiniCPM-V-2_6 by @irexyc in #2351
Support phi3.5 for pytorch engine by @RunningLeon in #2361
Add auto_gptq to lmdeploy lite by @AllentDan in #2372
build(ascend): add Dockerfile for ascend aarch64 910B by @CyCle1024 in #2278
Support guided decoding for pytorch backend by @AllentDan in #1856
support min_p sampling parameter by @irexyc in #2420
Refactor pytorch engine by @grimoire in #2104
refactor pytorch engine(ascend) by @yao-fengchen in #2440

💥 Improvements

Remove deprecated arguments from API and clarify model_name and chat_template_name by @lvhan028 in #1931
Fix duplicated session_id when pipeline is used by multithreads by @irexyc in #2134
remove eviction param by @grimoire in #2285
Remove QoS serving by @AllentDan in #2294
Support send tool_calls back to internlm2 by @AllentDan in #2147
Add stream options to control usage by @AllentDan in #2313
add device type for pytorch engine in cli by @RunningLeon in #2321
Update error status_code to raise error in openai client by @AllentDan in #2333
Change to use device instead of device-type in cli by @RunningLeon in #2337
Add GEMM test utils by @lzhangzz in #2342
Add environment variable to control SILU fusion by @lzhangzz in #2343
Use single thread per model instance by @lzhangzz in #2339
add cache to speed up docker building by @RunningLeon in #2344
add max_prefill_token_num argument in CLI by @lvhan028 in #2345
torch engine optimize prefill for long context by @grimoire in #1962
Refactor turbomind (1/N) by @lzhangzz in #2352
feat(server): enable seed parameter for openai compatible server. by @DearPlanet in #2353
support do_sample parameter by @irexyc in #2375
refactor TurbomindModelConfig by @lvhan028 in #2364
import dlinfer before imageencoding by @jinminxi104 in #2413
ignore *.pth when download model from model hub by @lvhan028 in #2426
inplace logits process as default by @grimoire in #2427
handle invalid images by @irexyc in #2312
Split token_embs and lm_head weights by @irexyc in #2252
build: update ascend dockerfile by @CyCle1024 in #2421
build nccl in dockerfile for cuda11.8 by @RunningLeon in #2433
automatically set max_batch_size according to the device when it is not specified by @lvhan028 in #2434
rename the ascend dockerfile by @lvhan028 in #2403
refactor ascend kernels by @yao-fengchen in #2355

🐞 Bug fixes

enable run vlm with pytorch engine in gradio by @RunningLeon in #2256
fix side-effect: failed to update tm model config with tm engine config by @lvhan028 in #2275
Fix internvl2 template and update docs by @irexyc in #2292
fix the issue missing dependencies in the Dockerfile and pip by @ColorfulDick in #2240
Fix the way to get "quantization_config" from model's configuration by @lvhan028 in #2325
fix(ascend): fix import error of pt engine in cli by @CyCle1024 in #2328
Default rope_scaling_factor of TurbomindEngineConfig to None by @lvhan028 in #2358
Fix the logic of update engine_config to TurbomindModelConfig for both tm model and hf model by @lvhan028 in #2362
fix cache position for pytorch engine by @RunningLeon in #2388
Fix /v1/completions batch order wrong by @AllentDan in #2395
Fix some issues encountered by modelscope and community by @irexyc in #2428
fix llama3 rotary in pytorch engine by @grimoire in #2444
fix tensors on different devices when deploying MiniCPM-V-2_6 with tensor parallelism by @irexyc in #2454
fix MultinomialSampling operator builder by @grimoire in #2460
Fix initialization of runtime_min_p by @irexyc in #2461
fix Windows compile error by @zhyncs in #2303
fix: follow up #2303 by @zhyncs in #2307

📚 Documentations

Reorganize the user guide and update the get_started section by @lvhan028 in #2038
cancel support baichuan2 7b awq in pytorch engine by @grimoire in #2246
Add user guide about slora serving by @AllentDan in #2084
Reorganize the table of content of get_started by @lvhan028 in #2378
fix inaccessible get_started user guide by @lvhan028 in #2410
add Ascend get_started by @jinminxi104 in #2417

🌐 Other

test prtest image update by @zhulinJulia24 in #2192
Update python support version by @wuhongsheng in #2290
[ci] benchmark react by @zhulinJulia24 in #2183
bump version to v0.6.0a0 by @lvhan028 in #2371
[ci] add daily test's coverage report by @zhulinJulia24 in #2401
update actions/download-artifact to v4 to fix security issue by @lvhan028 in #2419
bump version to v0.6.0 by @lvhan028 in #2445

New Contributors

@wuhongsheng made their first contribution in #2290
@ColorfulDick made their first contribution in #2240
@DearPlanet made their first contribution in #2353
@jinminxi104 made their first contribution in #2413

Full Changelog: v0.5.3...v0.6.0

Contributors

grimoire, lvhan028, and 12 other contributors

26 Aug 09:12

lvhan028

v0.6.0a0

97b880b

LMDeploy Release V0.6.0a0

Highlight

Optimize W4A16 quantized model inference by implementing GEMM in TurboMind Engine
- Add GPTQ-INT4 inference
- Support CUDA architectures SM70 and above (V100 and newer GPUs)
Optimize the prefilling inference stage of PyTorchEngine
Distinguish between the name of the deployed model and the name of the model's chat template

Before:

lmdeploy serve api_server /the/path/of/your/awesome/model \
    --model-name customized_chat_template.json

After:

lmdeploy serve api_server /the/path/of/your/awesome/model \
    --model-name "the served model name" \
    --chat-template customized_chat_template.json

What's Changed

🚀 Features

support vlm custom image process parameters in openai input format by @irexyc in #2245
New GEMM kernels for weight-only quantization by @lzhangzz in #2090
Fix hidden size and support mistral nemo by @AllentDan in #2215
Support custom logits processors by @AllentDan in #2329
support openbmb/MiniCPM-V-2_6 by @irexyc in #2351
Support phi3.5 for pytorch engine by @RunningLeon in #2361

💥 Improvements

Remove deprecated arguments from API and clarify model_name and chat_template_name by @lvhan028 in #1931
Fix duplicated session_id when pipeline is used by multithreads by @irexyc in #2134
remove eviction param by @grimoire in #2285
Remove QoS serving by @AllentDan in #2294
Support send tool_calls back to internlm2 by @AllentDan in #2147
Add stream options to control usage by @AllentDan in #2313
add device type for pytorch engine in cli by @RunningLeon in #2321
Update error status_code to raise error in openai client by @AllentDan in #2333
Change to use device instead of device-type in cli by @RunningLeon in #2337
Add GEMM test utils by @lzhangzz in #2342
Add environment variable to control SILU fusion by @lzhangzz in #2343
Use single thread per model instance by @lzhangzz in #2339
add cache to speed up docker building by @RunningLeon in #2344
add max_prefill_token_num argument in CLI by @lvhan028 in #2345
torch engine optimize prefill for long context by @grimoire in #1962
Refactor turbomind (1/N) by @lzhangzz in #2352
feat(server): enable seed parameter for openai compatible server. by @DearPlanet in #2353

🐞 Bug fixes

enable run vlm with pytorch engine in gradio by @RunningLeon in #2256
fix side-effect: failed to update tm model config with tm engine config by @lvhan028 in #2275
Fix internvl2 template and update docs by @irexyc in #2292
fix the issue missing dependencies in the Dockerfile and pip by @ColorfulDick in #2240
Fix the way to get "quantization_config" from model's configuration by @lvhan028 in #2325
fix(ascend): fix import error of pt engine in cli by @CyCle1024 in #2328
Default rope_scaling_factor of TurbomindEngineConfig to None by @lvhan028 in #2358
Fix the logic of update engine_config to TurbomindModelConfig for both tm model and hf model by @lvhan028 in #2362

📚 Documentations

Reorganize the user guide and update the get_started section by @lvhan028 in #2038
cancel support baichuan2 7b awq in pytorch engine by @grimoire in #2246
Add user guide about slora serving by @AllentDan in #2084

🌐 Other

test prtest image update by @zhulinJulia24 in #2192
Update python support version by @wuhongsheng in #2290
fix Windows compile error by @zhyncs in #2303
fix: follow up #2303 by @zhyncs in #2307
[ci] benchmark react by @zhulinJulia24 in #2183
bump version to v0.6.0a0 by @lvhan028 in #2371

New Contributors

@wuhongsheng made their first contribution in #2290
@ColorfulDick made their first contribution in #2240
@DearPlanet made their first contribution in #2353

Full Changelog: v0.5.3...v0.6.0a0

Contributors

grimoire, lvhan028, and 10 other contributors

07 Aug 03:38

lvhan028

v0.5.3

a129a14

LMDeploy Release V0.5.3

What's Changed

🚀 Features

PyTorch Engine AWQ support by @grimoire in #1913
Phi3 awq by @grimoire in #1984
Fix chunked prefill by @lzhangzz in #2201
support VLMs with Qwen as the language model by @irexyc in #2207

💥 Improvements

Support specifying a prefix of assistant response by @AllentDan in #2172
Strict check for name_map in InternLM2Chat7B by @SamuraiBUPT in #2156
Check errors for attention kernels by @lzhangzz in #2206
update base image to support cuda12.4 in dockerfile by @RunningLeon in #2182
Stop synchronizing for length_criterion by @lzhangzz in #2202
adapt MiniCPM-Llama3-V-2_5 new code by @irexyc in #2139
Remove duplicate code by @cmpute in #2133

🐞 Bug fixes

[Hotfix] miss parentheses when calculating the coef of llama3 rope by @lvhan028 in #2157
support logit softcap by @grimoire in #2158
Fix gmem to smem WAW conflict in awq gemm kernel by @foreverrookie in #2111
Fix gradio serve using a wrong chat template by @AllentDan in #2131
fix runtime error when using dynamic scale rotary embed for InternLM2… by @CyCle1024 in #2212
Add peer-access-enabled allocator by @lzhangzz in #2218
Fix typos in profile_generation.py by @jiajie-yang in #2233

📚 Documentations

docs: fix Qwen typo by @ArtificialZeng in #2136
wrong expression by @ArtificialZeng in #2165
clarify the model type LLM or MLLM in supported model matrix by @lvhan028 in #2209
docs: add Japanese README by @eltociear in #2237

🌐 Other

bump version to 0.5.2.post1 by @lvhan028 in #2159
update news about cooperation with modelscope/swift by @lvhan028 in #2200
bump version to v0.5.3 by @lvhan028 in #2242

New Contributors

@ArtificialZeng made their first contribution in #2136
@foreverrookie made their first contribution in #2111
@SamuraiBUPT made their first contribution in #2156
@CyCle1024 made their first contribution in #2212
@jiajie-yang made their first contribution in #2233
@cmpute made their first contribution in #2133

Full Changelog: v0.5.2...v0.5.3

Contributors

grimoire, lvhan028, and 11 other contributors

26 Jul 12:22

lvhan028

v0.5.2.post1

fb6f8ea

LMDeploy Release V0.5.2.post1

What's Changed

🐞 Bug fixes

[Hotfix] miss parentheses when calculating the coef of llama3 rope, which caused the needle-in-a-haystack experiment to fail, by @lvhan028 in #2157
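The hotfix concerns a coefficient in Llama 3's RoPE scaling; the release note does not show the faulty expression. For reference, the frequency scaling published with Llama 3.1 (and implemented in Hugging Face transformers) is sketched below in plain Python. The smoothing term is the part where grouping with parentheses matters. Default arguments are the Llama 3.1 `rope_scaling` values; the function name is hypothetical and this is not lmdeploy's code.

```python
import math
from typing import List


def llama3_scale_inv_freq(inv_freq: List[float], factor: float = 8.0,
                          low_freq_factor: float = 1.0, high_freq_factor: float = 4.0,
                          original_max_pos: int = 8192) -> List[float]:
    """Rescale RoPE inverse frequencies the way Llama 3.1 extends its context."""
    low_freq_wavelen = original_max_pos / low_freq_factor
    high_freq_wavelen = original_max_pos / high_freq_factor
    out = []
    for f in inv_freq:
        wavelen = 2 * math.pi / f
        if wavelen < high_freq_wavelen:
            out.append(f)  # high-frequency band: left unchanged
        elif wavelen > low_freq_wavelen:
            out.append(f / factor)  # low-frequency band: fully interpolated
        else:
            # Medium band: blend the two; both differences must be parenthesized.
            smooth = (original_max_pos / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)
            out.append((1 - smooth) * f / factor + smooth * f)
    return out


# Frequencies with wavelengths of 1000, 10000 and 4096 positions.
scaled = llama3_scale_inv_freq([2 * math.pi / w for w in (1000.0, 10000.0, 4096.0)])
```

For a wavelength of 4096 the smoothing coefficient is (8192 / 4096 - 1) / 3 = 1/3, so that frequency is scaled by 2/3 · 1/8 + 1/3 = 5/12.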

🌐 Other

bump version to 0.5.2.post1 by @lvhan028 in #2159

Full Changelog: v0.5.2...v0.5.2.post1

Contributors

lvhan028

26 Jul 08:07

lvhan028

v0.5.2

7199b4e

LMDeploy Release V0.5.2

Highlight

LMDeploy supports Llama3.1 and its tool calling. An example of calling "Wolfram Alpha" to perform complex mathematical calculations can be found here

What's Changed

🚀 Features

Support glm4 awq by @AllentDan in #1993
Support llama3.1 by @lvhan028 in #2122
Support Llama3.1 tool calling by @AllentDan in #2123

💥 Improvements

Remove the triton inference server backend "turbomind_backend" by @lvhan028 in #1986
Remove kv cache offline quantization by @AllentDan in #2097
Remove session_len and deprecated short names of the chat templates by @lvhan028 in #2105
clarify "n>1" in GenerationConfig hasn't been supported yet by @lvhan028 in #2108

🐞 Bug fixes

fix stop words for glm4 by @RunningLeon in #2044
Disable peer access code by @lzhangzz in #2082
set log level ERROR in benchmark scripts by @lvhan028 in #2086
raise thread exception by @irexyc in #2071
Fix index error when profiling token generation with -ct 1 by @lvhan028 in #1898

🌐 Other

misc: replace slow Jimver/cuda-toolkit by @zhyncs in #2065
misc: update bug issue template by @zhyncs in #2083
update daily testcase new by @zhulinJulia24 in #2035
bump version to v0.5.2 by @lvhan028 in #2143

Full Changelog: v0.5.1...v0.5.2

Contributors

lvhan028, irexyc, and 5 other contributors
