
Conversation

chang-wenbin
Collaborator

  1. Support cache initialization for the MLA backend to rationalize KV cache GPU memory allocation: the block num increases from 1500 to 4500 and concurrency from 45 to 145.
  2. Fix a bug in the v1 scheduler that allowed the number of activated tokens to exceed max_num_batched_tokens (see the sketch after this list).
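For illustration, here is a minimal sketch of the scheduling guard described in item 2. The names Request and schedule are hypothetical and are not the FastDeploy v1 scheduler API; the sketch only shows the intended invariant, that the total number of activated tokens in one step never exceeds max_num_batched_tokens.

from dataclasses import dataclass

@dataclass
class Request:
    request_id: str
    num_tokens: int  # tokens this request would activate in the next step

def schedule(waiting: list[Request], max_num_batched_tokens: int) -> list[Request]:
    """Admit requests until the token budget for this step is exhausted."""
    scheduled: list[Request] = []
    budget = max_num_batched_tokens
    for req in waiting:
        if req.num_tokens > budget:
            # Without this check the activated-token count could exceed
            # max_num_batched_tokens, which is the bug described above.
            break
        scheduled.append(req)
        budget -= req.num_tokens
    return scheduled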


paddle-bot commented on Sep 29, 2025

Thanks for your contribution!

# Rationalize KV cache allocation: use the MLA-specific cache layout when the
# MLA attention backend is selected via the FD_ATTENTION_BACKEND env var.
from fastdeploy import envs

self.mla_cache = envs.FD_ATTENTION_BACKEND == "MLA_ATTN"
Collaborator


Is this environment variable set automatically for models that use MLA, or does it need to be set manually?

Collaborator Author


Currently it is set manually in the launch script with export FD_ATTENTION_BACKEND="MLA_ATTN".
Later, the backend will be selected automatically based on the model_type in config.json; that change is planned to be submitted together with enabling tensor_core by default for MLA.
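For reference, below is a hypothetical sketch of that planned automatic selection. The model-type set, the default backend name, and the model directory path are assumptions for illustration only, not FastDeploy's actual configuration.

import json
import os

# Assumption: model types that use MLA; purely illustrative.
MLA_MODEL_TYPES = {"deepseek_v2", "deepseek_v3"}

def select_attention_backend(model_dir: str) -> str:
    """Pick the attention backend from the model_type field in config.json."""
    with open(os.path.join(model_dir, "config.json")) as f:
        model_type = json.load(f).get("model_type", "")
    if model_type in MLA_MODEL_TYPES:
        return "MLA_ATTN"
    return "APPEND_ATTN"  # assumption: name of the non-MLA default backend

if __name__ == "__main__":
    # Respect a manual export if the user already set FD_ATTENTION_BACKEND.
    os.environ.setdefault("FD_ATTENTION_BACKEND", select_attention_backend("./model"))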

Collaborator

@gongshaotian left a comment


KVCache creation should later be moved into the Attention Backend and handled there.

@chang-wenbin merged commit 48fd5d7 into PaddlePaddle:develop on Oct 9, 2025
34 of 41 checks passed
@chang-wenbin changed the title from "Support MLA_CACHE & Fix V1_Schedule Bug" to "【Inference Optimize】Support MLA_CACHE & Fix V1_Schedule Bug" on Oct 9, 2025
