Fix MLA cache shape init without layerid #8072
Signed-off-by: Socratesa <lihaode@zju.edu.cn>
Pull request overview
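This PR fixes an error that occurs when the Cache Manager constructs a temporary backend solely to compute the MLA KV cache shape. Such a backend may lack `layer_id`, so `get_kv_cache_shape()` raised an error when it accessed `self.layer_id`. With this change the function keeps working when `layer_id` is missing.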
Changes:
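- In `get_kv_cache_shape()`, read the layer id via `getattr(self, "layer_id", None)` instead of `self.layer_id`, so that a missing attribute no longer raises an exception.
- Take the `window_attn_skip_freq[layer_id]` branch only when `layer_id` is present.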
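The two changes above can be sketched in isolation. This is a minimal illustration, not the FastDeploy implementation: `ShapeOnlyBackend` and `uses_window_shape` are hypothetical names, and only the `getattr` guard and the three-part condition come from the PR.

```python
class ShapeOnlyBackend:
    """Hypothetical stand-in for a backend built only to query the cache shape."""

    def __init__(self, window_attn_skip_freq=None):
        # layer_id is deliberately never assigned here, mirroring the
        # temporary backend described in the PR.
        self.window_attn_skip_freq = window_attn_skip_freq

    def uses_window_shape(self):
        # getattr with a default avoids AttributeError when layer_id was never set.
        layer_id = getattr(self, "layer_id", None)
        return (
            layer_id is not None
            and self.window_attn_skip_freq is not None
            and self.window_attn_skip_freq[layer_id] == 1
        )


backend = ShapeOnlyBackend(window_attn_skip_freq=[0, 1])
no_layer = backend.uses_window_shape()    # layer_id missing: falls through, no exception
backend.layer_id = 1
with_layer = backend.uses_window_shape()  # layer 1 is flagged in window_attn_skip_freq
```

Note that the guard makes the missing-`layer_id` case fall through to the default branch silently, which is the behavior the later review comment questions.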
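PR title/description check (additions needed)
- The title lacks a required tag; suggested title: [BugFix] Fix MLA cache shape init without layerid
- The Modifications / Usage or Command / Accuracy Tests sections of the description are essentially empty; if there are no tests, commands, or accuracy results, state the reason and how the change was verified in the corresponding section.
- This change is a bugfix and normally needs no extra documentation; however, if the behavior affects users (for example, how the cache manager is constructed), consider confirming whether a sentence should be added to the relevant documentation.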
    if (
        layer_id is not None
        and self.window_attn_skip_freq is not None
        and self.window_attn_skip_freq[layer_id] == 1
    ):
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-06-23 21:44:21
📋 Review summary
PR overview: Fixes an exception in the MLA cache shape computation on a temporary backend that has no layer_id.
Scope of change: fastdeploy/model_executor/layers/attention/mla_attention_backend.py
Impact tags: [OP] [KVCache]
Issues
| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | fastdeploy/model_executor/layers/attention/mla_attention_backend.py:626 | Defaulting to the plain MLA shape when layer_id is absent underestimates the KV cache dimension for window_attn_skip_freq layers |
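📝 PR convention check
The title lacks an official tag, and Modifications, Usage or Command, and Accuracy Tests are still empty or placeholder content. They can be replaced directly with:
Suggested title (can be copied as-is):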
[BugFix] Fix MLA cache shape init without layer_id
Suggested PR description (can be copied as-is)
## Motivation
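When the cache manager constructs a temporary MLA attention backend only to compute the cache shape, the backend has no `layer_id`, and the original logic raised an error by accessing `self.layer_id` directly.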
## Modifications
- In `MLAAttentionBackend.get_kv_cache_shape`, read the layer id via `getattr(self, "layer_id", None)`.
- Use the SWA/FP8 MLA cache shape only when `layer_id` is present and the current layer is flagged in `window_attn_skip_freq`.
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The current change avoids the AttributeError, but it turns the MLA cache initialization path with window_attn_skip_freq enabled into a silent shape underestimate. Fixing this before merging is recommended.
      value_cache_shape = []
    - if self.window_attn_skip_freq is not None and self.window_attn_skip_freq[layer_id] == 1:
    + if (
    +     layer_id is not None
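🔴 Bug: When layer_id is None, skipping the window_attn_skip_freq check outright downgrades the MLA cache shape of models containing SWA/window layers to the plain MLA shape.
PrefixCacheManager._get_kv_cache_shape constructs a temporary backend and immediately calls get_kv_cache_shape; such instances have no layer_id. V1 CacheController.initialize_kv_cache also calls create_kv_cache before the legacy path sets attn_backend.layer_id layer by layer. The current branch returns the plain shape of kv_lora_rank + qk_rope_head_dim, whereas the runner uses kv_lora_rank + 4 * (kv_lora_rank // 128) + 2 * qk_rope_head_dim for layers where window_attn_skip_freq[i] == 1 and attaches/creates the cache as uint8. So although the PR avoids the AttributeError, it allocates or passes an undersized IPC/cache shape for window layers, and a subsequent attach or attention kernel may hit a shape mismatch or out-of-bounds reads and writes.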
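To make the size gap concrete, the two last-dimension formulas quoted above can be evaluated side by side. This is a sketch only: the function names are hypothetical, and the values `kv_lora_rank = 512` and `qk_rope_head_dim = 64` are illustrative assumptions that do not come from this PR.

```python
def plain_mla_width(kv_lora_rank, qk_rope_head_dim):
    # Last dimension returned when the window branch is not taken.
    return kv_lora_rank + qk_rope_head_dim


def window_layer_width(kv_lora_rank, qk_rope_head_dim):
    # Last dimension the review says the runner uses for layers
    # where window_attn_skip_freq[i] == 1.
    return kv_lora_rank + 4 * (kv_lora_rank // 128) + 2 * qk_rope_head_dim


# Illustrative values only (assumed, not taken from the PR).
plain = plain_mla_width(512, 64)      # 512 + 64 = 576
window = window_layer_width(512, 64)  # 512 + 4 * 4 + 2 * 64 = 656
shortfall = window - plain            # 80 elements short per cache row
```

The comparison covers only the element count of the last dimension. The review also notes that window layers use a uint8 cache, so the byte sizes depend on the dtype of each path as well.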
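Suggested fix:
Calls without layer_id should not default to the plain MLA shape. The cache manager/cache controller can set or pass layer_id per layer before the call and fetch the shape layer by layer. If the interface must return a single shape, then when layer_id is None and any(window_attn_skip_freq) holds, return a conservative maximum shape that covers all layers, and update the dtype/IPC allocation strategy accordingly.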
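One way to read the second option is sketched below, under stated assumptions: `last_dim_for_layer` is a hypothetical helper rather than a FastDeploy function, it models only the last dimension, and it leaves out the dtype and IPC allocation changes the suggestion also calls for.

```python
def last_dim_for_layer(layer_id, window_attn_skip_freq, kv_lora_rank, qk_rope_head_dim):
    """Hypothetical helper: pick the cache last dimension for one layer,
    or a conservative maximum when the layer is unknown."""
    plain = kv_lora_rank + qk_rope_head_dim
    window = kv_lora_rank + 4 * (kv_lora_rank // 128) + 2 * qk_rope_head_dim

    if window_attn_skip_freq is None:
        return plain
    if layer_id is None:
        # Unknown layer: if any layer is flagged, return the larger width
        # so that no layer is under-allocated.
        if any(flag == 1 for flag in window_attn_skip_freq):
            return max(plain, window)
        return plain
    return window if window_attn_skip_freq[layer_id] == 1 else plain


unknown_layer = last_dim_for_layer(None, [0, 1], 512, 64)  # conservative maximum
plain_layer = last_dim_for_layer(0, [0, 1], 512, 64)       # unflagged layer
window_layer = last_dim_for_layer(1, [0, 1], 512, 64)      # flagged layer
```

The conservative branch trades extra memory on unflagged layers for safety. Passing `layer_id` per layer, the first option in the suggestion, avoids that cost.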
Motivation
The cache manager constructs a temporary backend only to compute the cache shape; that backend has no layer_id, so accessing self.layer_id raises an error.
Modifications
Usage or Command
Accuracy Tests
Checklist
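- Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- Format your code, run `pre-commit` before commit.
- Add unit tests. Please write the reason in this PR if no unit tests.
- Provide accuracy results.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.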