Fix MLA cache shape init without layerid #8072

Open
Socratesa wants to merge 1 commit into
PaddlePaddle:develop from
Socratesa:mla_fix

@Socratesa

Motivation

The cache manager constructs a temporary backend solely to compute the cache shape. That backend has no layer_id, so accessing self.layer_id raises an error.

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Signed-off-by: Socratesa <lihaode@zju.edu.cn>
Copilot AI review requested due to automatic review settings June 23, 2026 13:26

Copilot AI left a comment

Pull request overview

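This PR fixes a failure that occurs when the Cache Manager constructs a temporary backend just to compute the MLA KV cache shape. Such a backend may lack layer_id, so get_kv_cache_shape() raised an error when it accessed self.layer_id. With this change the function keeps working when layer_id is missing.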

Changes:

  • In get_kv_cache_shape(), self.layer_id is now read via getattr(self, "layer_id", None), so a missing attribute no longer raises an exception.
  • The window_attn_skip_freq[layer_id] branch is taken only when layer_id is present.
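
The guard described in the two bullets above can be sketched as follows. This is a minimal stand-in rather than the FastDeploy source: `TinyBackend` and both method names are hypothetical, and only the `getattr(self, "layer_id", None)` pattern and the three-part condition mirror what the PR describes.

```python
class TinyBackend:
    """Hypothetical stand-in for an attention backend (not the FastDeploy class)."""

    def __init__(self, window_attn_skip_freq=None):
        # layer_id is deliberately NOT assigned here, mimicking a backend that
        # is built temporarily just to compute the cache shape.
        self.window_attn_skip_freq = window_attn_skip_freq

    def uses_window_shape_old(self):
        # Direct attribute access: raises AttributeError if layer_id was never set.
        layer_id = self.layer_id
        return (
            self.window_attn_skip_freq is not None
            and self.window_attn_skip_freq[layer_id] == 1
        )

    def uses_window_shape_new(self):
        # Guarded access: a missing layer_id becomes None instead of raising.
        layer_id = getattr(self, "layer_id", None)
        return (
            layer_id is not None
            and self.window_attn_skip_freq is not None
            and self.window_attn_skip_freq[layer_id] == 1
        )
```

Note that with the guarded version, a backend without `layer_id` always reports that the window branch does not apply, regardless of what `window_attn_skip_freq` contains.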

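PR title/description check (needs additions)

  • The title lacks a required tag. Suggested title: [BugFix] Fix MLA cache shape init without layerid
  • The Modifications / Usage or Command / Accuracy Tests sections of the description are essentially empty. If there are no tests, commands, or accuracy results, explain the reason and how the change was verified in the corresponding section.
  • This change is a bugfix and usually needs no extra documentation. If the behavior affects users (for example, how the cache manager is constructed), consider whether the relevant notes need an additional sentence.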

Comment on lines +625 to +629
        if (
            layer_id is not None
            and self.window_attn_skip_freq is not None
            and self.window_attn_skip_freq[layer_id] == 1
        ):

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-06-23 21:44:21

📋 Review Summary

PR overview: Fixes an exception in the MLA cache shape computation on temporary backends that have no layer_id.
Scope of changes: fastdeploy/model_executor/layers/attention/mla_attention_backend.py
Impact tags: [OP] [KVCache]

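Issues

| Level | File | Summary |
| --- | --- | --- |
| 🔴 Bug | fastdeploy/model_executor/layers/attention/mla_attention_backend.py:626 | Defaulting to the plain MLA shape when layer_id is absent underestimates the KV cache dimension of window_attn_skip_freq layers |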

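📝 PR Convention Check

The title lacks an official tag, and the Modifications, Usage or Command, and Accuracy Tests sections are still empty or placeholder content. They can be replaced directly with the following:

Suggested title (copy directly):

  • [BugFix] Fix MLA cache shape init without layer_id
Suggested PR description (copy directly):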
## Motivation
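When the cache manager constructs a temporary MLA attention backend only to compute the cache shape, there is no `layer_id`, and the original logic raises an error by accessing `self.layer_id` directly.

## Modifications
- In `MLAAttentionBackend.get_kv_cache_shape`, read the layer id via `getattr(self, "layer_id", None)`.
- Use the SWA/FP8 MLA cache shape only when `layer_id` exists and the current layer matches `window_attn_skip_freq`.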

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

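Overall assessment

The current change avoids the AttributeError, but it turns the MLA cache initialization path with window_attn_skip_freq enabled into a silent shape underestimate. Fix this before merging.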

  value_cache_shape = []
- if self.window_attn_skip_freq is not None and self.window_attn_skip_freq[layer_id] == 1:
+ if (
+     layer_id is not None

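🔴 Bug: Skipping window_attn_skip_freq outright when layer_id is None downgrades the MLA cache shape of SWA/window layers to the plain MLA shape.

PrefixCacheManager._get_kv_cache_shape constructs a temporary backend and immediately calls get_kv_cache_shape; such instances have no layer_id. The V1 CacheController.initialize_kv_cache also calls create_kv_cache before the legacy path sets attn_backend.layer_id layer by layer. The current branch returns the plain shape kv_lora_rank + qk_rope_head_dim, but for layers with window_attn_skip_freq[i] == 1 the runner uses kv_lora_rank + 4 * (kv_lora_rank // 128) + 2 * qk_rope_head_dim and attaches/creates the cache as uint8. So although the PR avoids the AttributeError, it allocates or passes an undersized IPC/cache shape for window layers, and the subsequent attach or attention kernel may hit a shape mismatch or out-of-bounds reads and writes.

Suggested fix:
Calls without layer_id should not default to the plain MLA shape. The cache manager/cache controller can set or pass layer_id per layer before the call and fetch the shape layer by layer. If the interface must return a single shape, then when layer_id is None and any(window_attn_skip_freq) holds, return a conservative maximum shape that covers all layers, and keep the dtype/IPC allocation strategy in sync.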
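
To make the size gap concrete, the two formulas quoted in this comment can be evaluated side by side. The values `kv_lora_rank = 512` and `qk_rope_head_dim = 64` are assumed purely for illustration and are not taken from this PR; the helper names are hypothetical as well.

```python
def plain_mla_dim(kv_lora_rank: int, qk_rope_head_dim: int) -> int:
    # Plain MLA last-dimension size, as quoted in the review.
    return kv_lora_rank + qk_rope_head_dim


def window_mla_dim(kv_lora_rank: int, qk_rope_head_dim: int) -> int:
    # Size the review says the runner uses for window_attn_skip_freq[i] == 1 layers.
    return kv_lora_rank + 4 * (kv_lora_rank // 128) + 2 * qk_rope_head_dim


# Assumed example values, for illustration only.
kv_lora_rank, qk_rope_head_dim = 512, 64

plain = plain_mla_dim(kv_lora_rank, qk_rope_head_dim)    # 512 + 64 = 576
window = window_mla_dim(kv_lora_rank, qk_rope_head_dim)  # 512 + 16 + 128 = 656
shortfall = window - plain                               # 80 elements per entry

print(plain, window, shortfall)
```

Under these assumed values, a window layer that receives the plain shape is 80 elements short in its last dimension, which is the kind of mismatch the review warns about.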
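
The single-shape fallback suggested above might look roughly like the sketch below. `select_cache_dim` is a hypothetical helper, not a FastDeploy function; it covers only the last-dimension size and leaves out the dtype and IPC allocation changes that the review says must stay in sync.

```python
from typing import Optional, Sequence


def select_cache_dim(
    layer_id: Optional[int],
    window_attn_skip_freq: Optional[Sequence[int]],
    kv_lora_rank: int,
    qk_rope_head_dim: int,
) -> int:
    """Hypothetical helper: pick a last-dimension size for the MLA KV cache."""
    plain = kv_lora_rank + qk_rope_head_dim
    window = kv_lora_rank + 4 * (kv_lora_rank // 128) + 2 * qk_rope_head_dim

    if window_attn_skip_freq is None:
        # No window layers configured: the plain shape is always correct.
        return plain

    if layer_id is None:
        # Layer unknown: be conservative and cover the largest per-layer shape
        # if any layer is marked as a window layer.
        has_window_layer = any(flag == 1 for flag in window_attn_skip_freq)
        return max(plain, window) if has_window_layer else plain

    # Layer known: use the shape that matches this specific layer.
    return window if window_attn_skip_freq[layer_id] == 1 else plain
```

The conservative branch trades some extra memory on non-window layers for the guarantee that no layer receives a cache smaller than what the runner expects. Passing `layer_id` per layer, the other option named in the review, avoids that overhead.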
