Fix MLA cache shape init without layerid by Socratesa · Pull Request #8072 · PaddlePaddle/FastDeploy

Socratesa · 2026-06-23T13:26:13Z

Motivation

The cache manager constructs a temporary backend solely to compute the cache shape. That backend has no layer_id, so accessing self.layer_id raises an error.

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Signed-off-by: Socratesa <lihaode@zju.edu.cn>

Copilot

Pull request overview

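This PR fixes an error in get_kv_cache_shape(): when the Cache Manager constructs a temporary backend solely to compute the MLA KV cache shape, layer_id may be missing, and accessing self.layer_id raises an error. With this change, the function keeps working when layer_id is absent.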

Changes:

In get_kv_cache_shape(), read layer_id via getattr(self, "layer_id", None) instead of self.layer_id, so a missing attribute no longer raises an exception.
Evaluate the window_attn_skip_freq[layer_id] branch only when layer_id is present.
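
The guard described in the two changes above can be sketched in isolation. This is a minimal illustration, not the FastDeploy implementation: `TemporaryBackend` and `uses_window_shape` are hypothetical names, and only the `getattr` call and the three-part condition are taken from the PR diff.

```python
class TemporaryBackend:
    """Hypothetical stand-in for a backend built only to query the cache shape."""

    def __init__(self, window_attn_skip_freq=None):
        self.window_attn_skip_freq = window_attn_skip_freq
        # layer_id is deliberately never assigned, mirroring the failing case.

    def uses_window_shape(self):
        # getattr with a default returns None instead of raising AttributeError.
        layer_id = getattr(self, "layer_id", None)
        return (
            layer_id is not None
            and self.window_attn_skip_freq is not None
            and self.window_attn_skip_freq[layer_id] == 1
        )


backend = TemporaryBackend(window_attn_skip_freq=[0, 1, 0])
without_layer_id = backend.uses_window_shape()  # no exception, branch skipped

backend.layer_id = 1
with_layer_id = backend.uses_window_shape()  # branch taken for layer 1
```

Note that the first call silently reports "not a window layer" even though layer 1 is one, which is the behavior the later review comments question.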

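PR title/description check (needs additions)

The title lacks a required tag. Suggested title: [BugFix] Fix MLA cache shape init without layerid
The Modifications, Usage or Command, and Accuracy Tests sections are essentially empty. If there are no tests, commands, or accuracy results, state the reason and how the change was verified in each section.
This change is a bug fix and normally needs no extra documentation. If the behavior affects users (for example, how the cache manager is constructed), confirm whether the relevant documentation needs a short note.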

+        if (
+            layer_id is not None
+            and self.window_attn_skip_freq is not None
+            and self.window_attn_skip_freq[layer_id] == 1
+        ):


CLAassistant · 2026-06-23T13:32:44Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-23 21:44:21

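📋 Review Summary

PR overview: fixes the exception raised when the MLA cache shape is computed on a temporary backend that has no layer_id.
Scope of change: fastdeploy/model_executor/layers/attention/mla_attention_backend.py
Affected tags: [OP] [KVCache]

Issues

Severity	File	Summary
🔴 Bug	`fastdeploy/model_executor/layers/attention/mla_attention_backend.py:626`	Without `layer_id`, the code defaults to the plain MLA shape, which underestimates the KV cache dimension for `window_attn_skip_freq` layers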

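📝 PR Convention Check

The title lacks an official tag, and the Modifications, Usage or Command, and Accuracy Tests sections are still empty or placeholder content. They can be replaced directly with the following:

Suggested title (ready to copy):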

[BugFix] Fix MLA cache shape init without layer_id

Suggested PR description (ready to copy):

## Motivation
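When the cache manager constructs a temporary MLA attention backend solely to compute the cache shape, the backend has no `layer_id`, and the original logic fails because it accesses `self.layer_id` directly.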

## Modifications
- In `MLAAttentionBackend.get_kv_cache_shape`, read the layer id via `getattr(self, "layer_id", None)`.
- Use the SWA/FP8 MLA cache shape only when `layer_id` is present and the current layer matches `window_attn_skip_freq`.

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

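Overall Assessment

The current change avoids the AttributeError, but it turns the MLA cache initialization path with window_attn_skip_freq enabled into a silent shape underestimate. Recommend fixing this before merging.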

PaddlePaddle-bot · 2026-06-23T13:46:17Z

        value_cache_shape = []
-        if self.window_attn_skip_freq is not None and self.window_attn_skip_freq[layer_id] == 1:
+        if (
+            layer_id is not None


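🔴 Bug When layer_id is None, skipping the window_attn_skip_freq check outright downgrades the MLA cache shape of SWA/window layers to the plain MLA shape.

PrefixCacheManager._get_kv_cache_shape constructs a temporary backend and immediately calls get_kv_cache_shape; such instances have no layer_id. V1 CacheController.initialize_kv_cache also calls create_kv_cache before the legacy path sets attn_backend.layer_id layer by layer. The current branch returns the plain shape of kv_lora_rank + qk_rope_head_dim, but for layers where window_attn_skip_freq[i] == 1 the runner uses kv_lora_rank + 4 * (kv_lora_rank // 128) + 2 * qk_rope_head_dim and attaches/creates the cache as uint8. So although this PR avoids the AttributeError, it allocates or passes an undersized IPC/cache shape for window layers, and the subsequent attach or attention kernel may hit a shape mismatch or out-of-bounds reads and writes.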
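
To make the size gap concrete, the two last-dimension formulas quoted above can be evaluated side by side. This is only a worked arithmetic sketch: the function names are hypothetical, and the values 512 and 64 are assumed example dimensions, not values taken from this PR. It compares element counts only, since the comment does not state the dtype of the plain cache.

```python
def plain_mla_width(kv_lora_rank, qk_rope_head_dim):
    # Last dimension returned by the plain MLA branch.
    return kv_lora_rank + qk_rope_head_dim


def window_mla_width(kv_lora_rank, qk_rope_head_dim):
    # Last dimension the runner uses for layers with window_attn_skip_freq[i] == 1.
    return kv_lora_rank + 4 * (kv_lora_rank // 128) + 2 * qk_rope_head_dim


# Assumed example dimensions (not from the PR).
KV_LORA_RANK = 512
QK_ROPE_HEAD_DIM = 64

plain = plain_mla_width(KV_LORA_RANK, QK_ROPE_HEAD_DIM)    # 512 + 64
window = window_mla_width(KV_LORA_RANK, QK_ROPE_HEAD_DIM)  # 512 + 16 + 128
shortfall = window - plain                                 # elements missing per entry
```

Under these assumed dimensions, a window layer that receives the plain width is 80 elements short in its last dimension.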

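Suggested fix:
Do not let calls without layer_id default to the plain MLA shape. The cache manager/cache controller can set or pass layer_id per layer before the call and fetch the shape layer by layer. If the interface must return a single shape, return a conservative maximum shape that covers all layers when layer_id is None and any(window_attn_skip_freq), and keep the dtype/IPC allocation strategy in sync.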
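
The single-shape fallback suggested above could look roughly like the following. This is a sketch under stated assumptions, not a proposed patch: `select_cache_width` is a hypothetical helper that returns only the last dimension, the dimensions 512 and 64 are assumed example values, and the dtype/IPC synchronization the comment asks for is not modeled here.

```python
def select_cache_width(layer_id, window_attn_skip_freq, kv_lora_rank, qk_rope_head_dim):
    """Return the cache last dimension, staying conservative when layer_id is unknown."""
    plain = kv_lora_rank + qk_rope_head_dim
    window = kv_lora_rank + 4 * (kv_lora_rank // 128) + 2 * qk_rope_head_dim

    if window_attn_skip_freq is None:
        return plain
    if layer_id is None:
        # Unknown layer: cover the widest layer if any layer is a window layer.
        has_window_layer = any(freq == 1 for freq in window_attn_skip_freq)
        return max(plain, window) if has_window_layer else plain
    return window if window_attn_skip_freq[layer_id] == 1 else plain


# Assumed example dimensions (not from the PR).
unknown_layer = select_cache_width(None, [0, 1, 0], 512, 64)
known_plain_layer = select_cache_width(0, [0, 1, 0], 512, 64)
known_window_layer = select_cache_width(1, [0, 1, 0], 512, 64)
```

The trade-off is that a conservative maximum over-allocates for plain layers, which is why the comment lists per-layer lookup as the first option.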

Fix MLA cache shape init without layerid

6fd9dc5

Signed-off-by: Socratesa <lihaode@zju.edu.cn>

Copilot AI review requested due to automatic review settings June 23, 2026 13:26

Socratesa had a problem deploying to Metax_ci June 23, 2026 13:26 — with GitHub Actions Failure

Copilot started reviewing on behalf of Socratesa June 23, 2026 13:26

Copilot AI reviewed Jun 23, 2026


Comment thread fastdeploy/model_executor/layers/attention/mla_attention_backend.py

Comment on lines +625 to +629

+        if (
+            layer_id is not None
+            and self.window_attn_skip_freq is not None
+            and self.window_attn_skip_freq[layer_id] == 1
+        ):

PaddlePaddle-bot suggested changes Jun 23, 2026

Socratesa wants to merge 1 commit into PaddlePaddle:develop from Socratesa:mla_fix


4 participants
