Fix MLA cache shape init without layerid #8072
Open: Socratesa wants to merge 1 commit into PaddlePaddle:develop from Socratesa:mla_fix
🔴 Bug
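Skipping window_attn_skip_freq whenever layer_id is None downgrades the MLA cache shape of models that contain SWA/window layers to the plain MLA shape. This path is reachable: PrefixCacheManager._get_kv_cache_shape constructs a temporary backend and immediately calls get_kv_cache_shape, and such instances have no layer_id. Likewise, on the old path, V1CacheController.initialize_kv_cache calls create_kv_cache before it sets attn_backend.layer_id layer by layer.

In both cases the current branch returns the plain shape kv_lora_rank + qk_rope_head_dim. The runner, however, uses kv_lora_rank + 4 * (kv_lora_rank // 128) + 2 * qk_rope_head_dim for every layer with window_attn_skip_freq[i] == 1, and creates/attaches that cache as uint8. So although the PR avoids the AttributeError, it allocates or passes an undersized IPC/cache shape for window layers, and the subsequent attach or attention kernel may hit a shape mismatch or out-of-bounds reads/writes.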
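To make the mismatch concrete, the two formulas quoted in the comment can be evaluated per layer. This is a minimal sketch, not code from the PR: the helper names mla_cache_last_dim and per_layer_last_dims are hypothetical, only the last (per-token) dimension of the cache is computed, and the sample values kv_lora_rank = 512 and qk_rope_head_dim = 64 are assumptions chosen for illustration.

```python
from typing import List


def mla_cache_last_dim(kv_lora_rank: int, qk_rope_head_dim: int, is_window_layer: bool) -> int:
    """Hypothetical helper: last dimension of one layer's MLA cache,
    using the two formulas quoted in the review comment."""
    if is_window_layer:
        # Window layers (window_attn_skip_freq[i] == 1); the runner creates this cache as uint8.
        return kv_lora_rank + 4 * (kv_lora_rank // 128) + 2 * qk_rope_head_dim
    # Plain MLA layers.
    return kv_lora_rank + qk_rope_head_dim


def per_layer_last_dims(
    kv_lora_rank: int, qk_rope_head_dim: int, window_attn_skip_freq: List[int]
) -> List[int]:
    """Hypothetical helper: the width each layer actually needs."""
    return [
        mla_cache_last_dim(kv_lora_rank, qk_rope_head_dim, flag == 1)
        for flag in window_attn_skip_freq
    ]


if __name__ == "__main__":
    # Assumed example values; a 4-layer model alternating plain and window layers.
    needed = per_layer_last_dims(512, 64, [0, 1, 0, 1])
    # What the layer_id-is-None branch returns for every layer.
    returned = mla_cache_last_dim(512, 64, is_window_layer=False)
    print(needed, returned)
```

With these values a plain layer needs 576 and a window layer needs 656 in the last dimension, so the single shape produced when layer_id is None (576) is 80 short for every window layer.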
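Suggested fix: calls without a layer_id should not fall back to the plain MLA shape by default. Either have the cache manager/cache controller set or pass layer_id for each layer before the call and fetch the shape per layer, or, if the interface must return a single shape, return a conservative maximum shape that covers all layers when layer_id is None and any(window_attn_skip_freq), and keep the dtype/IPC allocation strategy in sync with it.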
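The single-shape fallback from the suggestion might look like the sketch below. It is hypothetical: resolve_mla_cache_last_dim and its parameters are illustrative names rather than the backend's real get_kv_cache_shape signature, only the last dimension is resolved, and the dtype/IPC side of the change is not modeled.

```python
from typing import List, Optional


def _layer_last_dim(kv_lora_rank: int, qk_rope_head_dim: int, is_window_layer: bool) -> int:
    # Same per-layer formulas as quoted in the review comment.
    if is_window_layer:
        return kv_lora_rank + 4 * (kv_lora_rank // 128) + 2 * qk_rope_head_dim
    return kv_lora_rank + qk_rope_head_dim


def resolve_mla_cache_last_dim(
    layer_id: Optional[int],
    kv_lora_rank: int,
    qk_rope_head_dim: int,
    window_attn_skip_freq: Optional[List[int]],
) -> int:
    """Hypothetical resolver: per-layer width when layer_id is known,
    otherwise a conservative maximum over all layers."""
    flags = window_attn_skip_freq or []
    if layer_id is not None:
        # Per-layer path: use the width this specific layer needs.
        is_window = bool(flags) and flags[layer_id] == 1
        return _layer_last_dim(kv_lora_rank, qk_rope_head_dim, is_window)
    if any(flag == 1 for flag in flags):
        # No layer_id but the model has window layers: take the maximum
        # over every layer so no window layer gets an undersized cache.
        return max(_layer_last_dim(kv_lora_rank, qk_rope_head_dim, flag == 1) for flag in flags)
    # No window layers at all: the plain MLA width is correct.
    return _layer_last_dim(kv_lora_rank, qk_rope_head_dim, False)


if __name__ == "__main__":
    print(resolve_mla_cache_last_dim(None, 512, 64, [0, 1, 0]))  # fallback path
    print(resolve_mla_cache_last_dim(0, 512, 64, [0, 1, 0]))     # per-layer path
```

Because the window-layer width is never smaller than the plain width, the fallback resolves to the window-layer width whenever any window layer exists. Passing layer_id per layer remains the tighter option, since the fallback over-allocates plain layers.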