[minor] fix for GLM4.7 mtp module in PTQ #1630
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
/claude review
No actionable comments were generated in the recent review. 🎉
Walkthrough: top-level model prefix exclusion in key conversion.
Pre-merge checks: 4 passed, 2 failed (2 warnings).
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1630      +/-   ##
==========================================
- Coverage   77.38%   76.60%   -0.78%
==========================================
  Files         482      488       +6
  Lines       52960    55032    +2072
==========================================
+ Hits        40984    42159    +1175
- Misses      11976    12873     +897
Flags with carried forward coverage won't be shown.
What does this PR do?
Type of change: Bug fix
_keys_to_prefixes now drops a top-level "model" key fragment instead of emitting it as a prefix. Without this guard, an inlined-MTP key like model.layers.92.eh_proj.weight would emit "model" → exporter wraps it as "model*" in quantization_config.exclude_modules → fnmatch in TRT-LLM matches every model.layers.X.* module → entire backbone is treated as unquantized → FP8 weights get loaded into BF16 buffers via .view(bf16) (which halves the last dim of an FP8 tensor) → loader crashes with tensor a (5120) must match tensor b (2560) at non-singleton dimension 1.
The main-branch caller in load_mtp_weights (PR #1532) already filters inlined keys before invoking _keys_to_prefixes, so this guard is defense-in-depth on main today. It freezes the invariant in code rather than as a docstring caveat, so the same regression cannot return if a future caller forgets to filter. Adds a focused unit test that pins the behavior.
Usage
# Add a code snippet demonstrating how to use this
Testing
tests/examples/llm_ptq/test_example_utils.py::test_keys_to_prefixes_drops_model_top_level: passes with this PR, would fail without the guard.
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).
Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).
CONTRIBUTING.md: ✅
Additional Information
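The failure chain described above can be reproduced in miniature with the standard library alone. The sketch below is not the repository's _keys_to_prefixes: keys_to_prefixes_sketch, its guard flag, the mtp.fc.weight key, and the backbone module names are hypothetical stand-ins, and the "first dotted fragment" rule is an assumption made only for illustration. It shows why a bare "model" prefix, once wrapped as "model*", matches every backbone module under fnmatch, and why reinterpreting 5120 one-byte elements as two-byte elements yields 2560.

```python
from fnmatch import fnmatch


def keys_to_prefixes_sketch(keys, guard=True):
    """Hypothetical stand-in for _keys_to_prefixes (assumption, not the real code).

    Collects the first dotted fragment of each key. With guard=True a
    top-level "model" fragment is dropped instead of being emitted.
    """
    prefixes = set()
    for key in keys:
        top = key.split(".", 1)[0]
        if guard and top == "model":
            continue  # the guard: never emit the backbone root as a prefix
        prefixes.add(top)
    return sorted(prefixes)


def excluded(module, prefixes):
    """Mimic the exporter wrapping each prefix as "<prefix>*" and matching with fnmatch."""
    return any(fnmatch(module, prefix + "*") for prefix in prefixes)


# First key is the inlined-MTP example from the PR text; the second is hypothetical.
mtp_keys = ["model.layers.92.eh_proj.weight", "mtp.fc.weight"]
# Hypothetical backbone module names that should stay quantized.
backbone = ["model.layers.0.self_attn.q_proj", "model.layers.45.mlp.down_proj"]

unguarded = keys_to_prefixes_sketch(mtp_keys, guard=False)
guarded = keys_to_prefixes_sketch(mtp_keys, guard=True)

# Without the guard, "model*" swallows the whole backbone.
print(unguarded, [excluded(m, unguarded) for m in backbone])
# With the guard, backbone modules are no longer excluded.
print(guarded, [excluded(m, guarded) for m in backbone])

# Byte-level analogue of viewing FP8 data as BF16: 5120 one-byte elements
# reinterpreted as two-byte elements leave 2560 elements.
fp8_row = memoryview(bytearray(5120))
bf16_view = fp8_row.cast("H")
print(len(fp8_row), len(bf16_view))
```

In this toy version the guard lives inside the helper rather than in the caller, which mirrors the defense-in-depth argument of the PR: a caller that forgets to filter inlined keys still cannot emit "model".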