[NVBug 6045859] Fix export support for Qwen3VL MoE experts #1164
shengliangxu wants to merge 4 commits into main from
Conversation
📝 Walkthrough
Added recognition of a new Qwen3 MoE expert module type.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~15 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1164      +/-   ##
===========================================
- Coverage   74.76%   63.53%   -11.24%
===========================================
  Files         351      351
  Lines       40072    40084       +12
===========================================
- Hits        29961    25468     -4493
- Misses      10111    14616     +4505
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
daf0144 to de55e8a
…ontainers
Qwen3VLMoeTextExperts stored expert weights as flat ModuleLists
(gate_proj, up_proj, down_proj), making the module non-iterable. The HF
export code requires `sub_module.experts` to be iterable, causing a
NotImplementedError during `export_hf_checkpoint`.
Refactor _QuantQwen3VLMoeTextExperts to use per-expert module
containers (matching the _QuantQwen35MoeExperts pattern):
- Add _Qwen3VLMoeExpertModule container class
- Register experts as numbered children (experts.{id}.gate_proj.weight)
- Implement __len__/__iter__/__getitem__ for iterability
- Add Qwen3VLMoeSparseMoeBlock to get_expert_linear_names
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
🧹 Nitpick comments (1)
modelopt/torch/quantization/plugins/huggingface.py (1)
690-701: Consider unifying with `_Qwen35MoeExpertModule`. This class is nearly identical to `_Qwen35MoeExpertModule` (lines 792-803), differing only in parameter naming (`hidden_size` vs `hidden_dim`). Consider creating a single reusable expert module class to reduce code duplication.

♻️ Proposed unified expert module
```diff
-class _Qwen3VLMoeExpertModule(nn.Module):
-    """Container for a single Qwen3VL MoE expert's linear layers.
-
-    Produces the naming pattern: experts.{id}.gate_proj.weight
-    (consistent with standard Qwen3 MoE per-expert module structure).
-    """
-
-    def __init__(self, hidden_size: int, expert_dim: int):
-        super().__init__()
-        self.gate_proj = nn.Linear(hidden_size, expert_dim, bias=False)
-        self.up_proj = nn.Linear(hidden_size, expert_dim, bias=False)
-        self.down_proj = nn.Linear(expert_dim, hidden_size, bias=False)
+class _QwenMoeExpertModule(nn.Module):
+    """Container for a single Qwen MoE expert's linear layers.
+
+    Produces the naming pattern: experts.{id}.gate_proj.weight
+    (consistent with standard Qwen MoE per-expert module structure).
+    Reusable for Qwen3VL, Qwen3.5, and similar variants.
+    """
+
+    def __init__(self, hidden_dim: int, expert_dim: int):
+        super().__init__()
+        self.gate_proj = nn.Linear(hidden_dim, expert_dim, bias=False)
+        self.up_proj = nn.Linear(hidden_dim, expert_dim, bias=False)
+        self.down_proj = nn.Linear(expert_dim, hidden_dim, bias=False)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/plugins/huggingface.py` around lines 690-701: these two nearly identical classes (_Qwen3VLMoeExpertModule and _Qwen35MoeExpertModule) should be replaced with one reusable expert module (e.g. _QwenMoeExpertModule) that accepts the common parameters (hidden_dim/hidden_size -> hidden_dim, expert_dim) and exposes gate_proj, up_proj, down_proj with the same naming pattern (experts.{id}.gate_proj.weight); update all instantiations that referenced _Qwen3VLMoeExpertModule and _Qwen35MoeExpertModule to use the new class and normalize the parameter name to hidden_dim to remove duplication while preserving behavior and bias=False on the Linear layers.
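If the reviewer's unification were applied, both quantized expert classes could build their expert lists from the same container. A rough sketch under the reviewer's proposed naming; the dimension values below are purely illustrative:

```python
import torch.nn as nn


class _QwenMoeExpertModule(nn.Module):
    """Reviewer-proposed unified per-expert container (sketch)."""

    def __init__(self, hidden_dim: int, expert_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_dim, expert_dim, bias=False)
        self.up_proj = nn.Linear(hidden_dim, expert_dim, bias=False)
        self.down_proj = nn.Linear(expert_dim, hidden_dim, bias=False)


# Both variants would instantiate the same class; only the dimensions
# differ (these numbers are made up for illustration).
qwen3vl_experts = nn.ModuleList(_QwenMoeExpertModule(16, 32) for _ in range(4))
qwen35_experts = nn.ModuleList(_QwenMoeExpertModule(24, 48) for _ in range(8))
```

Normalizing on a single `hidden_dim` parameter name removes the only difference between the two current classes while keeping the `experts.{id}.gate_proj.weight` naming and `bias=False` behavior intact.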
ℹ️ Review info
⚙️ Run configuration
- Configuration used: .coderabbit.yaml
- Review profile: CHILL
- Plan: Pro
- Run ID: d09e8137-0fe2-4503-be18-2045798557e7
📒 Files selected for processing (2)
- modelopt/torch/export/layer_utils.py
- modelopt/torch/quantization/plugins/huggingface.py
What does this PR do?

Fix HF checkpoint export support for Qwen3-VL MoE models (e.g. Qwen/Qwen3-VL-30B-A3B-Instruct).

Previously, running `hf_ptq.py` on Qwen3-VL MoE models failed during `export_hf_checkpoint` with:

Root cause: `_QuantQwen3VLMoeTextExperts` stored expert weights as flat `nn.ModuleList`s (one per projection type), making the module non-iterable. The export code requires `sub_module.experts` to be iterable to handle input quantizer amax and gate/up amax sync.

Fix: Refactor `_QuantQwen3VLMoeTextExperts` to use per-expert module containers, matching the established `_QuantQwen35MoeExperts` pattern:
- Add a `_Qwen3VLMoeExpertModule` container class with `gate_proj`, `up_proj`, `down_proj`
- Register experts as `experts.{id}.gate_proj.weight` (standard Qwen3 MoE naming, compatible with vLLM)
- Implement `__len__`/`__iter__`/`__getitem__` for iterability
- Add `Qwen3VLMoeSparseMoeBlock` to `get_expert_linear_names` in `layer_utils.py`

Files changed:
- `modelopt/torch/quantization/plugins/huggingface.py`: refactored expert module structure
- `modelopt/torch/export/layer_utils.py`: added Qwen3VLMoe to expert linear name mapping

Testing
Qwen/Qwen3-VL-30B-A3B-Instruct:

Summary by CodeRabbit
New Features
Improvements
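The export requirement this PR addresses, iterating `sub_module.experts` to sync each expert's gate/up amax, can be illustrated with a small self-contained sketch. The function and class names here are hypothetical stand-ins, not modelopt's actual export API:

```python
import torch.nn as nn


class ExpertSketch(nn.Module):
    """Stand-in for one per-expert container (illustrative only)."""

    def __init__(self, hidden: int = 4, inter: int = 8):
        super().__init__()
        self.gate_proj = nn.Linear(hidden, inter, bias=False)
        self.up_proj = nn.Linear(hidden, inter, bias=False)


def shared_gate_up_amax(experts):
    """Per expert, take the max |w| over gate and up projections.

    Mimics the gate/up amax sync the HF export performs so a fused
    gate_up tensor can use one consistent scale per expert.
    """
    out = []
    for expert in experts:  # this loop is why `experts` must be iterable
        amax = max(
            expert.gate_proj.weight.abs().max().item(),
            expert.up_proj.weight.abs().max().item(),
        )
        out.append(amax)
    return out


experts = nn.ModuleList(ExpertSketch() for _ in range(3))
amaxes = shared_gate_up_amax(experts)
```

With the old flat per-projection `ModuleList`s there was no per-expert object to loop over, which is what surfaced as the `NotImplementedError` during `export_hf_checkpoint`.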