[NVBug 6045859]Fix export support for Qwen3VL MoE experts#1164

Open
shengliangxu wants to merge 4 commits into main from shengliangx/qwen3vlmoe-export

Conversation

Collaborator

@shengliangxu shengliangxu commented Apr 1, 2026

What does this PR do?

Fix HF checkpoint export support for Qwen3-VL MoE models (e.g. Qwen/Qwen3-VL-30B-A3B-Instruct).

Previously, running hf_ptq.py on Qwen3-VL MoE models failed during export_hf_checkpoint with:

NotImplementedError: MoE model with experts type 'QuantQwen3VLMoeTextExperts' is not supported in export.

Root cause: _QuantQwen3VLMoeTextExperts stored expert weights as flat nn.ModuleLists (one per projection type), making the module non-iterable. The export code requires sub_module.experts to be iterable to handle input quantizer amax and gate/up amax sync.

Fix: Refactor _QuantQwen3VLMoeTextExperts to use per-expert module containers, matching the established _QuantQwen35MoeExperts pattern:

  • Add _Qwen3VLMoeExpertModule container class with gate_proj, up_proj, down_proj
  • Register experts as numbered children producing state dict keys like experts.{id}.gate_proj.weight (standard Qwen3 MoE naming, compatible with vLLM)
  • Implement __len__/__iter__/__getitem__ for iterability
  • Add Qwen3VLMoeTextSparseMoeBlock to get_expert_linear_names in layer_utils.py
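The container pattern the bullets above describe can be sketched in plain Python (torch-free stand-ins with hypothetical names, so the shapes of the classes are easy to see) to show the iterability contract the export code relies on:

```python
# Illustrative sketch of the per-expert container pattern described above.
# Plain Python stand-ins replace torch.nn modules; class names mirror the
# PR but everything here is hypothetical, not the actual implementation.

class _ExpertModule:
    """Stand-in for _Qwen3VLMoeExpertModule (gate/up/down projections)."""

    def __init__(self, hidden_size, expert_dim):
        self.gate_proj = ("linear", hidden_size, expert_dim)
        self.up_proj = ("linear", hidden_size, expert_dim)
        self.down_proj = ("linear", expert_dim, hidden_size)


class _QuantExperts:
    """Stand-in for _QuantQwen3VLMoeTextExperts after the refactor."""

    def __init__(self, num_experts, hidden_size, expert_dim):
        # Register experts as numbered children, which yields state-dict
        # keys like "experts.{id}.gate_proj.weight" (standard Qwen3 MoE
        # naming, compatible with vLLM).
        self.experts = {
            i: _ExpertModule(hidden_size, expert_dim) for i in range(num_experts)
        }

    def __len__(self):
        return len(self.experts)

    def __iter__(self):
        return iter(self.experts.values())

    def __getitem__(self, idx):
        return self.experts[idx]


experts = _QuantExperts(num_experts=4, hidden_size=8, expert_dim=16)
assert len(experts) == 4
# Iterable per-expert access is exactly what export_hf_checkpoint needs
# for input-quantizer amax and gate/up amax sync.
assert all(hasattr(e, "gate_proj") for e in experts)
```

The previous layout (one flat ModuleList per projection type) offered no per-expert object to iterate over, which is why export raised NotImplementedError.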

Files changed:

  • modelopt/torch/quantization/plugins/huggingface.py — refactored expert module structure
  • modelopt/torch/export/layer_utils.py — added Qwen3VLMoe to expert linear name mapping

Testing

  • [x] FP8 PTQ + export on Qwen/Qwen3-VL-30B-A3B-Instruct:
    python examples/llm_ptq/hf_ptq.py \
      --pyt_ckpt_path=Qwen/Qwen3-VL-30B-A3B-Instruct \
      --export_path=<output_dir> \
      --qformat=fp8 --calib_size=8 --batch_size=1
    
  • [x] Verify exported checkpoint loads in vLLM

Summary by CodeRabbit

  • New Features

    • Added support for an additional Qwen3 Vision Language Model Mixture of Experts variant.
  • Improvements

    • Restructured the Mixture of Experts expert modules so quantized checkpoints export correctly.

@copy-pr-bot

copy-pr-bot bot commented Apr 1, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Contributor

coderabbitai bot commented Apr 1, 2026

📝 Walkthrough

Added recognition of a new Qwen3 MoE expert module type (Qwen3VLMoeTextSparseMoeBlock) to the export layer utilities. Refactored the quantization plugin's expert container structure to use a unified _Qwen3VLMoeExpertModule wrapper instead of separate ModuleLists, with updated forward pass routing and container protocol support.

Changes

Changes by cohort / file:

  • MoE Expert Type Recognition — modelopt/torch/export/layer_utils.py
    Extended get_expert_linear_names() to recognize the Qwen3VLMoeTextSparseMoeBlock module type and return the standard Qwen expert linear layer names.
  • Expert Container Refactoring — modelopt/torch/quantization/plugins/huggingface.py
    Restructured _QuantQwen3VLMoeTextExperts to use a new _Qwen3VLMoeExpertModule container per expert instead of three parallel ModuleLists. Updated weight registration and forward-pass indexing, and added __len__, __iter__, and __getitem__ for container protocol support.
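The layer_utils.py side of the change amounts to one extra name-based branch in the expert-name lookup. A hedged sketch (the real function lives in modelopt/torch/export/layer_utils.py; the dispatch below is illustrative, not the actual code):

```python
# Hedged sketch of the get_expert_linear_names() extension described above.
# The real implementation may dispatch differently; matching on the
# module's class name here is an assumption for illustration.

def get_expert_linear_names(module) -> list[str]:
    """Return the linear-layer attribute names of one MoE expert."""
    qwen_like = (
        "Qwen3MoeSparseMoeBlock",
        "Qwen3VLMoeTextSparseMoeBlock",  # branch added by this PR
    )
    if type(module).__name__ in qwen_like:
        # Standard Qwen expert projection names.
        return ["gate_proj", "up_proj", "down_proj"]
    raise NotImplementedError(f"Unsupported MoE block: {type(module).__name__}")


class Qwen3VLMoeTextSparseMoeBlock:  # dummy stand-in for the HF class
    pass


print(get_expert_linear_names(Qwen3VLMoeTextSparseMoeBlock()))
# -> ['gate_proj', 'up_proj', 'down_proj']
```

These names line up with the per-expert containers registered in huggingface.py, so export walks experts.{id}.{name}.weight uniformly.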

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 62.50%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Security Anti-Patterns — ✅ Passed. The pull request introduces only class refactoring for Qwen3-VL MoE expert module handling, with no security anti-patterns detected.
  • Title check — ✅ Passed. The title clearly describes the main change: fixing export support for Qwen3VL MoE experts.



@github-actions
Contributor

github-actions bot commented Apr 1, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1164/

Built to branch gh-pages at 2026-04-03 22:52 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@codecov

codecov bot commented Apr 1, 2026

Codecov Report

❌ Patch coverage is 20.00000% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.53%. Comparing base (df80a0f) to head (e949e33).

Files with missing lines:
  • modelopt/torch/quantization/plugins/huggingface.py — patch 20.00%, 20 lines missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1164       +/-   ##
===========================================
- Coverage   74.76%   63.53%   -11.24%     
===========================================
  Files         351      351               
  Lines       40072    40084       +12     
===========================================
- Hits        29961    25468     -4493     
- Misses      10111    14616     +4505     
Flag coverage:
  • examples — 40.27% <20.00%> (-0.03%) ⬇️
  • gpu — 18.76% <20.00%> (-38.47%) ⬇️
  • unit — 54.74% <20.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.


@shengliangxu shengliangxu force-pushed the shengliangx/qwen3vlmoe-export branch from daf0144 to de55e8a on April 2, 2026 21:37
…ontainers

Qwen3VLMoeTextExperts stored expert weights as flat ModuleLists
(gate_proj, up_proj, down_proj), making the module non-iterable. The HF
export code requires `sub_module.experts` to be iterable, causing a
NotImplementedError during `export_hf_checkpoint`.

Refactor _QuantQwen3VLMoeTextExperts to use per-expert module
containers (matching the _QuantQwen35MoeExperts pattern):

- Add _Qwen3VLMoeExpertModule container class
- Register experts as numbered children (experts.{id}.gate_proj.weight)
- Implement __len__/__iter__/__getitem__ for iterability
- Add Qwen3VLMoeSparseMoeBlock to get_expert_linear_names

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
@shengliangxu shengliangxu reopened this Apr 2, 2026
@shengliangxu shengliangxu marked this pull request as ready for review April 2, 2026 23:28
@shengliangxu shengliangxu requested review from a team as code owners April 2, 2026 23:28
Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
modelopt/torch/quantization/plugins/huggingface.py (1)

690-701: Consider unifying with _Qwen35MoeExpertModule.

This class is nearly identical to _Qwen35MoeExpertModule (lines 792-803), differing only in parameter naming (hidden_size vs hidden_dim). Consider creating a single reusable expert module class to reduce code duplication.

♻️ Proposed unified expert module
-class _Qwen3VLMoeExpertModule(nn.Module):
-    """Container for a single Qwen3VL MoE expert's linear layers.
-
-    Produces the naming pattern: experts.{id}.gate_proj.weight
-    (consistent with standard Qwen3 MoE per-expert module structure).
-    """
-
-    def __init__(self, hidden_size: int, expert_dim: int):
-        super().__init__()
-        self.gate_proj = nn.Linear(hidden_size, expert_dim, bias=False)
-        self.up_proj = nn.Linear(hidden_size, expert_dim, bias=False)
-        self.down_proj = nn.Linear(expert_dim, hidden_size, bias=False)
+class _QwenMoeExpertModule(nn.Module):
+    """Container for a single Qwen MoE expert's linear layers.
+
+    Produces the naming pattern: experts.{id}.gate_proj.weight
+    (consistent with standard Qwen MoE per-expert module structure).
+    Reusable for Qwen3VL, Qwen3.5, and similar variants.
+    """
+
+    def __init__(self, hidden_dim: int, expert_dim: int):
+        super().__init__()
+        self.gate_proj = nn.Linear(hidden_dim, expert_dim, bias=False)
+        self.up_proj = nn.Linear(hidden_dim, expert_dim, bias=False)
+        self.down_proj = nn.Linear(expert_dim, hidden_dim, bias=False)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/plugins/huggingface.py` around lines 690 - 701,
These two nearly identical classes (_Qwen3VLMoeExpertModule and
_Qwen35MoeExpertModule) should be replaced with one reusable expert module (e.g.
_QwenMoeExpertModule) that accepts the common parameters (hidden_dim/hidden_size
-> hidden_dim, expert_dim) and exposes gate_proj, up_proj, down_proj with the
same naming pattern (experts.{id}.gate_proj.weight); update all instantiations
that referenced _Qwen3VLMoeExpertModule and _Qwen35MoeExpertModule to use the
new class and normalize the parameter name to hidden_dim to remove duplication
while preserving behavior and bias=False on the Linear layers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d09e8137-0fe2-4503-be18-2045798557e7

📥 Commits

Reviewing files that changed from the base of the PR and between 18ddcb7 and 998e258.

📒 Files selected for processing (2)
  • modelopt/torch/export/layer_utils.py
  • modelopt/torch/quantization/plugins/huggingface.py

@shengliangxu shengliangxu changed the title Add export support for Qwen3VL MoE experts with ModuleList linear layers Fix export support for Qwen3VL MoE experts Apr 3, 2026
@shengliangxu shengliangxu requested a review from cjluo-nv April 3, 2026 17:55
@shengliangxu shengliangxu changed the title Fix export support for Qwen3VL MoE experts [NVBug 6045859]Fix export support for Qwen3VL MoE experts Apr 3, 2026