Bugfix: Cutlass FP8 FusedMoE bad scaling factors #27255
Conversation
When running Cutlass FusedMoE FP8, the scaling factors that are passed are None. This PR passes the correct scaling factors and enables the relevant test.
Signed-off-by: Amir Klein <[email protected]>
Code Review
This pull request correctly addresses a bug in the Cutlass FP8 FusedMoE implementation by passing the necessary scaling factors. The changes are logical, and enabling the previously skipped test test_flashinfer_cutlass_moe_fp8_no_graph validates the fix. However, I've identified a critical risk of a division-by-zero error in the calculation of the a2_gscale factor, which should be addressed to ensure numerical stability.
a1_scale=layer.w13_input_scale,
a1_gscale=layer.w13_input_scale,
a2_scale=layer.w2_input_scale,
a2_gscale=1.0 / layer.w2_input_scale,
The calculation 1.0 / layer.w2_input_scale introduces a risk of a division-by-zero error if layer.w2_input_scale is zero. Although scales are typically positive, adding a small epsilon to the denominator is a crucial safeguard for numerical stability.
Suggested change:
- a2_gscale=1.0 / layer.w2_input_scale,
+ a2_gscale=1.0 / (layer.w2_input_scale + 1e-6),
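For illustration, a minimal standalone sketch of the kind of guard this comment asks for, assuming the scale is a per-tensor torch.Tensor; the helper name, the epsilon value, and the example scale are illustrative and not part of the PR:

import torch

def safe_reciprocal(scale: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Clamp the denominator away from zero before inverting so the
    # resulting global scale factor can never become inf/nan.
    return 1.0 / torch.clamp(scale, min=eps)

w2_input_scale = torch.tensor([0.0125])  # hypothetical per-tensor input scale
a2_gscale = safe_reciprocal(w2_input_scale)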
a1_scale=td.a1_scale,
a1_gscale=td.a1_scale,
a2_scale=td.a2_scale,
a2_gscale=1.0 / td.a2_scale,
To prevent potential division-by-zero errors and for consistency with the recommended fix in the main logic, it's safer to add a small epsilon to the denominator here. While td.a2_scale is currently 1.0 in this test, this change improves the robustness of the test suite against future modifications.
Suggested change:
- a2_gscale=1.0 / td.a2_scale,
+ a2_gscale=1.0 / (td.a2_scale + 1e-6),
@tlrmchlsmth Looks like CI failed due to something unrelated to my PR.
LGTM, please validate @wenscarl
    a1, topk_weights, topk_ids, apply_router_weight_on_input
)
- if not self.use_dp:
+ if not self.use_dp and quant_config.quant_dtype != torch.float8_e4m3fn:
I suggest using a more descriptive condition, for example:
if not self.use_dp and quant_config.quant_dtype == nvfp4:
    return a1, None, ...
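To make the contrast concrete, a rough sketch of the two conditions side by side; NVFP4_MARKER is a hypothetical placeholder, since this thread does not show how the FP4 path is actually encoded in quant_config.quant_dtype:

import torch

NVFP4_MARKER = "nvfp4"  # placeholder sentinel, not the real constant

def early_return_current(use_dp: bool, quant_dtype) -> bool:
    # Condition as written in the diff: take the early-return path for
    # anything that is not FP8.
    return not use_dp and quant_dtype != torch.float8_e4m3fn

def early_return_suggested(use_dp: bool, quant_dtype) -> bool:
    # Reviewer's suggestion: name the NVFP4 path explicitly instead of
    # excluding FP8, which keeps the intent readable as dtypes are added.
    return not use_dp and quant_dtype == NVFP4_MARKER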
per_act_token_quant: bool = False,
per_out_ch_quant: bool = False,
block_shape: list[int] | None = None,
a1_gscale: torch.Tensor | None = None,
The additional 4 scales were not present before #27223, and there was no issue without them. Is it possible to deduce them from the others?
As far as I could tell, no
return fp8_w8a8_moe_quant_config(
    w1_scale=layer.w13_weight_scale,
    g1_alphas=(layer.w13_weight_scale * layer.w13_input_scale).squeeze(),
The g1_alphas can be computed from w1_scale and a1_scale here, right? Same in test_flashinfer.py.
I'd rather not move this calculation inside the function, as I'm not sure whether other paths will require different factors in the future. If it's important, I can move it to be calculated in fp8_w8a8_moe_quant_config.
@wenscarl wdyt?
Sounds reasonable. Let's leave it as it is.
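For reference, a minimal sketch of what folding the alpha computation into the config helper could look like; the function name, signature, and returned structure are assumptions, not the actual fp8_w8a8_moe_quant_config API:

import torch

def build_fp8_moe_quant_config(w1_scale: torch.Tensor,
                               a1_scale: torch.Tensor,
                               **extra) -> dict:
    # Derive g1_alphas inside the helper, mirroring the caller-side
    # expression from the diff:
    #   (layer.w13_weight_scale * layer.w13_input_scale).squeeze()
    g1_alphas = (w1_scale * a1_scale).squeeze()
    return {"w1_scale": w1_scale, "a1_scale": a1_scale,
            "g1_alphas": g1_alphas, **extra}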
Purpose
When running Cutlass FusedMoE FP8, the scaling factors that are passed are None. This PR passes the correct scaling factors and enables the relevant test.
Test Plan
Enabled the previously disabled test_flashinfer_cutlass_moe_fp8_no_graph.
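For reference, the re-enabled test can be run locally with pytest, e.g. pytest -k test_flashinfer_cutlass_moe_fp8_no_graph from the repository root; the exact test file path is not shown in this thread, so selecting by name is the safest invocation.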