skip unsupported arches in python tests #5760

liqiangxl · 2026-01-05T16:25:16Z

skip unsupported arches

liqiangxl · 2026-01-05T16:25:29Z

!test

greptile-apps · 2026-01-05T16:28:11Z

Greptile Summary

This PR restricts FP4 quantization and GEMM tests to specific GPU architectures. A new is_blackwell() helper was added to detect multiple Blackwell variants (10.0, 10.3, 12.0, 12.1), and skip conditions were updated across test files.

Key changes:

Replaced module-level architecture checks with per-test @pytest.mark.skipif decorators using microarchitecture_is(10, 0)
Tests now skip on unsupported architectures with clear reason messages
Consolidated redundant skip conditions (is_pre_blackwell() + microarchitecture_is_pre(12) → microarchitecture_is(10, 0))

Main issue:
Inconsistent architecture filtering in test_narrow_precision.py: some tests use is_blackwell() (supports all Blackwell: 10.0, 10.3, 12.0, 12.1) while others use microarchitecture_is(10, 0) (only 10.0). This creates different test coverage across Blackwell variants within the same file. Need clarification on whether certain operations (like scaled_mm and cutlass_nvfp4_grouped_mm) have specific incompatibilities with 12.x architectures that quantization tests don't have.

Confidence Score: 3/5

This PR is functionally safe but has architectural inconsistencies that may cause confusion
The changes correctly restrict tests to supported architectures and won't cause test failures. However, the mixed use of is_blackwell() vs microarchitecture_is(10, 0) in the same file creates inconsistent test coverage across Blackwell variants. This suggests either: (1) an intentional distinction that should be documented, or (2) an oversight where some tests could run on more architectures. The score reflects uncertainty about whether the architecture filtering is correctly applied.
Pay close attention to tests/python/direct/test_narrow_precision.py - verify that the mixed architecture checks (lines 66, 173, 240, 320, 449, 609, 652) are intentional

Important Files Changed

Filename	Overview
tests/python/direct_utils/utils.py	Added new `is_blackwell()` helper function to detect Blackwell architecture variants (10.0, 10.3, 12.0, 12.1)
tests/python/direct/test_cutlass_nvfp4_gemm.py	Replaced module-level skip with per-test `microarchitecture_is(10, 0)` checks to restrict tests to compute capability 10.0 only
tests/python/direct/test_with_id_model_indexer.py	Simplified skip condition to only use `microarchitecture_is(10, 0)` instead of combining `is_pre_blackwell()` and `microarchitecture_is_pre(12)`
tests/python/direct/test_narrow_precision.py	Mixed approach: some tests use `is_blackwell()` (supports 10.0, 10.3, 12.0, 12.1), while others use `microarchitecture_is(10, 0)` (only 10.0), creating inconsistent architecture support

Sequence Diagram

sequenceDiagram
    participant Test as Test Suite
    participant Utils as direct_utils/utils.py
    participant Device as CUDA Device
    
    Note over Test,Device: Test Execution Flow
    
    Test->>Utils: Check architecture support
    alt Using microarchitecture_is(10, 0)
        Utils->>Device: Get device properties (major, minor)
        Device-->>Utils: Return (10, 0) or other
        Utils-->>Test: Return True only if exactly 10.0
        Note over Test: Skip if not 10.0<br/>(test_cutlass_nvfp4_gemm.py,<br/>test_with_id_model_indexer.py,<br/>some tests in test_narrow_precision.py)
    else Using is_blackwell()
        Utils->>Device: Check multiple compute capabilities
        Device-->>Utils: Return compute capability
        Utils-->>Test: Return True if 10.0, 10.3, 12.0, or 12.1
        Note over Test: Skip if not any Blackwell variant<br/>(some tests in test_narrow_precision.py)
    end
    
    alt Architecture supported
        Test->>Test: Execute test (FP4 quantization, GEMM, etc.)
    else Architecture not supported
        Test->>Test: Skip test with reason
    end

greptile-apps

_{5 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-01-05T16:28:11Z

tests/python/direct/test_narrow_precision.py

-@pytest.mark.skipif(
-    is_pre_blackwell(), reason="Only supported on blackwell and newer devices."
-)
+@pytest.mark.skipif(not is_blackwell(), reason="Only supported on blackwell.")


logic: Inconsistent architecture filtering: this test uses is_blackwell() (supports 10.0, 10.3, 12.0, 12.1) while other tests in this same file use microarchitecture_is(10, 0) (only supports 10.0). This creates different test coverage across Blackwell variants.

github-actions · 2026-01-05T16:30:29Z

Review updated until commit 55b5572

Description

Updated architecture checking logic from "skip if pre-blackwell" to "skip if not blackwell"
Added new is_blackwell() utility function supporting multiple Blackwell architectures (10.0, 10.3, 12.0, 12.1)
Modified test skip conditions across multiple test files to properly handle unsupported architectures
Replaced module-level skips with function-level @pytest.mark.skipif decorators for better test granularity

Changes walkthrough

Relevant files

Tests

test_cutlass_nvfp4_gemm.py `Update architecture checks for NVFP4 GEMM tests` tests/python/direct/test_cutlass_nvfp4_gemm.py Removed module-level compute capability check Added `microarchitecture_is` import Added function-level skip decorators using `microarchitecture_is(10, 0)` Updated skip reason to clarify Blackwell 12.0 support limitations	+13/-8
test_narrow_precision.py `Update architecture validation for narrow precision tests` tests/python/direct/test_narrow_precision.py Replaced `is_pre_blackwell` and `microarchitecture_is_pre` imports with `is_blackwell` and `microarchitecture_is` Updated multiple test skip conditions from "skip if pre-blackwell" to "skip if not blackwell" Modified narrow precision test architecture validation logic	+13/-27
test_with_id_model_indexer.py `Update architecture checks for model indexer test` tests/python/direct/test_with_id_model_indexer.py Updated imports to use `microarchitecture_is` instead of `is_pre_blackwell` and `microarchitecture_is_pre` Changed skip condition from pre-blackwell check to `microarchitecture_is(10, 0)` Maintained existing test configuration and parameters	+3/-6

Enhancement

utils.py `Add Blackwell architecture detection utility` tests/python/direct_utils/utils.py Added new `is_blackwell()` function supporting multiple Blackwell architectures Function checks for compute capabilities 10.0, 10.3, 12.0, and 12.1 Provides comprehensive Blackwell architecture detection	+13/-0

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review
Architecture Check Logic The PR replaces the old compute capability check with a new `microarchitecture_is(10, 0)` check. Verify that this change correctly identifies the intended architectures and that the skip reason "Does not support blackwell compute 12.0, other arches are not tested" accurately reflects the test's requirements. @pytest.mark.skipif( not microarchitecture_is(10, 0), reason="Does not support blackwell compute 12.0, other arches are not tested.", ) Blackwell-specific Tests Tests that were previously skipped on "pre-Blackwell" architectures are now being skipped on non-Blackwell architectures using `not is_blackwell()`. Ensure this change doesn't inadvertently enable tests on unsupported hardware or disable tests on supported hardware. @pytest.mark.skipif(not is_blackwell(), reason="Only supported on blackwell.") @pytest.mark.parametrize("dtype", [torch.float32, torch.bfloat16]) New is_blackwell Function A new `is_blackwell()` function is added that checks for multiple Blackwell architectures (10.0, 10.3, 12.0, 12.1). Verify that this comprehensive check covers all intended Blackwell architectures and doesn't include unintended ones. def is_blackwell(): return ( microarchitecture_is(10, 0) or microarchitecture_is(10, 3) or microarchitecture_is(12, 0) or microarchitecture_is(12, 1) )

Test failures

(Low, 1) Minor numerical mismatch in Thunder vs Torch instance_norm cuda tests

Test Name A100 Source

thunder.tests.test_ops.test_core_vs_torch_consistency_instance_norm_nvfuser_cuda_thunder.dtypes.float32 ❌

naoyam · 2026-01-05T18:51:05Z

The title suggests this has something to do with indexing. Can you clarify what it is?

liqiangxl · 2026-01-06T14:25:58Z

The title suggests this has something to do with indexing. Can you clarify what it is?

The title was not accurate, failure is not realated to indexing. Revised to skip unsupported arches in test test_with_id_model_indexer.py``

liqiangxl · 2026-01-06T14:31:03Z

@greptileai review this draft

greptile-apps

_{4 files reviewed, 4 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-01-06T14:34:34Z

tests/python/direct/test_narrow_precision.py

 @pytest.mark.skipif(
-    is_pre_blackwell(), reason="Only supported on blackwell and newer devices."
-)
-@pytest.mark.skipif(
-    not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0"
+    not microarchitecture_is(10, 0),
+    reason="Does not support blackwell compute 12.0, other arches are not tested.",
 )


style: Inconsistent architecture filtering within same file: this test uses microarchitecture_is(10, 0) (only supports 10.0), but other tests in this file (lines 66, 173, 320) use is_blackwell() which supports all Blackwell variants (10.0, 10.3, 12.0, 12.1). Are the _scaled_mm tests specifically incompatible with 12.x architectures while quantization tests support them? Do scaled_mm operations have specific incompatibilities with 12.x compute capabilities that quantization operations don't have?

greptile-apps · 2026-01-06T14:34:35Z

tests/python/direct/test_narrow_precision.py

 @pytest.mark.skipif(
-    is_pre_blackwell(), reason="Only supported on blackwell and newer devices."
-)
-@pytest.mark.skipif(
-    not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0"
+    not microarchitecture_is(10, 0),
+    reason="Does not support blackwell compute 12.0, other arches are not tested.",
 )


style: Same inconsistency: microarchitecture_is(10, 0) here vs is_blackwell() at line 320 for similar grouped matmul tests

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

greptile-apps · 2026-01-06T14:34:35Z

tests/python/direct/test_narrow_precision.py

 @pytest.mark.skipif(
-    is_pre_blackwell(), reason="Only supported on blackwell and newer devices."
-)
-@pytest.mark.skipif(
-    not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0"
+    not microarchitecture_is(10, 0),
+    reason="Does not support blackwell compute 12.0, other arches are not tested.",
 )


style: Same pattern: microarchitecture_is(10, 0) here vs is_blackwell() elsewhere in file

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

greptile-apps · 2026-01-06T14:34:36Z

tests/python/direct/test_narrow_precision.py

 @pytest.mark.skipif(
-    is_pre_blackwell(), reason="Only supported on blackwell and newer devices."
-)
-@pytest.mark.skipif(
-    not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0"
+    not microarchitecture_is(10, 0),
+    reason="Does not support blackwell compute 12.0, other arches are not tested.",
 )


style: Same pattern: microarchitecture_is(10, 0) here vs is_blackwell() elsewhere in file

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

liqiangxl · 2026-01-06T14:47:04Z

!test

liqiangxl · 2026-01-06T15:46:48Z

Summary of failure reasons:
The first four tests are failed in the process of computing the reference output using PyTorch. The last test is due to nvFuser.

Test: pytest tests/python/direct/test_cutlass_gemm.py -k "bfloat16-tokens_per_expert_neg_one and 115 and 1024"
Err: Attempting to use TMEM allocation PTX without CUTE_ARCH_TCGEN05_TMEM_ENABLED
Reason: caused by torch, torch._grouped_mm
Test: pytest tests/python/direct/test_cutlass_nvfp4_gemm.py -k "gemm[shape and 256 and 64 and bfloat16"
Err: Trying to use TMA Descriptor Prefetch without CUTE_ARCH_TMA_SM90_ENABLED
Reason: caused by torch
Test pytest tests/python/direct/test_cutlass_nvfp4_gemm.py -k "grouped_mm and bfloat"
Err: Attempting to use TMEM allocation PTX without CUTE_ARCH_TCGEN05_TMEM_ENABLED
Reason: caused by torch
Test pytest tests/python/direct/test_narrow_precision.py -k "test_nv_quantization_to_mxfp8 and float16 and eager"
Err: CUDA Error: the provided PTX was compiled with an unsupported toolchain
Reason: TransformerEngine error
Test NVFUSER_DISABLE=parallel_compile pytest tests/python/direct/test_with_id_model_indexer.py -k "eager"
Err: Attempting to use TMEM allocation PTX without CUTE_ARCH_TCGEN05_TMEM_ENABLED and INTERNAL ASSERT FAILED at /opt/pytorch/nvfuser/csrc/runtime/executor_utils.cpp:559
Reason: seems nvFuser related.

liqiangxl added 4 commits January 5, 2026 07:15

fix test_grouped_mm

4b7d91b

skip nvfp4

235f161

fix narrow precision tests

40f0ef7

fix test_with_id_model_indexer

f9fdc32

greptile-apps bot reviewed Jan 5, 2026

View reviewed changes

liqiangxl changed the title ~~fix test with id model indexer~~ skip unsupported arches in test test_with_id_model_indexer.py Jan 6, 2026

liqiangxl marked this pull request as draft January 6, 2026 14:30

Merge branch 'main' into llu/fix_test_with_id_model_indexer

55b5572

greptile-apps bot reviewed Jan 6, 2026

View reviewed changes

liqiangxl changed the title ~~skip unsupported arches in test test_with_id_model_indexer.py~~ skip unsupported arches in python tests Jan 6, 2026

skip unsupported arches in python tests #5760

Are you sure you want to change the base?

skip unsupported arches in python tests #5760

Uh oh!

Conversation

liqiangxl commented Jan 5, 2026

Uh oh!

liqiangxl commented Jan 5, 2026

Uh oh!

greptile-apps bot commented Jan 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Jan 5, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Jan 5, 2026 • edited by xwang233 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes walkthrough

PR Reviewer Guide

Test failures

Uh oh!

naoyam commented Jan 5, 2026

Uh oh!

liqiangxl commented Jan 6, 2026

Uh oh!

liqiangxl commented Jan 6, 2026

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

liqiangxl commented Jan 6, 2026

Uh oh!

liqiangxl commented Jan 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

greptile-apps bot commented Jan 5, 2026 •

edited

Loading

github-actions bot commented Jan 5, 2026 •

edited by xwang233

Loading