Skip to content

Conversation

@liqiangxl
Copy link
Collaborator

skip unsupported arches

@liqiangxl
Copy link
Collaborator Author

!test

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 5, 2026

Greptile Summary

This PR restricts FP4 quantization and GEMM tests to specific GPU architectures. A new is_blackwell() helper was added to detect multiple Blackwell variants (10.0, 10.3, 12.0, 12.1), and skip conditions were updated across test files.

Key changes:

  • Replaced module-level architecture checks with per-test @pytest.mark.skipif decorators using microarchitecture_is(10, 0)
  • Tests now skip on unsupported architectures with clear reason messages
  • Consolidated redundant skip conditions (is_pre_blackwell() + microarchitecture_is_pre(12)microarchitecture_is(10, 0))

Main issue:
Inconsistent architecture filtering in test_narrow_precision.py: some tests use is_blackwell() (supports all Blackwell: 10.0, 10.3, 12.0, 12.1) while others use microarchitecture_is(10, 0) (only 10.0). This creates different test coverage across Blackwell variants within the same file. Need clarification on whether certain operations (like scaled_mm and cutlass_nvfp4_grouped_mm) have specific incompatibilities with 12.x architectures that quantization tests don't have.

Confidence Score: 3/5

  • This PR is functionally safe but has architectural inconsistencies that may cause confusion
  • The changes correctly restrict tests to supported architectures and won't cause test failures. However, the mixed use of is_blackwell() vs microarchitecture_is(10, 0) in the same file creates inconsistent test coverage across Blackwell variants. This suggests either: (1) an intentional distinction that should be documented, or (2) an oversight where some tests could run on more architectures. The score reflects uncertainty about whether the architecture filtering is correctly applied.
  • Pay close attention to tests/python/direct/test_narrow_precision.py - verify that the mixed architecture checks (lines 66, 173, 240, 320, 449, 609, 652) are intentional

Important Files Changed

Filename Overview
tests/python/direct_utils/utils.py Added new is_blackwell() helper function to detect Blackwell architecture variants (10.0, 10.3, 12.0, 12.1)
tests/python/direct/test_cutlass_nvfp4_gemm.py Replaced module-level skip with per-test microarchitecture_is(10, 0) checks to restrict tests to compute capability 10.0 only
tests/python/direct/test_with_id_model_indexer.py Simplified skip condition to only use microarchitecture_is(10, 0) instead of combining is_pre_blackwell() and microarchitecture_is_pre(12)
tests/python/direct/test_narrow_precision.py Mixed approach: some tests use is_blackwell() (supports 10.0, 10.3, 12.0, 12.1), while others use microarchitecture_is(10, 0) (only 10.0), creating inconsistent architecture support

Sequence Diagram

sequenceDiagram
    participant Test as Test Suite
    participant Utils as direct_utils/utils.py
    participant Device as CUDA Device
    
    Note over Test,Device: Test Execution Flow
    
    Test->>Utils: Check architecture support
    alt Using microarchitecture_is(10, 0)
        Utils->>Device: Get device properties (major, minor)
        Device-->>Utils: Return (10, 0) or other
        Utils-->>Test: Return True only if exactly 10.0
        Note over Test: Skip if not 10.0<br/>(test_cutlass_nvfp4_gemm.py,<br/>test_with_id_model_indexer.py,<br/>some tests in test_narrow_precision.py)
    else Using is_blackwell()
        Utils->>Device: Check multiple compute capabilities
        Device-->>Utils: Return compute capability
        Utils-->>Test: Return True if 10.0, 10.3, 12.0, or 12.1
        Note over Test: Skip if not any Blackwell variant<br/>(some tests in test_narrow_precision.py)
    end
    
    alt Architecture supported
        Test->>Test: Execute test (FP4 quantization, GEMM, etc.)
    else Architecture not supported
        Test->>Test: Skip test with reason
    end
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@pytest.mark.skipif(
is_pre_blackwell(), reason="Only supported on blackwell and newer devices."
)
@pytest.mark.skipif(not is_blackwell(), reason="Only supported on blackwell.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Inconsistent architecture filtering: this test uses is_blackwell() (supports 10.0, 10.3, 12.0, 12.1) while other tests in this same file use microarchitecture_is(10, 0) (only supports 10.0). This creates different test coverage across Blackwell variants.

@github-actions
Copy link

github-actions bot commented Jan 5, 2026

Review updated until commit 55b5572

Description

  • Updated architecture checking logic from "skip if pre-blackwell" to "skip if not blackwell"

  • Added new is_blackwell() utility function supporting multiple Blackwell architectures (10.0, 10.3, 12.0, 12.1)

  • Modified test skip conditions across multiple test files to properly handle unsupported architectures

  • Replaced module-level skips with function-level @pytest.mark.skipif decorators for better test granularity

Changes walkthrough

Relevant files
Tests
test_cutlass_nvfp4_gemm.py
Update architecture checks for NVFP4 GEMM tests                   

tests/python/direct/test_cutlass_nvfp4_gemm.py

  • Removed module-level compute capability check
  • Added microarchitecture_is import
  • Added function-level skip decorators using microarchitecture_is(10, 0)
  • Updated skip reason to clarify Blackwell 12.0 support limitations
  • +13/-8   
    test_narrow_precision.py
    Update architecture validation for narrow precision tests

    tests/python/direct/test_narrow_precision.py

  • Replaced is_pre_blackwell and microarchitecture_is_pre imports with
    is_blackwell and microarchitecture_is
  • Updated multiple test skip conditions from "skip if pre-blackwell" to
    "skip if not blackwell"
  • Modified narrow precision test architecture validation logic
  • +13/-27 
    test_with_id_model_indexer.py
    Update architecture checks for model indexer test               

    tests/python/direct/test_with_id_model_indexer.py

  • Updated imports to use microarchitecture_is instead of
    is_pre_blackwell and microarchitecture_is_pre
  • Changed skip condition from pre-blackwell check to
    microarchitecture_is(10, 0)
  • Maintained existing test configuration and parameters
  • +3/-6     
    Enhancement
    utils.py
    Add Blackwell architecture detection utility                         

    tests/python/direct_utils/utils.py

  • Added new is_blackwell() function supporting multiple Blackwell
    architectures
  • Function checks for compute capabilities 10.0, 10.3, 12.0, and 12.1
  • Provides comprehensive Blackwell architecture detection
  • +13/-0   

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review
    Architecture Check Logic

    The PR replaces the old compute capability check with a new microarchitecture_is(10, 0) check. Verify that this change correctly identifies the intended architectures and that the skip reason "Does not support blackwell compute 12.0, other arches are not tested" accurately reflects the test's requirements.

    @pytest.mark.skipif(
        not microarchitecture_is(10, 0),
        reason="Does not support blackwell compute 12.0, other arches are not tested.",
    )
    Blackwell-specific Tests

    Tests that were previously skipped on "pre-Blackwell" architectures are now being skipped on non-Blackwell architectures using not is_blackwell(). Ensure this change doesn't inadvertently enable tests on unsupported hardware or disable tests on supported hardware.

    @pytest.mark.skipif(not is_blackwell(), reason="Only supported on blackwell.")
    @pytest.mark.parametrize("dtype", [torch.float32, torch.bfloat16])
    New is_blackwell Function

    A new is_blackwell() function is added that checks for multiple Blackwell architectures (10.0, 10.3, 12.0, 12.1). Verify that this comprehensive check covers all intended Blackwell architectures and doesn't include unintended ones.

    def is_blackwell():
        return (
            microarchitecture_is(10, 0)
            or microarchitecture_is(10, 3)
            or microarchitecture_is(12, 0)
            or microarchitecture_is(12, 1)
        )

    Test failures

    • (Low, 1) Minor numerical mismatch in Thunder vs Torch instance_norm cuda tests

      Test Name A100 Source
      thunder.tests.test_ops.test_core_vs_torch_consistency_instance_norm_nvfuser_cuda_thunder.dtypes.float32

    @naoyam
    Copy link
    Collaborator

    naoyam commented Jan 5, 2026

    The title suggests this has something to do with indexing. Can you clarify what it is?

    @liqiangxl liqiangxl changed the title fix test with id model indexer skip unsupported arches in test test_with_id_model_indexer.py Jan 6, 2026
    @liqiangxl
    Copy link
    Collaborator Author

    The title suggests this has something to do with indexing. Can you clarify what it is?

    The title was not accurate, failure is not realated to indexing. Revised to skip unsupported arches in test test_with_id_model_indexer.py``

    @liqiangxl liqiangxl marked this pull request as draft January 6, 2026 14:30
    @liqiangxl
    Copy link
    Collaborator Author

    @greptileai review this draft

    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    4 files reviewed, 4 comments

    Edit Code Review Agent Settings | Greptile

    Comment on lines 239 to 242
    @pytest.mark.skipif(
    is_pre_blackwell(), reason="Only supported on blackwell and newer devices."
    )
    @pytest.mark.skipif(
    not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0"
    not microarchitecture_is(10, 0),
    reason="Does not support blackwell compute 12.0, other arches are not tested.",
    )
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    style: Inconsistent architecture filtering within same file: this test uses microarchitecture_is(10, 0) (only supports 10.0), but other tests in this file (lines 66, 173, 320) use is_blackwell() which supports all Blackwell variants (10.0, 10.3, 12.0, 12.1). Are the _scaled_mm tests specifically incompatible with 12.x architectures while quantization tests support them? Do scaled_mm operations have specific incompatibilities with 12.x compute capabilities that quantization operations don't have?

    Comment on lines 448 to 451
    @pytest.mark.skipif(
    is_pre_blackwell(), reason="Only supported on blackwell and newer devices."
    )
    @pytest.mark.skipif(
    not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0"
    not microarchitecture_is(10, 0),
    reason="Does not support blackwell compute 12.0, other arches are not tested.",
    )
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    style: Same inconsistency: microarchitecture_is(10, 0) here vs is_blackwell() at line 320 for similar grouped matmul tests

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

    Comment on lines 608 to 611
    @pytest.mark.skipif(
    is_pre_blackwell(), reason="Only supported on blackwell and newer devices."
    )
    @pytest.mark.skipif(
    not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0"
    not microarchitecture_is(10, 0),
    reason="Does not support blackwell compute 12.0, other arches are not tested.",
    )
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    style: Same pattern: microarchitecture_is(10, 0) here vs is_blackwell() elsewhere in file

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

    Comment on lines 651 to 654
    @pytest.mark.skipif(
    is_pre_blackwell(), reason="Only supported on blackwell and newer devices."
    )
    @pytest.mark.skipif(
    not microarchitecture_is_pre(12), reason="Does not support blackwell compute 12.0"
    not microarchitecture_is(10, 0),
    reason="Does not support blackwell compute 12.0, other arches are not tested.",
    )
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    style: Same pattern: microarchitecture_is(10, 0) here vs is_blackwell() elsewhere in file

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

    @liqiangxl liqiangxl changed the title skip unsupported arches in test test_with_id_model_indexer.py skip unsupported arches in python tests Jan 6, 2026
    @liqiangxl
    Copy link
    Collaborator Author

    !test

    @liqiangxl
    Copy link
    Collaborator Author

    Summary of failure reasons:
    The first four tests are failed in the process of computing the reference output using PyTorch. The last test is due to nvFuser.

    1. Test: pytest tests/python/direct/test_cutlass_gemm.py -k "bfloat16-tokens_per_expert_neg_one and 115 and 1024"
      Err: Attempting to use TMEM allocation PTX without CUTE_ARCH_TCGEN05_TMEM_ENABLED
      Reason: caused by torch, torch._grouped_mm

    2. Test: pytest tests/python/direct/test_cutlass_nvfp4_gemm.py -k "gemm[shape and 256 and 64 and bfloat16"
      Err: Trying to use TMA Descriptor Prefetch without CUTE_ARCH_TMA_SM90_ENABLED
      Reason: caused by torch

    3. Test pytest tests/python/direct/test_cutlass_nvfp4_gemm.py -k "grouped_mm and bfloat"
      Err: Attempting to use TMEM allocation PTX without CUTE_ARCH_TCGEN05_TMEM_ENABLED
      Reason: caused by torch

    4. Test pytest tests/python/direct/test_narrow_precision.py -k "test_nv_quantization_to_mxfp8 and float16 and eager"
      Err: CUDA Error: the provided PTX was compiled with an unsupported toolchain
      Reason: TransformerEngine error

    5. Test NVFUSER_DISABLE=parallel_compile pytest tests/python/direct/test_with_id_model_indexer.py -k "eager"
      Err: Attempting to use TMEM allocation PTX without CUTE_ARCH_TCGEN05_TMEM_ENABLED and INTERNAL ASSERT FAILED at /opt/pytorch/nvfuser/csrc/runtime/executor_utils.cpp:559
      Reason: seems nvFuser related.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    3 participants