Fix grid dim overflow in DSA backward convert kernel on SM100 by szluyu99 · Pull Request #331 · NVIDIA/cudnn-frontend

szluyu99 · 2026-06-30T07:07:23Z

The convert kernel grid was configured as [1, convert_grid_x, 1], placing the seq-block dimension on grid.y. CUDA caps grid.y/z at 65535, so large mKV.shape[0] / block_seq values trigger invalid configuration argument. grid.x supports up to 2^31-1, so move convert_grid_x to grid.x and update the corresponding block_idx() unpacking in the kernel accordingly. No behavior change for in-range sizes.

Summary by CodeRabbit

Bug Fixes
- Corrected how sparse attention backward work is distributed across compute tiles, improving consistency and correctness during execution.

The convert kernel grid was configured as [1, convert_grid_x, 1], placing the seq-block dimension on grid.y. CUDA caps grid.y/z at 65535, so large mKV.shape[0] / block_seq values trigger `invalid configuration argument`. grid.x supports up to 2^31-1, so move convert_grid_x to grid.x and update the corresponding block_idx() unpacking in the kernel accordingly. No behavior change for in-range sizes.

coderabbitai · 2026-06-30T07:07:42Z

📝 Walkthrough

Walkthrough

In the convert() kernel, the launch grid's first two dimensions are swapped so convert_grid_x occupies the first slot, and the cute.arch.block_idx() unpacking is updated to extract seq_block_idx from the first component instead of the second.

Changes

convert() kernel grid index fix

Layer / File(s)	Summary
Grid layout and block_idx swap `python/cudnn/deepseek_sparse_attention/sparse_attention_backward/dsa_bwd_sm100.py`	`convert_grid` reordered to place `convert_grid_x` first, and `seq_block_idx` now unpacked from the first `block_idx()` component to match.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

A grid was flipped, a swap so neat,
First becomes first, the indices meet,
seq_block_idx hops to position one,
The convert kernel fix is done!
🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main fix: moving the SM100 DSA backward convert kernel grid to avoid dimension overflow.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

🧹 Nitpick comments (1)

python/cudnn/deepseek_sparse_attention/sparse_attention_backward/dsa_bwd_sm100.py (1)

573-577: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Document the grid/block index coupling.

This fix depends on convert_grid_x staying in grid.x and seq_block_idx continuing to come from block_idx.x. A short inline note here would help prevent a future cleanup from moving the sequence dimension back into grid.y and reintroducing the overflow.

Suggested diff

         convert_grid_x = (mKV.shape[0] + self.block_seq - 1) // self.block_seq
+        # Keep the sequence-block dimension in grid.x: CUDA grid.y/grid.z are capped at 65535.
+        # convert() must therefore read seq_block_idx from block_idx.x.
         convert_grid = [
             convert_grid_x,
             1,
             1,
         ]
@@
         (
+            # grid.x carries the sequence-block index to match convert_grid above.
             seq_block_idx,
             _,
             batch_idx,
         ) = cute.arch.block_idx()

As per path instructions, "Focus on documentation."

Also applies to: 619-622

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@python/cudnn/deepseek_sparse_attention/sparse_attention_backward/dsa_bwd_sm100.py`
around lines 573 - 577, Add a brief inline comment near the convert_grid/launch
setup in dsa_bwd_sm100.py documenting that convert_grid_x must remain in grid.x
and seq_block_idx must continue to be derived from block_idx.x; this coupling is
relied on by the backward kernel indexing, so avoid any future cleanup that
moves the sequence dimension to grid.y. Apply the same note to the related
launch section referenced in the later block as well, keeping the guidance tied
to the existing convert_grid_x and seq_block_idx symbols.

Source: Path instructions

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In
`@python/cudnn/deepseek_sparse_attention/sparse_attention_backward/dsa_bwd_sm100.py`:
- Around line 573-577: Add a brief inline comment near the convert_grid/launch
setup in dsa_bwd_sm100.py documenting that convert_grid_x must remain in grid.x
and seq_block_idx must continue to be derived from block_idx.x; this coupling is
relied on by the backward kernel indexing, so avoid any future cleanup that
moves the sequence dimension to grid.y. Apply the same note to the related
launch section referenced in the later block as well, keeping the guidance tied
to the existing convert_grid_x and seq_block_idx symbols.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 53b7a4d0-3b2f-4852-819b-e2c7af70869a

📥 Commits

Reviewing files that changed from the base of the PR and between 8612893 and 55486fb.

📒 Files selected for processing (1)

python/cudnn/deepseek_sparse_attention/sparse_attention_backward/dsa_bwd_sm100.py

coderabbitai Bot reviewed Jun 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix grid dim overflow in DSA backward convert kernel on SM100#331

Fix grid dim overflow in DSA backward convert kernel on SM100#331
szluyu99 wants to merge 1 commit into
NVIDIA:developfrom
szluyu99:fix/dsa-bwd-convert-grid-overflow

szluyu99 commented Jun 30, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 30, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

szluyu99 commented Jun 30, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

szluyu99 commented Jun 30, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 30, 2026 •

edited

Loading