Fix documentation and references for Flash Sparse Attention #207
Conversation
Exposes backend availability flags to let callers probe supported runtimes without import errors. Provides an auto-selection helper that falls back to the first available backend for attention execution.
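As a rough sketch of what such flags and a fallback helper could look like (the module and flag names below are illustrative assumptions, not the package's actual API):

```python
import importlib.util
import torch

# Availability flags: probe each runtime without raising ImportError.
CUDA_AVAILABLE = torch.cuda.is_available() and importlib.util.find_spec("flash_sparse_attn") is not None
TRITON_AVAILABLE = importlib.util.find_spec("triton") is not None
try:
    from torch.nn.attention.flex_attention import flex_attention  # noqa: F401
    FLEX_AVAILABLE = True
except ImportError:
    FLEX_AVAILABLE = False

def select_backend(preferred: str | None = None) -> str:
    """Return the preferred backend if usable, else the first available one."""
    available = {"cuda": CUDA_AVAILABLE, "triton": TRITON_AVAILABLE, "flex": FLEX_AVAILABLE}
    if preferred is not None and available.get(preferred, False):
        return preferred
    for name, ok in available.items():
        if ok:
            return name
    raise RuntimeError("No flash sparse attention backend is available.")
```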
Introduces a Flex Attention forward path that constructs causal block masks, normalizes mask and bias defaults, and applies compile-friendly kernel options to ease sparse Flash workloads.
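For context, a causal block mask with PyTorch's Flex Attention (torch >= 2.5) can be built roughly as follows; this is a generic sketch of the mechanism, not the exact forward path added in this PR:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

def causal(b, h, q_idx, kv_idx):
    # Keep only keys at or before the query position.
    return q_idx >= kv_idx

B, H, S, D = 2, 8, 1024, 64
q, k, v = (torch.randn(B, H, S, D, device="cuda", dtype=torch.float16) for _ in range(3))

# The block mask is broadcast over batch and heads (B=None, H=None).
block_mask = create_block_mask(causal, B=None, H=None, Q_LEN=S, KV_LEN=S, device="cuda")
flex_attention_compiled = torch.compile(flex_attention)  # compile-friendly execution
out = flex_attention_compiled(q, k, v, block_mask=block_mask)
```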
- Deleted `modeling_flash_dynamic_mask_attention_utils.py`, which contained redundant code and was not being used.
- Removed `mask.py` and `padding.py`, which were not necessary for the current implementation, streamlining the codebase.
Adds fused forward/backward kernels in Triton to accelerate sparse attention with masking, bias, and GQA support for PyTorch integration.
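The fused kernels themselves are not shown in this conversation; as a minimal illustration of the masking-plus-bias step they fuse, here is a small stand-alone Triton kernel that adds a bias to attention scores and sends masked-out positions to -inf (the kernel name and flat layout are assumptions for the sketch, not the PR's kernel):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def mask_bias_kernel(scores_ptr, bias_ptr, keep_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    in_bounds = offs < n_elements
    s = tl.load(scores_ptr + offs, mask=in_bounds, other=0.0)
    b = tl.load(bias_ptr + offs, mask=in_bounds, other=0.0)
    keep = tl.load(keep_ptr + offs, mask=in_bounds, other=0)
    # Kept positions get the additive bias; masked positions become -inf so softmax ignores them.
    s = tl.where(keep != 0, s + b, float("-inf"))
    tl.store(out_ptr + offs, s, mask=in_bounds)

scores = torch.randn(8 * 128, device="cuda")
bias = torch.randn_like(scores)
keep = (torch.rand_like(scores) > 0.5).to(torch.int32)
out = torch.empty_like(scores)
grid = (triton.cdiv(scores.numel(), 1024),)
mask_bias_kernel[grid](scores, bias, keep, out, scores.numel(), BLOCK=1024)
```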
Enables calling sparse Flash attention CUDA kernels through custom autograd helpers. Registers fake implementations and padding logic so torch.compile stays compatible with varying head shapes.
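Registering a fake (meta) implementation for a custom CUDA op typically looks like the sketch below, assuming PyTorch 2.4+'s `torch.library.custom_op` API; the op name and signature here are placeholders, not the ones added in this PR:

```python
import torch

@torch.library.custom_op("flash_sparse_attn::fwd_sketch", mutates_args=())
def fsa_fwd_sketch(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # The real op would call into the compiled CUDA extension here.
    raise NotImplementedError("placeholder for the CUDA kernel call")

@fsa_fwd_sketch.register_fake
def _(q, k, v):
    # Shape/dtype-only implementation: lets torch.compile trace the op without
    # running the kernel, which is where head-shape padding logic would live.
    return torch.empty_like(q)
```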
Updates package and repo naming so installation commands match the published distribution. Repositions performance benchmarks after usage guidance for both languages and aligns tensor examples to current API expectations.
Aligns packaging metadata with new repository identity.
Clarifies security instructions under the Flash Sparse Attention brand so users follow the right guidance for installation, reporting, and support.
Aligns packaging metadata and build hooks with the flash_sparse_attn name so prebuilt wheels, env vars, and CUDA builds resolve correctly.
Points contribution guide links at flash-sparse-attention to avoid outdated references.
Reflects updated project title and repository location to keep citation metadata current.
Introduces cached availability checks so integrations can detect flash sparse attention without importing local modules and ensures CUDA backed torch is present before enabling features.
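A cached probe of this kind can be written without importing the package itself; the sketch below uses the `flash_sparse_attn` module name from this PR, while the helper name is illustrative:

```python
from functools import lru_cache
import importlib.util

import torch

@lru_cache(maxsize=None)
def is_flash_sparse_attn_available() -> bool:
    # Detect the package without importing it, so a broken install cannot raise here.
    if importlib.util.find_spec("flash_sparse_attn") is None:
        return False
    # Require a CUDA-backed torch build before enabling the feature.
    return torch.version.cuda is not None and torch.cuda.is_available()
```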
Supports future HF integration by routing calls through flash sparse attention logic and normalizing autocast, causal, and dtype handling.
Introduces lazy import plumbing for flash sparse attention kernels to streamline future integrations. Prepares padding-aware helpers and kwarg validation so padding-free flows and PEFT casting stay compatible with the kernels.
Introduces mask utilities for top-k and relu masking to support flash sparse attention. Enables optional block smoothing to stabilize dynamic sparsity patterns.
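Top-k and ReLU style masking can be sketched as follows; the function names and exact semantics are assumptions rather than the module's real API:

```python
import torch

def topk_mask(attn_bias: torch.Tensor, k: int) -> torch.Tensor:
    """Keep only the k largest bias entries along the key dimension."""
    idx = attn_bias.topk(k, dim=-1).indices
    mask = torch.zeros_like(attn_bias, dtype=torch.bool)
    return mask.scatter_(-1, idx, True)

def relu_mask(attn_bias: torch.Tensor) -> torch.Tensor:
    """Keep positions whose bias is positive (ReLU-style gating)."""
    return attn_bias > 0
```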
Introduces reusable padding helpers to consolidate ragged tensor handling and avoid recomputing per-layer indices. Addresses static-cache overflow by slicing KV states and provides local indexing to keep the computation graph-friendly.
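The core of such padding helpers is usually an index and cu_seqlens computation over a 2D attention mask; a minimal sketch under that assumption (HuggingFace-style mask of shape `(batch, seq_len)`, not the PR's actual helper) is:

```python
import torch
import torch.nn.functional as F

def get_unpad_data(attention_mask: torch.Tensor):
    # attention_mask: (batch, seq_len) with 1 for real tokens and 0 for padding.
    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)  # tokens per sequence
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    max_seqlen = int(seqlens.max())
    cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))  # prefix offsets
    return indices, cu_seqlens, max_seqlen
```

Computing these once and reusing them across layers is what avoids the per-layer index recomputation mentioned above.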
Points the integration to the renamed sparse attention package so setup guidance stays accurate.
…integration and v1.0.0 technical report. These files have been superseded by updated documentation reflecting recent changes and improvements in the codebase.
Updates API reference to reflect the flash_sparse_attn branding so installation instructions, imports, and backend descriptions stay consistent with the renamed package.
Updates terminology to reflect the flash sparse attention rebranding so readers follow accurate package names, imports, and integration guidance.
Updates benchmark integrations to load the flash_sparse_attn implementations so the renamed package continues to back the CUDA, Triton, and Flex runs. Renames the availability guards and status messages to keep diagnostic output aligned with the new module namespace.
Updates the sparse attention backend to drop the old dynamic mask name so future errors and docs consistently refer to FlashSparseAttention.
Maintains naming consistency after the FSA rebrand.
Pull Request Overview
This PR updates all documentation, error messages, and references throughout the codebase to reflect the rebranding from "Flash Dynamic Mask Attention" to "Flash Sparse Attention". The changes maintain consistency across code, documentation, and configuration files.
- Updated function/variable naming from `flash_dmattn` to `flash_sparse_attn`
- Renamed references in error messages and documentation
- Updated repository URLs and package names
Reviewed Changes
Copilot reviewed 19 out of 337 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| csrc/flash_sparse_attn/src/generate_kernels.py | Updated kernel generation description |
| csrc/flash_sparse_attn/src/flash_fwd_launch_template.h | Updated error message for unsupported architecture |
| csrc/flash_sparse_attn/src/flash_bwd_launch_template.h | Updated error message for unsupported architecture |
| csrc/flash_sparse_attn/flash_api.cpp | Updated error messages and module documentation |
| benchmarks/forward_performance.py | Updated import statements and variable names |
| benchmarks/forward_equivalence.py | Updated import statements and variable names |
| benchmarks/backward_performance.py | Updated import statements and variable names |
| benchmarks/backward_equivalence.py | Updated import statements and variable names |
| SECURITY.md | Updated project name references and URLs |
| README_zh.md | Updated installation instructions and reorganized content |
| README.md | Updated installation instructions and reorganized content |
| CONTRIBUTING.md | Updated repository URLs and references |
| CITATION.cff | Updated title and repository URL |
| .github/workflows/manual_publish.yml | Updated environment variable name |
| .github/workflows/_build.yml | Updated environment variable names |
| .github/ISSUE_TEMPLATE/feature_request.yml | Updated description abbreviation |
| .github/ISSUE_TEMPLATE/feature_request.md | Updated description and context |
| .github/ISSUE_TEMPLATE/bug_report.yml | Updated description abbreviation |
| .github/ISSUE_TEMPLATE/bug_report.md | Updated description and version label |
Diff under review (comment translated from Chinese):

  # Create bias for sparse attention
- attn_bias = torch.randn(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
+ attn_bias = torch.randn(batch_size, num_kv_heads, 1, seq_len, device=device, dtype=dtype)
Copilot AI commented on Nov 9, 2025
The attention bias shape has been changed from (batch_size, num_kv_heads, seq_len, seq_len) to (batch_size, num_kv_heads, 1, seq_len), but this dimension change is not documented anywhere in the PR. This appears to be a functional code change rather than just a documentation/naming update, which contradicts the stated purpose of this PR. This change could break existing code that depends on the original shape and needs proper documentation and migration guidance.
Suggested change:

- attn_bias = torch.randn(batch_size, num_kv_heads, 1, seq_len, device=device, dtype=dtype)
+ attn_bias = torch.randn(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
Co-authored-by: Copilot <[email protected]>
Pull Request Overview
Copilot reviewed 19 out of 337 changed files in this pull request and generated no new comments.