
Conversation

@LoserCheems
Collaborator

Summary

  • Updates documentation and references to reflect the new Flash Sparse Attention branding.

Root Cause

  • The project underwent a rebranding from Flash Dynamic Mask Attention to Flash Sparse Attention, necessitating updates across documentation and code references.

Changes

  • Renamed variables, documentation, and references throughout the codebase to align with the new branding.

Reproduction

  • Review the updated documentation and references in the codebase.

Tests

  • No new tests added; existing tests validated against the updated references.

Compatibility

  • No backward compatibility issues; all references have been updated consistently.

Checklist

  • Linked issue provided
  • Adds or updates tests
  • Updates docs if needed
  • No perf regressions

Exposes backend availability flags to let callers probe supported runtimes without import errors.
Provides an auto-selection helper that falls back to the first available backend for attention execution.
Introduces a Flex Attention forward path that constructs causal block masks, normalizes mask and bias defaults, and applies compile-friendly kernel options to ease sparse Flash workloads.
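As an illustration of how the availability flags, auto-selection helper, and Flex Attention path described above might fit together (the flag names and helpers below are assumptions for this sketch, not the package's actual public API):

```python
# Illustrative sketch only — flag and helper names are assumptions, not the
# package's actual public API.
import torch

CUDA_AVAILABLE = torch.cuda.is_available()

try:
    import triton  # noqa: F401
    TRITON_AVAILABLE = True
except ImportError:
    TRITON_AVAILABLE = False

try:
    from torch.nn.attention.flex_attention import create_block_mask, flex_attention
    FLEX_AVAILABLE = True
except ImportError:
    FLEX_AVAILABLE = False


def auto_select_backend() -> str:
    """Fall back to the first available backend, in priority order."""
    for name, ok in (("cuda", CUDA_AVAILABLE), ("triton", TRITON_AVAILABLE), ("flex", FLEX_AVAILABLE)):
        if ok:
            return name
    raise RuntimeError("No sparse attention backend is available.")


def flex_forward(q, k, v):
    """Toy Flex Attention forward: build a causal block mask and run flex_attention."""
    B, H, q_len, _ = q.shape
    kv_len = k.shape[2]

    def causal(b, h, q_idx, kv_idx):
        return q_idx >= kv_idx

    block_mask = create_block_mask(causal, B, H, q_len, kv_len, device=q.device)
    return flex_attention(q, k, v, block_mask=block_mask)
```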
- Deleted `modeling_flash_dynamic_mask_attention_utils.py` as it contained redundant code and was not being utilized.
- Removed `mask.py` and `padding.py` files which were not necessary for the current implementation, streamlining the codebase.
Adds fused forward/backward kernels in Triton to accelerate sparse attention with masking, bias, and GQA support for PyTorch integration.
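The fused kernels themselves are too large to excerpt here, but a toy Triton kernel can show how a boolean mask and an additive bias enter the score computation. This is a simplified elementwise sketch, not the fused forward/backward implementation, and every name in it is invented for illustration:

```python
# Toy Triton kernel: applies a boolean mask and an additive bias to flattened
# attention scores before softmax.
import torch
import triton
import triton.language as tl


@triton.jit
def _mask_bias_kernel(scores_ptr, mask_ptr, bias_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    in_bounds = offs < n_elements
    s = tl.load(scores_ptr + offs, mask=in_bounds, other=0.0)
    m = tl.load(mask_ptr + offs, mask=in_bounds, other=0)
    b = tl.load(bias_ptr + offs, mask=in_bounds, other=0.0)
    # Masked-out positions are driven to -inf so softmax ignores them.
    out = tl.where(m != 0, s + b, float("-inf"))
    tl.store(out_ptr + offs, out, mask=in_bounds)


def apply_mask_bias(scores: torch.Tensor, mask: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(scores)
    n = scores.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK"]),)
    _mask_bias_kernel[grid](scores, mask.to(torch.int32), bias, out, n, BLOCK=1024)
    return out
```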
Enables calling sparse Flash attention CUDA kernels through custom autograd helpers.
Registers fake implementations and padding logic so torch.compile stays compatible with varying head shapes.
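A sketch of what that wiring can look like with `torch.library.custom_op` (PyTorch 2.4+). The op name, the SDPA fallback body, and the recomputation-based backward are placeholders; the real package binds its own CUDA kernels here:

```python
import torch
import torch.nn.functional as F


@torch.library.custom_op("fsa_demo::fwd", mutates_args=())
def fsa_fwd(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Stand-in for the CUDA kernel call.
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)


@fsa_fwd.register_fake
def _(q, k, v):
    # Fake (meta) implementation: torch.compile only needs output shape/dtype,
    # so varying head shapes trace without running the kernel.
    return q.new_empty((*q.shape[:-1], v.shape[-1]))


def _setup_context(ctx, inputs, output):
    q, k, v = inputs
    ctx.save_for_backward(q, k, v)


def _backward(ctx, grad_out):
    # Recompute a differentiable reference; the real helper would call the
    # fused backward kernel instead.
    q, k, v = ctx.saved_tensors
    with torch.enable_grad():
        q_, k_, v_ = (t.detach().requires_grad_() for t in (q, k, v))
        out = F.scaled_dot_product_attention(q_, k_, v_, is_causal=True)
        return torch.autograd.grad(out, (q_, k_, v_), grad_out)


fsa_fwd.register_autograd(_backward, setup_context=_setup_context)
```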
Updates package and repo naming so installation commands match the published distribution.
Repositions the performance benchmarks after the usage guidance in both README languages and aligns tensor examples with current API expectations.
Aligns packaging metadata with new repository identity.
Clarifies security instructions under the Flash Sparse Attention brand so users follow the right guidance for install, reporting, and support.
Aligns packaging metadata and build hooks with the flash_sparse_attn name so prebuilt wheels, env vars, and CUDA builds resolve correctly.
Points contribution guide links at flash-sparse-attention to avoid outdated references.
Reflects updated project title and repository location to keep citation metadata current.
Introduces cached availability checks so integrations can detect flash sparse attention without importing local modules, and ensures a CUDA-backed torch build is present before enabling features.
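For instance (the helper name below is illustrative, not necessarily the module's actual function):

```python
import importlib.util
from functools import lru_cache

import torch


@lru_cache(maxsize=None)
def is_flash_sparse_attn_available() -> bool:
    """Detect the package without importing it, and require a CUDA-backed torch."""
    if importlib.util.find_spec("flash_sparse_attn") is None:
        return False
    return torch.version.cuda is not None and torch.cuda.is_available()
```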
Supports future HF integration by routing calls through flash sparse attention logic and normalizing autocast, causal, and dtype handling.
Introduces lazy import plumbing for flash sparse attention kernels to streamline future integrations.
Prepares padding-aware helpers and kwarg validation so padding-free flows and PEFT casting stay compatible with the kernels.
Introduces mask utilities for top-k and relu masking to support flash sparse attention.
Enables optional block smoothing to stabilize dynamic sparsity patterns.
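A rough sketch of what top-k / ReLU masking and optional block smoothing can look like in plain PyTorch; the function names, shapes, and block-size convention are assumptions, and the shipped utilities may differ:

```python
import torch


def topk_mask(scores: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest entries per query row; everything else is masked out."""
    idx = scores.topk(k, dim=-1).indices
    keep = torch.zeros_like(scores).scatter_(-1, idx, 1.0)
    return keep.bool()


def relu_mask(bias: torch.Tensor) -> torch.Tensor:
    """ReLU-style gating: keep positions whose bias is positive."""
    return bias > 0


def smooth_blocks(mask: torch.Tensor, block_size: int = 64) -> torch.Tensor:
    """Optional block smoothing: keep a whole key block if any position in it is kept.

    Assumes the key length is a multiple of block_size.
    """
    B, H, Q, K = mask.shape
    blocks = mask.view(B, H, Q, K // block_size, block_size).any(dim=-1, keepdim=True)
    return blocks.expand(-1, -1, -1, -1, block_size).reshape(B, H, Q, K)
```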
Introduces reusable padding helpers to consolidate ragged tensor handling and avoid recomputing per-layer indices. Addresses static-cache overflow by slicing KV states and provides local indexing to keep the computation graph-friendly.
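A minimal version of the familiar padding-free pattern, computed once and reused across layers. The names mirror the common Hugging Face-style helpers rather than this repository's exact API, and `attention_mask` is assumed to be 0/1 of shape (batch, seq_len):

```python
import torch
import torch.nn.functional as F


def get_unpad_data(attention_mask: torch.Tensor):
    """Compute flattened token indices and cumulative lengths once, for reuse across layers."""
    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    max_seqlen = int(seqlens.max())
    cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))
    return indices, cu_seqlens, max_seqlen


def unpad_input(hidden_states: torch.Tensor, indices: torch.Tensor) -> torch.Tensor:
    """Flatten (batch, seq, ...) into (total_tokens, ...) using the shared indices."""
    return hidden_states.reshape(-1, *hidden_states.shape[2:]).index_select(0, indices)


# Static-cache overflow is handled by slicing cached KV states to the live length, e.g.
# key_states = key_states[:, :, :kv_len]; value_states = value_states[:, :, :kv_len]
```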
Points the integration to the renamed sparse attention package so setup guidance stays accurate.
…integration and v1.0.0 technical report. These files have been superseded by updated documentation reflecting recent changes and improvements in the codebase.
Updates API reference to reflect the flash_sparse_attn branding so installation instructions, imports, and backend descriptions stay consistent with the renamed package.
Updates terminology to reflect the flash sparse attention rebranding so readers follow accurate package names, imports, and integration guidance.
Updates benchmark integrations to load the flash_sparse_attn implementations so the renamed package continues to back the CUDA, Triton, and Flex runs.
Renames the availability guards and status messages to keep diagnostic output aligned with the new module namespace.
Updates the sparse attention backend to drop the old dynamic mask name so future errors and docs consistently refer to FlashSparseAttention.
Maintains naming consistency after the FSA rebrand.
Copilot AI review requested due to automatic review settings November 9, 2025 15:52
@github-actions github-actions bot requested a review from Evanwu1125 November 9, 2025 15:52
Contributor

Copilot AI left a comment


Pull Request Overview

This PR updates all documentation, error messages, and references throughout the codebase to reflect the rebranding from "Flash Dynamic Mask Attention" to "Flash Sparse Attention". The changes maintain consistency across code, documentation, and configuration files.

  • Updated function/variable naming from flash_dmattn to flash_sparse_attn
  • Renamed references in error messages and documentation
  • Updated repository URLs and package names

Reviewed Changes

Copilot reviewed 19 out of 337 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| csrc/flash_sparse_attn/src/generate_kernels.py | Updated kernel generation description |
| csrc/flash_sparse_attn/src/flash_fwd_launch_template.h | Updated error message for unsupported architecture |
| csrc/flash_sparse_attn/src/flash_bwd_launch_template.h | Updated error message for unsupported architecture |
| csrc/flash_sparse_attn/flash_api.cpp | Updated error messages and module documentation |
| benchmarks/forward_performance.py | Updated import statements and variable names |
| benchmarks/forward_equivalence.py | Updated import statements and variable names |
| benchmarks/backward_performance.py | Updated import statements and variable names |
| benchmarks/backward_equivalence.py | Updated import statements and variable names |
| SECURITY.md | Updated project name references and URLs |
| README_zh.md | Updated installation instructions and reorganized content |
| README.md | Updated installation instructions and reorganized content |
| CONTRIBUTING.md | Updated repository URLs and references |
| CITATION.cff | Updated title and repository URL |
| .github/workflows/manual_publish.yml | Updated environment variable name |
| .github/workflows/_build.yml | Updated environment variable names |
| .github/ISSUE_TEMPLATE/feature_request.yml | Updated description abbreviation |
| .github/ISSUE_TEMPLATE/feature_request.md | Updated description and context |
| .github/ISSUE_TEMPLATE/bug_report.yml | Updated description abbreviation |
| .github/ISSUE_TEMPLATE/bug_report.md | Updated description and version label |



# Create the bias for sparse attention
- attn_bias = torch.randn(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
+ attn_bias = torch.randn(batch_size, num_kv_heads, 1, seq_len, device=device, dtype=dtype)

Copilot AI Nov 9, 2025


The attention bias shape has been changed from (batch_size, num_kv_heads, seq_len, seq_len) to (batch_size, num_kv_heads, 1, seq_len), but this dimension change is not documented anywhere in the PR. This appears to be a functional code change rather than just a documentation/naming update, which contradicts the stated purpose of this PR. This change could break existing code that depends on the original shape and needs proper documentation and migration guidance.

Suggested change
attn_bias = torch.randn(batch_size, num_kv_heads, 1, seq_len, device=device, dtype=dtype)
attn_bias = torch.randn(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
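To make the shape difference concrete, here is a plain-PyTorch illustration of how the two bias layouts behave under standard broadcasting; it says nothing about which layouts the CUDA kernel actually accepts, which is precisely what the comment asks to be documented:

```python
import torch

B, H_kv, q_len, kv_len, head_dim = 2, 4, 128, 128, 64
q = torch.randn(B, H_kv, q_len, head_dim)
k = torch.randn(B, H_kv, kv_len, head_dim)
scores = q @ k.transpose(-2, -1) / head_dim**0.5  # (B, H_kv, q_len, kv_len)

bias_full = torch.randn(B, H_kv, q_len, kv_len)   # one value per (query, key) pair
bias_bcast = torch.randn(B, H_kv, 1, kv_len)      # one value per key, shared by all queries

# Both add cleanly to the score matrix under normal broadcasting,
# but they parameterize different things.
out_full = scores + bias_full
out_bcast = scores + bias_bcast                   # broadcasts over the query dimension
```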

@LoserCheems LoserCheems requested a review from Copilot November 9, 2025 15:54
Contributor

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 19 out of 337 changed files in this pull request and generated no new comments.



@LoserCheems LoserCheems merged commit b746952 into main Nov 9, 2025