
[WIP][Feature] Support using flash-attention and flash-varlen-attention on Ascend NPU #36617

Draft: FightingZhen wants to merge 1 commit into main

Conversation

@FightingZhen commented Mar 9, 2025

What does this PR do?

The flash-attn package is not supported on Ascend NPU and cannot even be installed there, so flash-attention and flash-varlen-attention are naturally unusable with transformers on that hardware.

However, as described in the Ascend FlashAttentionScore document, the torch_npu package provides flash-attention and flash-varlen-attention APIs on Ascend NPU that can play the same role as the APIs in the flash-attn package.
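For context, here is a minimal sketch of how torch_npu's fused attention can stand in for flash_attn_func. The wrapper name and the causal-mask construction are assumptions, not this PR's code; the exact signature and mask convention of torch_npu.npu_fusion_attention should be checked against the FlashAttentionScore document.

```python
import torch
import torch_npu


def npu_flash_attn_func(q, k, v, dropout_p=0.0, softmax_scale=None, causal=False):
    """Hypothetical wrapper mirroring flash_attn_func's interface.

    q, k, v are (batch, seqlen, num_heads, head_dim), i.e. "BSND" layout.
    """
    if softmax_scale is None:
        softmax_scale = q.shape[-1] ** -0.5

    atten_mask = None
    if causal:
        # Assumed mask convention: True marks positions that are masked out.
        seqlen = q.shape[1]
        atten_mask = torch.triu(
            torch.ones(seqlen, seqlen, dtype=torch.bool, device=q.device), diagonal=1
        )

    # npu_fusion_attention returns a tuple; the attention output comes first.
    out = torch_npu.npu_fusion_attention(
        q, k, v,
        q.shape[2],                # head_num
        input_layout="BSND",
        atten_mask=atten_mask,
        scale=softmax_scale,
        keep_prob=1.0 - dropout_p,
    )[0]
    return out
```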

Therefore, this PR adds support for flash-attention and flash-varlen-attention on Ascend NPU. All modifications take effect only when an Ascend NPU is in use.

Modifications

  1. Add a new file src/transformers/utils/npu_flash_attention_utils.py that collects the functions needed to run flash-attention and flash-varlen-attention on Ascend NPU; part of the code is adapted from flash-attn/bert_padding.py (a pure-PyTorch sketch of these helpers follows this list).
  2. Let the three availability checks is_flash_attn_2_available, is_flash_attn_greater_or_equal_2_10, and is_flash_attn_greater_or_equal return True when torch and torch_npu are detected as available, before checking whether the flash-attn package exists.
  3. Monkey-patch the five functions index_first_axis, pad_input, unpad_input, flash_attn_func, and flash_attn_varlen_func at runtime to replace their flash-attn counterparts when torch_npu is detected (see the detection-and-patch sketch after this list).
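For reference, the bert_padding.py helpers that the new file adapts are plain PyTorch and look roughly like the simplified sketch below (flash-attn implements index_first_axis as a custom autograd function; that detail is omitted here):

```python
import torch
import torch.nn.functional as F


def index_first_axis(x, indices):
    # Select rows along the flattened (batch * seqlen) axis.
    return x[indices]


def unpad_input(hidden_states, attention_mask):
    # hidden_states: (batch, seqlen, ...); attention_mask: (batch, seqlen), 1 = keep.
    seqlens_in_batch = attention_mask.sum(dim=-1, dtype=torch.int32)
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    max_seqlen_in_batch = int(seqlens_in_batch.max())
    # Cumulative sequence lengths prefixed with 0, as the varlen kernels expect.
    cu_seqlens = F.pad(torch.cumsum(seqlens_in_batch, dim=0, dtype=torch.int32), (1, 0))
    unpadded = index_first_axis(hidden_states.flatten(0, 1), indices)
    return unpadded, indices, cu_seqlens, max_seqlen_in_batch


def pad_input(hidden_states, indices, batch, seqlen):
    # Scatter the unpadded rows back into a zero-initialized padded tensor.
    output = hidden_states.new_zeros(batch * seqlen, *hidden_states.shape[1:])
    output[indices] = hidden_states
    return output.view(batch, seqlen, *hidden_states.shape[1:])
```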
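And a sketch of the detection-and-patch flow covering items 2 and 3. is_torch_npu_available is a real transformers utility; the patch targets and the npu_flash_attention_utils symbols follow the description above, but the surrounding structure is an assumption about the PR's internals, not its actual code.

```python
from transformers.utils import is_torch_npu_available


def is_flash_attn_2_available():
    # Report flash-attention as available on Ascend NPU first,
    # before falling back to the existing flash-attn package check.
    if is_torch_npu_available():
        return True
    ...  # original flash-attn detection continues here


if is_torch_npu_available():
    import transformers.modeling_flash_attention_utils as fa_utils
    import transformers.utils.npu_flash_attention_utils as npu_fa

    # Swap the flash-attn symbols for their torch_npu-backed equivalents.
    fa_utils.flash_attn_func = npu_fa.flash_attn_func
    fa_utils.flash_attn_varlen_func = npu_fa.flash_attn_varlen_func
    fa_utils.pad_input = npu_fa.pad_input
    fa_utils.unpad_input = npu_fa.unpad_input
    fa_utils.index_first_axis = npu_fa.index_first_axis
```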

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


github-actions bot commented Mar 9, 2025

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).
