[WIP][Feature]Support using flash-attention and flash-varlen-attention on Ascend NPU #36617
What does this PR do?
The `flash-attn` package is not supported on Ascend NPU and cannot even be installed there, so flash-attention and flash-varlen-attention normally cannot be used with transformers on that hardware. However, the Ascend FlashAttentionScore documentation shows that the `torch_npu` package already provides flash-attention and flash-varlen-attention APIs for Ascend NPU, which can play the same role as the corresponding APIs in `flash-attn`. This PR therefore adds support for using flash-attention and flash-varlen-attention on Ascend NPU. All modifications only take effect when running on Ascend NPU.
Modifications
- Add `src/transformers/utils/npu_flash_attention_utils.py` to gather the functions needed for flash-attention and flash-varlen-attention on Ascend NPU; part of the code is copied from `flash-attn/bert_padding.py` (a simplified sketch of these helpers appears below this list).
- `is_flash_attn_2_available`, `is_flash_attn_greater_or_equal_2_10` and `is_flash_attn_greater_or_equal` now return `True` when `torch` and `torch_npu` are available, before checking whether the `flash-attn` package exists (see the availability-check sketch below).
- `index_first_axis`, `pad_input`, `unpad_input`, `flash_attn_func` and `flash_attn_varlen_func` replace the functions of the same name from `flash-attn` when `torch_npu` is available (see the wrapper sketch below).
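For context, here is a simplified sketch of the padding helpers mirrored from flash-attn's `bert_padding.py`. It omits the custom autograd Functions used upstream and is an illustration of what the new utils module collects, not the exact code added by this PR.

```python
import torch
import torch.nn.functional as F


def index_first_axis(x, indices):
    # Gather rows of a (total_tokens, ...) tensor by flat token index.
    return x[indices]


def unpad_input(hidden_states, attention_mask):
    # Flatten a padded (batch, seq_len, ...) batch down to its non-padding tokens.
    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    max_seqlen = int(seqlens.max())
    cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))
    flat = hidden_states.reshape(-1, *hidden_states.shape[2:])
    return index_first_axis(flat, indices), indices, cu_seqlens, max_seqlen


def pad_input(hidden_states, indices, batch, seqlen):
    # Scatter unpadded tokens back into a (batch, seqlen, ...) padded layout.
    out = torch.zeros(
        batch * seqlen, *hidden_states.shape[1:],
        dtype=hidden_states.dtype, device=hidden_states.device,
    )
    out[indices] = hidden_states
    return out.reshape(batch, seqlen, *hidden_states.shape[1:])
```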
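The availability checks can short-circuit before probing for the `flash-attn` package. A minimal sketch of the idea follows; the helper `_is_torch_npu_available` and its import-based detection are illustrative assumptions, not the exact implementation in transformers' `import_utils`.

```python
import importlib.util


def _is_torch_npu_available() -> bool:
    # Assumed helper: treat the presence of the torch_npu package as NPU support.
    return importlib.util.find_spec("torch_npu") is not None


def is_flash_attn_2_available() -> bool:
    if importlib.util.find_spec("torch") is None:
        return False
    # On Ascend NPU, torch_npu ships fused attention kernels, so the
    # flash-attn package itself does not need to be installed.
    if _is_torch_npu_available():
        return True
    return importlib.util.find_spec("flash_attn") is not None
```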
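Finally, the replacement functions route to `torch_npu`'s fused attention kernel. The sketch below assumes the `torch_npu.npu_fusion_attention` interface from the Ascend FlashAttentionScore documentation (the argument names, the "BSND" layout string, and the tuple return value are taken from that documentation); it illustrates the wrapping pattern rather than reproducing the PR's code. `flash_attn_varlen_func` can be built the same way using the kernel's variable-length sequence arguments.

```python
import math

import torch
import torch_npu


def flash_attn_func(q, k, v, dropout_p=0.0, softmax_scale=None, causal=False):
    # q, k, v: (batch, seq_len, num_heads, head_dim), flash-attn's layout ("BSND").
    if softmax_scale is None:
        softmax_scale = 1.0 / math.sqrt(q.shape[-1])
    attn_mask = None
    if causal:
        # The NPU kernel takes an explicit boolean mask for causal attention,
        # with True marking the positions to be masked out.
        seq_len = q.shape[1]
        attn_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device), diagonal=1
        )
    # The kernel returns a tuple; the attention output is its first element.
    return torch_npu.npu_fusion_attention(
        q, k, v, q.shape[2],
        input_layout="BSND",
        atten_mask=attn_mask,
        scale=softmax_scale,
        keep_prob=1.0 - dropout_p,
    )[0]
```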
Fixes # (issue)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.