
[PyTorch] Add sliding window support to FlashAttention #551

Merged: 21 commits, Dec 16, 2023

Conversation

@cyanguwa (Collaborator) commented Dec 6, 2023

This PR only makes changes on the PyTorch side. It

  • integrates flash-attn 2.3+ sliding window attention into TransformerLayer, MultiHeadAttention, DotProductAttention and FlashAttention
  • adds unit tests that compare against UnfusedDotProductAttention with an arbitrary mask generated from the window size (see the sketch after this list)
  • adds a use_unfused_attention flag and raises an exception when none of the three DPA backends is available
  • adds more determinism control to the backend selection; in particular, it filters out the fused attention arbitrary-seqlen backend on non-sm90 architectures, because that backend has no workspace optimization path there and is therefore non-deterministic
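
For reference, here is a minimal sketch of how a window-based mask for an unfused reference implementation could be generated. The helper name and signature are illustrative, not the PR's actual test code; it assumes flash-attn's (left, right) window convention, where query position i attends to key positions in [i - left, i + right] and -1 means unbounded on that side:

```python
import torch

def sliding_window_mask(seqlen_q, seqlen_kv, window_size, device="cpu"):
    """Boolean mask: True where a query position may attend to a key position.

    Illustrative helper, assuming flash-attn's (left, right) convention
    with -1 meaning "no limit" on that side.
    """
    left, right = window_size
    q_idx = torch.arange(seqlen_q, device=device).unsqueeze(1)    # [sq, 1]
    kv_idx = torch.arange(seqlen_kv, device=device).unsqueeze(0)  # [1, skv]
    mask = torch.ones(seqlen_q, seqlen_kv, dtype=torch.bool, device=device)
    if left >= 0:   # limit how far back a query may look
        mask &= kv_idx >= q_idx - left
    if right >= 0:  # limit how far ahead a query may look
        mask &= kv_idx <= q_idx + right
    return mask
```

For example, window_size=(2, 0) yields a causal band in which each query sees itself and the two previous positions; comparing FlashAttention output under that window against UnfusedDotProductAttention run with such a mask is roughly the shape of check the new unit tests perform.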

@cyanguwa (Collaborator, Author) commented Dec 6, 2023

/te-ci

@cyanguwa (Collaborator, Author) commented Dec 6, 2023

/te-ci

@cyanguwa (Collaborator, Author) commented Dec 6, 2023

/te-ci pytorch

@cyanguwa closed this Dec 7, 2023

@cyanguwa reopened this Dec 8, 2023

@cyanguwa (Collaborator, Author) commented Dec 8, 2023

/te-ci pytorch

1 similar comment
@cyanguwa (Collaborator, Author) commented Dec 8, 2023

/te-ci pytorch

@cyanguwa (Collaborator, Author) commented Dec 8, 2023

Pipeline 11331817

@cyanguwa (Collaborator, Author) commented Dec 11, 2023

With newer cuDNN, pipeline 11412889 is green!

@cyanguwa (Collaborator, Author) commented

/te-ci pytorch

4 similar comments
@ptrendx (Member) left a comment

LGTM, thanks!

@ptrendx merged commit 27aa609 into NVIDIA:main on Dec 16, 2023
20 checks passed
@cyanguwa deleted the fa/sliding_window branch on February 21, 2024
@ashvinnihalani commented

I also want to ask: what happens when we use grouped-query attention with unfused attention? Right now it seems to error out.
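
For context on the question above: an unfused attention path computes attention with plain matmuls, so the key/value tensors must have as many heads as the query tensor. A minimal sketch of the head expansion such a path could apply for grouped-query attention (the helper is hypothetical, not TransformerEngine's actual code):

```python
import torch

def expand_kv_for_gqa(kv, num_q_heads):
    """Repeat each KV head so K/V match the query head count (hypothetical helper).

    kv: [batch, num_kv_heads, seqlen, head_dim]
    """
    num_kv_heads = kv.shape[1]
    assert num_q_heads % num_kv_heads == 0, "query heads must be a multiple of KV heads"
    group_size = num_q_heads // num_kv_heads
    # Each KV head is shared by group_size query heads, so repeat along the head dim.
    return kv.repeat_interleave(group_size, dim=1)  # [batch, num_q_heads, seqlen, head_dim]
```

For instance, with 16 query heads and 4 KV heads, each KV head would be repeated 4 times before the usual softmax(QKᵀ)V matmuls.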
