[PyTorch] Fix get_swa_mask() for padding masks #1281

cyanguwa · 2024-10-21T20:33:15Z

Description

This PR fixes sliding window mask generation in UnfusedDotProductAttention. It corrects the logic for padding and arbitrary masks in get_swa_mask(), expands the docstring, refactors the call site, and extends the unit tests.

Fixes #1271
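
The combination rule can be sketched in a few lines. This is a minimal illustration, not the code in get_swa_mask(): plain Python lists stand in for tensors, the helper names (swa_mask, padding_mask, combine_or) are hypothetical, the window is aligned to the top-left corner, and it assumes that True marks a position that is masked out, so two masks are merged with OR (as the title of #1271 suggests).

```python
def swa_mask(seqlen_q, seqlen_kv, window_size):
    """Sliding window mask as nested lists; True means masked out.

    window_size is (left, right); a negative value is treated as unbounded
    (an assumption made for this sketch).
    """
    left, right = window_size
    mask = []
    for i in range(seqlen_q):
        row = []
        for j in range(seqlen_kv):
            too_far_left = left >= 0 and j < i - left
            too_far_right = right >= 0 and j > i + right
            row.append(too_far_left or too_far_right)
        mask.append(row)
    return mask


def padding_mask(max_seqlen_q, max_seqlen_kv, actual_seqlen_q, actual_seqlen_kv):
    """Mask out every padded query row and every padded key column."""
    return [
        [i >= actual_seqlen_q or j >= actual_seqlen_kv for j in range(max_seqlen_kv)]
        for i in range(max_seqlen_q)
    ]


def combine_or(mask_a, mask_b):
    """A position is masked if either input mask masks it."""
    return [[a or b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(mask_a, mask_b)]


# One sequence padded from 3 real tokens to 4, with window (left=1, right=0).
window = swa_mask(4, 4, (1, 0))
padding = padding_mask(4, 4, 3, 3)
full_mask = combine_or(window, padding)

for row in full_mask:
    print(["x" if masked else "." for masked in row])
```

With these inputs the padded query row 3 is fully masked. Combining the same two masks with AND would leave keys 2 and 3 visible to that row, even though key 3 is padding.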

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactor

Changes

Please list the changes introduced in this PR:

Improve the logic in get_swa_mask() and its call site

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Charlene Yang <[email protected]>

cyanguwa · 2024-10-21T21:57:06Z

/te-ci pytorch

Marks101 · 2024-10-22T07:07:28Z

Hi @cyanguwa,
great, I like the idea of having all the masking logic in one place 👍
I just tested this and found a problem with cross attention:

        if "padding" in attn_mask_type:
            if max_seqlen_q == max_seqlen_kv:
                attention_mask = torch.logical_or(
>                   attention_mask.squeeze(1).unsqueeze(3), attention_mask
                )
E               AttributeError: 'tuple' object has no attribute 'squeeze'

The previous code in UnfusedDotProductAttention made these lines depend on attention_type rather than on max_seqlen_q == max_seqlen_kv.

cyanguwa · 2024-10-29T21:43:56Z

Yes, I think I should use if attention_type == "self" here, because there can be cross-attention cases where max_seqlen_q == max_seqlen_kv but actual_seqlen_q != actual_seqlen_kv. I'll go through attention.py and check whether there are other places where I should use attention_type instead.
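
A minimal sketch of that distinction, not the code in attention.py: plain Python lists stand in for tensors, expand_padding_mask is a hypothetical helper, True is assumed to mean masked out, and the cross-attention input is assumed to be a (query mask, key/value mask) pair, as the 'tuple' in the traceback suggests.

```python
def expand_padding_mask(attention_mask, attention_type):
    """Build a [seqlen_q][seqlen_kv] padding mask for one sequence.

    True means masked out. Branching on attention_type, not on the padded
    lengths, decides how attention_mask is unpacked.
    """
    if attention_type == "self":
        # A single per-token mask serves both queries and keys.
        q_pad = kv_pad = attention_mask
    else:
        # Cross attention: separate masks for queries and for keys/values.
        q_pad, kv_pad = attention_mask
    return [[q or kv for kv in kv_pad] for q in q_pad]


# Self attention: 2 real tokens padded to 3.
self_mask = expand_padding_mask([False, False, True], "self")

# Cross attention where both sides are padded to 4, but the actual lengths
# differ (2 queries, 3 keys). Comparing padded lengths would wrongly send
# this tuple down the self-attention path.
cross_mask = expand_padding_mask(
    ([False, False, True, True], [False, False, False, True]),
    "cross",
)
```

Here both padded lengths are 4, yet the input is still a pair of masks, which is why a check on max_seqlen_q == max_seqlen_kv cannot tell the two cases apart.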

Let me know if you observe any other issues too! :) Thanks!

cyanguwa · 2024-12-14T09:27:48Z

/te-ci pytorch L0

xrennvidia

LGTM. Thanks.

cyanguwa and others added 5 commits October 18, 2024 17:56

WIP: fix get_swa_mask for padding

e36273a

Signed-off-by: Charlene Yang <[email protected]>

fix mask type setting

4b19996

Signed-off-by: Charlene Yang <[email protected]>

fix the order of checking valid swa and changing mask type

7f08d47

Signed-off-by: Charlene Yang <[email protected]>

Merge branch 'NVIDIA:main' into fix_swa_mask

3dfe1fe

[pre-commit.ci] auto fixes from pre-commit.com hooks

3a22f93

for more information, see https://pre-commit.ci

cyanguwa mentioned this pull request Oct 21, 2024

[PyTorch] Use or instead of and to combine swa mask with existing mask #1271

Closed

13 tasks

cyanguwa added 2 commits October 21, 2024 14:54

fix lint

5f5c5c3

Signed-off-by: Charlene Yang <[email protected]>

Merge branch 'main' into fix_swa_mask

afe721b

cyanguwa and others added 4 commits December 13, 2024 16:25

Merge branch 'main' into fix_swa_mask

3ed8322

revamp to get full mask

38460c3

Signed-off-by: Charlene Yang <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

c5436a1

for more information, see https://pre-commit.ci

Merge branch 'main' into fix_swa_mask

58acde9

cyanguwa added the 1.14.0 label Dec 14, 2024

cyanguwa requested a review from xrennvidia December 14, 2024 10:15

cyanguwa mentioned this pull request Dec 17, 2024

[common/PyTorch] Add cuDNN SWA (left, 0) + padding + bottom right causal #1378

Merged

13 tasks

xrennvidia approved these changes Dec 17, 2024

cyanguwa merged commit f033498 into NVIDIA:main Dec 18, 2024
29 checks passed
