
[PyTorch] Use or instead of and to combine swa mask with existing mask #1271

Closed
wants to merge 1 commit

Conversation

Marks101
Contributor

Description

Hello team,

When creating the sliding window attention mask, the existing mask is currently combined with the SWA mask using a logical and. I think an or should be used here: in this part of the code True means "masked out", so an attention score should be masked out if it is masked out in either the existing mask or the SWA mask.
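
As a minimal illustration (independent of the transformer_engine code; the band mask and padding mask below are hypothetical stand-ins for the SWA mask and the existing mask), combining with a logical or preserves the "True means masked out" convention:

import torch

seqlen, window_left = 6, 1
idx = torch.arange(seqlen)
diff = idx.unsqueeze(1) - idx.unsqueeze(0)  # entry (i, j) = i - j
# hypothetical sliding-window mask (True == masked out): position j is visible
# from position i only if i - window_left <= j <= i
swa_mask = (diff < 0) | (diff > window_left)
# hypothetical padding mask (True == masked out): the last token is padding
padding_mask = torch.tensor([0, 0, 0, 0, 0, 1], dtype=torch.bool).expand(seqlen, seqlen)

# a position is masked out if either mask masks it out
combined = torch.logical_or(swa_mask, padding_mask)
# torch.logical_and(swa_mask, padding_mask) would mark a position as masked
# only if both masks exclude it, which effectively discards the sliding window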

Please have a look and check if this makes sense 😄

Markus

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactor

Changes

Please list the changes introduced in this PR:

  • Change the logical and to an or when creating the sliding window attention mask

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@ksivaman
Member

I think what's happening here is that the actual SWA mask is already created here before the user-provided attention_mask is applied; the attention_mask is then used as an additional filter via the logical_and, so the existing implementation would make sense. @cyanguwa Could you confirm?

@cyanguwa cyanguwa self-assigned this Oct 19, 2024
@Marks101
Contributor Author

Hi @cyanguwa and @ksivaman,

Sorry, I think my description was a little convoluted, so let me explain my point with an example.
For a padding mask as input, the following would be my expectation for get_swa_mask():

>>> import torch
>>> from transformer_engine.pytorch.attention import get_swa_mask
>>> attention_mask = torch.tensor([0, 0, 0, 0, 0, 1], dtype=bool, device="cuda")
>>> get_swa_mask((1, -1), 6, 6, "padding", attention_mask)[0]
tensor([[0, 1, 1, 1, 1, 1],
        [0, 0, 1, 1, 1, 1],
        [1, 0, 0, 1, 1, 1],
        [1, 1, 0, 0, 1, 1],
        [1, 1, 1, 0, 0, 1],
        [1, 1, 1, 1, 0, 1]], device='cuda:0')

But instead, this is currently the output:

>>> get_swa_mask((1, -1), 6, 6, "padding", attention_mask)[0]
tensor([[0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0]], device='cuda:0')

No sliding window at all 🤷
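
For reference, here is a small self-contained sketch that reproduces both tables, assuming (based on the expected output above) that window_size=(1, -1) amounts to attending to the current and the previous token: a logical and gives the current output, a logical or gives the expected one.

import torch

seqlen = 6
idx = torch.arange(seqlen)
diff = idx.unsqueeze(1) - idx.unsqueeze(0)   # entry (i, j) = i - j
swa_mask = (diff < 0) | (diff > 1)           # hypothetical window mask, True == masked out
padding_mask = torch.tensor([0, 0, 0, 0, 0, 1], dtype=torch.bool).expand(seqlen, seqlen)

print(torch.logical_and(swa_mask, padding_mask).int())  # reproduces the current output
print(torch.logical_or(swa_mask, padding_mask).int())   # reproduces the expected output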

@cyanguwa
Collaborator

cyanguwa commented Oct 21, 2024

Hi Markus!

Yes, there is an issue with the logic of get_swa_mask(). I'm fixing it here: #1281. I changed a bit more than just logical_and() to logical_or(), and I would like get_swa_mask() to handle any combination of attn_mask_type and window_size. Could you take a look at that PR and let me know if you have any comments or run into any testing issues? Thanks!

@cyanguwa
Collaborator

Let's handle this issue in that PR. I'll close this one for now. Thanks for always bringing up important issues and changes!

@cyanguwa cyanguwa closed this Oct 21, 2024