[PyTorch] Add sliding window support to FlashAttention #551
Conversation
Force-pushed from 49e147e to 32db392
/te-ci pytorch
Pipeline 11331817
With newer cuDNN, pipeline 11412889 is green!
LGTM, thanks!
Also, I want to ask what happens when we use grouped query attention with unfused attention. Right now it seems like it errors out.
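For reference, a minimal repro sketch of the scenario described above. It assumes the public DotProductAttention module, its num_gqa_groups argument, and that the unfused backend can be forced via the NVTE_FLASH_ATTN / NVTE_FUSED_ATTN environment variables; these names are assumptions for illustration and are not confirmed by this thread.

```python
# Hypothetical repro: grouped query attention with only the unfused backend available.
import os

# Assumed switches to disable the FlashAttention and cuDNN fused backends,
# leaving only UnfusedDotProductAttention.
os.environ["NVTE_FLASH_ATTN"] = "0"
os.environ["NVTE_FUSED_ATTN"] = "0"

import torch
from transformer_engine.pytorch import DotProductAttention

seq_len, batch, heads, kv_heads, head_dim = 128, 2, 16, 4, 64

# Grouped query attention: fewer KV heads than query heads
# (num_gqa_groups is an assumed argument name here).
attn = DotProductAttention(
    num_attention_heads=heads,
    kv_channels=head_dim,
    num_gqa_groups=kv_heads,
).cuda()

# Default "sbhd" layout: [seq, batch, heads, head_dim].
q = torch.randn(seq_len, batch, heads, head_dim, dtype=torch.bfloat16, device="cuda")
k = torch.randn(seq_len, batch, kv_heads, head_dim, dtype=torch.bfloat16, device="cuda")
v = torch.randn(seq_len, batch, kv_heads, head_dim, dtype=torch.bfloat16, device="cuda")

# According to the review comment, this currently errors out when only the
# unfused backend is selectable.
out = attn(q, k, v)
```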
This PR only makes changes on the PyTorch side. It
- adds sliding window support to TransformerLayer, MultiHeadAttention, DotProductAttention and FlashAttention
- adds sliding window support to UnfusedDotProductAttention via an arbitrary mask, generated based on the window size
- updates the use_unfused_attention flag and the exception raised when none of the three DPA backends are available
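To illustrate the "arbitrary mask, generated based on the window size" idea for the unfused path, here is a minimal sketch in plain PyTorch. The (left, right) window convention mirrors flash-attn's window_size argument; the helper name sliding_window_mask is made up for illustration and is not an API introduced by this PR.

```python
import torch

def sliding_window_mask(seq_q: int, seq_kv: int, window_size=(-1, -1), device="cpu"):
    """Boolean mask that is True where attention should be masked out.

    window_size = (left, right): query position i may attend to key positions j
    with i - left <= j <= i + right; -1 means unlimited on that side.
    """
    left, right = window_size
    q_idx = torch.arange(seq_q, device=device).unsqueeze(1)    # [seq_q, 1]
    kv_idx = torch.arange(seq_kv, device=device).unsqueeze(0)  # [1, seq_kv]
    mask = torch.zeros(seq_q, seq_kv, dtype=torch.bool, device=device)
    if left >= 0:
        mask |= kv_idx < q_idx - left   # too far in the past
    if right >= 0:
        mask |= kv_idx > q_idx + right  # too far in the future
    return mask

# Example: causal sliding window of 4 tokens to the left (right = 0 keeps it causal).
m = sliding_window_mask(8, 8, window_size=(4, 0))
scores = torch.randn(8, 8)
scores = scores.masked_fill(m, float("-inf"))  # applied before softmax in unfused attention
probs = torch.softmax(scores, dim=-1)
```

The point of the sketch is that the unfused backend can emulate a sliding window purely through masking of the attention scores, whereas FlashAttention takes the window bounds directly and never materializes the full score matrix.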