Integrate cuDNN frontend v1 to fused attention #497
LGTM! Thanks for making this awesome upgrade!
Pipeline 11267171 is green. The last 5 commits mostly fix L1 tests, which are not critical; 95d0820 is the exception, reducing the amount of printing during training.
This PR:
- adds `padding` and `padding_causal` masks, `post_scale_bias` and `alibi` biases, and MQA/GQA to fused attention (semantics sketched below)
- changes the `_qkvpacked` APIs to `h3d`/`3hd` layouts from `qkv_interleaved`, and the `_kvpacked` APIs to `hd_h2d`/`hd_2hd` layouts from `kv_interleaved` (see the layout sketch below)
- deprecates the `qkv_interleaved`, `kv_interleaved`, and `not_interleaved` enums
- adds support in the `FlashAttention` module for the `padding_causal` mask and the `thd` format, and `alibi` support for `UnfusedDPA`
- changes the default backend from `max512` to `arbitrary_seqlen`, and switches the preference from `FlashAttention` to `FusedAttention` with the `arbitrary_seqlen` backend on sm90 (see the backend sketch below)
- disables the `padding_causal` mask for cross attention with the `max512` backend due to bugs
- refactors `test_fused_attn.py` for better coverage and efficiency
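
For readers new to these options, here is a minimal plain-PyTorch sketch of what the new masks, biases, and MQA/GQA mean numerically. This is not TE's implementation: the function names, shape conventions, and `seqlens_*` arguments are illustrative assumptions. The point is only the semantics of `padding_causal` (padding AND causal), `post_scale_bias` (bias added after the 1/sqrt(d) scaling), ALiBi (a distance-linear post-scale bias), and MQA/GQA (fewer KV heads than query heads).

```python
import torch

def alibi_bias(n_heads, sq, skv):
    # ALiBi: per-head linear penalty on distance, with slopes 2^(-8h/n_heads).
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    dist = torch.arange(skv)[None, :] - torch.arange(sq)[:, None]   # j - i
    return (slopes[:, None, None] * dist).unsqueeze(0)              # [1, h, sq, skv]

def ref_attention(q, k, v, seqlens_q, seqlens_kv, bias=None, causal=True):
    # q: [b, hq, sq, d]; k, v: [b, hkv, skv, d]; assumes seqlens_kv >= seqlens_q.
    b, hq, sq, d = q.shape
    hkv, skv = k.shape[1], k.shape[2]
    # MQA/GQA: hq % hkv == 0; each KV head is shared by hq // hkv query heads.
    k = k.repeat_interleave(hq // hkv, dim=1)
    v = v.repeat_interleave(hq // hkv, dim=1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    if bias is not None:
        scores = scores + bias       # added after scaling: "post_scale_bias"
    kv_idx = torch.arange(skv)
    q_idx = torch.arange(sq)
    # "padding": hide KV positions beyond each sequence's true length.
    mask = kv_idx[None, None, None, :] >= seqlens_kv[:, None, None, None]
    if causal:
        # "padding_causal" = padding AND causal (no attending to the future).
        mask = mask | (q_idx[:, None] < kv_idx[None, :])
    out = torch.softmax(scores.masked_fill(mask, -float("inf")), dim=-1) @ v
    # Zero the rows for padded query positions (their outputs are garbage).
    q_pad = q_idx[None, :] >= seqlens_q[:, None]                    # [b, sq]
    return out.masked_fill(q_pad[:, None, :, None], 0.0)

# Example: GQA with 8 query heads over 2 KV heads, ALiBi, padding_causal.
b, hq, hkv, s, d = 2, 8, 2, 16, 64
q, k, v = (torch.randn(b, h, s, d) for h in (hq, hkv, hkv))
lens = torch.tensor([16, 11])
out = ref_attention(q, k, v, lens, lens, bias=alibi_bias(hq, s, s))
```

A fused backend computes the same math in a single kernel; the sketch only spells out what the flags select.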
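The new layout strings name the axis order of the packed dimensions explicitly, unlike the old `*_interleaved` enums. The reading below is my assumption about the convention, with leading batch/sequence axes elided as `...`: `3hd` packs QKV as `[..., 3, h, d]` and `h3d` as `[..., h, 3, d]`; `hd_2hd` keeps Q as `[..., h, d]` and packs KV as `[..., 2, h, d]`, while `hd_h2d` packs KV as `[..., h, 2, d]`. A small sketch:

```python
import torch

def unpack_qkv(qkv, layout):
    # Split a packed QKV tensor into q, k, v; each result is [..., h, d].
    if layout == "3hd":        # qkv is [..., 3, h, d]
        return qkv.unbind(dim=-3)
    if layout == "h3d":        # qkv is [..., h, 3, d]
        return qkv.unbind(dim=-2)
    raise ValueError(f"unknown layout: {layout}")

# Converting between the two packings is just a transpose of the middle axes.
t, h, d = 128, 16, 64
qkv_h3d = torch.randn(t, h, 3, d)            # "h3d": 3-axis inside each head
qkv_3hd = qkv_h3d.transpose(-3, -2)          # "3hd": Q, K, V blocks first

q1, k1, v1 = unpack_qkv(qkv_h3d, "h3d")
q2, k2, v2 = unpack_qkv(qkv_3hd, "3hd")
assert all(torch.equal(a, b) for a, b in [(q1, q2), (k1, k2), (v1, v2)])
```

Note that `transpose` returns a view; a real kernel cares whether the packed tensor is contiguous in the layout it expects, which is presumably why the layout is part of the API rather than inferred.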
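Since the default backend changes in this PR, it can help to pin the backend when comparing results before and after. Below is a hedged sketch using the environment variables that TE's tests use, as I understand them; the names and the integer mapping are assumptions to verify against your TE version.

```python
import os

# Set before the first attention call; TE reads these at dispatch time.
os.environ["NVTE_FLASH_ATTN"] = "0"          # take the FlashAttention path out of the running
os.environ["NVTE_FUSED_ATTN"] = "1"          # allow cuDNN fused attention
# Assumed mapping: "0" = max512, "1" = arbitrary_seqlen.
os.environ["NVTE_FUSED_ATTN_BACKEND"] = "1"
```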