
Fix attention mask type for Flash Attention + CP + THD #1354

Merged 3 commits into NVIDIA:main on Dec 5, 2024

Conversation

@xrennvidia (Collaborator) commented on Dec 4, 2024

Description

Fix the qkv_format setting in DotProductAttention.
Use the padding mask type for flash attention with the THD format in the CP unit tests.
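
For context, here is a minimal sketch (not taken from the PR) of the usage pattern this fix targets: calling Transformer Engine's DotProductAttention with the packed THD layout, which requires a padding-style mask type plus cumulative-sequence-length tensors. The shapes and values below are illustrative, and exact argument names can vary across Transformer Engine versions; running it requires a CUDA GPU with a flash-attention-capable backend.

```python
# Minimal sketch, assuming a recent transformer_engine.pytorch install.
import torch
import transformer_engine.pytorch as te

num_heads, head_dim = 16, 64

# Two packed sequences of lengths 5 and 3 -> 8 total tokens.
cu_seqlens = torch.tensor([0, 5, 8], dtype=torch.int32, device="cuda")
total_tokens = 8

# THD layout: [total_tokens, num_heads, head_dim], no batch dimension.
q = torch.randn(total_tokens, num_heads, head_dim,
                dtype=torch.bfloat16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

attn = te.DotProductAttention(
    num_attention_heads=num_heads,
    kv_channels=head_dim,
    qkv_format="thd",                 # packed variable-length layout
    attn_mask_type="padding_causal",  # padding-style mask, as the PR's tests use
)

# Under context parallelism the module would additionally be given a CP
# process group via set_context_parallel_group(...); omitted for brevity.
out = attn(q, k, v,
           cu_seqlens_q=cu_seqlens, cu_seqlens_kv=cu_seqlens,
           max_seqlen_q=5, max_seqlen_kv=5)
```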

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactor

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@xrennvidia (Collaborator, Author) commented:

/te-ci pytorch L1

@xrennvidia requested a review from @cyanguwa on December 4, 2024
@xrennvidia merged commit d978e80 into NVIDIA:main on December 5, 2024
14 checks passed
@xrennvidia deleted the xren/cp_mask_type branch on December 5, 2024