[C/PyTorch] Add support for multi-latent attention (MLA) #1039

cyanguwa · 2024-07-24T01:47:48Z

Description

This PR adds support for multi-latent attention (MLA) where head_dim_qk != head_dim_v.

This feature is supported only at the DotProductAttention level, by both the UnfusedDotProductAttention and FusedAttention backends.
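To illustrate what head_dim_qk != head_dim_v means for tensor shapes, here is a minimal, framework-free sketch of single-head scaled dot-product attention. It is not the Transformer Engine implementation and does not call any Transformer Engine API. The function name `attention` and all sizes are hypothetical, chosen for illustration only. Q and K must share head_dim_qk so their dot product is defined, while the output inherits head_dim_v from V.

```python
import math


def attention(q, k, v):
    """Single-head scaled dot-product attention on nested lists.

    q: [s_q][head_dim_qk], k: [s_kv][head_dim_qk], v: [s_kv][head_dim_v].
    Returns a [s_q][head_dim_v] nested list.
    """
    head_dim_qk = len(q[0])
    head_dim_v = len(v[0])
    scale = 1.0 / math.sqrt(head_dim_qk)  # scaling uses the Q/K head size
    out = []
    for q_row in q:
        # Q.K^T for this query position: one score per key position.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        # Numerically stable softmax over the key positions.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Weighted sum of V rows: the output width is head_dim_v.
        out.append(
            [sum(p * v_row[j] for p, v_row in zip(probs, v)) for j in range(head_dim_v)]
        )
    return out


# Hypothetical sizes for illustration: Q/K head size differs from V head size.
head_dim_qk, head_dim_v, seq_len = 4, 2, 3
q = [[0.1 * (i + j) for j in range(head_dim_qk)] for i in range(seq_len)]
k = [[0.2 * (i - j) for j in range(head_dim_qk)] for i in range(seq_len)]
v = [[float(i), float(i) + 0.5] for i in range(seq_len)]

out = attention(q, k, v)
print(len(out), len(out[0]))  # prints "3 2": seq_len rows, head_dim_v columns
```

The softmax scale depends only on head_dim_qk, and the output shape depends only on head_dim_v, which is why the two sizes can be chosen independently.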

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactor

Changes

Please list the changes introduced in this PR:

Add support for MLA

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Charlene Yang <[email protected]>

cyanguwa · 2024-07-24T02:00:30Z

/te-ci

cyanguwa · 2024-07-25T22:01:59Z

/te-ci

cyanguwa · 2024-07-30T19:32:29Z

/te-ci

cyanguwa · 2024-08-02T21:54:33Z

The failures look like a CI issue. Local runs of L0_jax_unittest in the CI container pass without problems on A100/V100.

zlsh80826 · 2024-08-06T09:02:53Z

LGTM!

JiwenJ · 2024-11-28T08:57:57Z

Hello, does this support qk != v head dim?

cyanguwa and others added 5 commits July 24, 2024 01:44

add multi-latent attention for DPA

0984f2f

Signed-off-by: Charlene Yang <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

a682db2

for more information, see https://pre-commit.ci

fix Jax/Paddle API

502d9fc

Signed-off-by: Charlene Yang <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

d11128d

for more information, see https://pre-commit.ci

fix lint

67399a3

Signed-off-by: Charlene Yang <[email protected]>

cyanguwa and others added 5 commits July 25, 2024 14:44

fix typo in test script

843affc

Signed-off-by: Charlene Yang <[email protected]>

fix too-many-boolean lint error

ccb5eb5

Signed-off-by: Charlene Yang <[email protected]>

Revert "fix lint"

9a6b862

This reverts commit 67399a3. Signed-off-by: Charlene Yang <[email protected]>

Merge branch 'main' into add_mla

a62eed7

[pre-commit.ci] auto fixes from pre-commit.com hooks

d2f498e

for more information, see https://pre-commit.ci

cyanguwa and others added 9 commits July 30, 2024 00:26

fix stride check in get_qkv_layout

04e7618

Signed-off-by: Charlene Yang <[email protected]>

Merge branch 'NVIDIA:main' into add_mla

c6733b2

WIP: fix layout_thd tests

2e5b463

Signed-off-by: Charlene Yang <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

d2086ae

for more information, see https://pre-commit.ci

WIP: debug info

7ac00d0

Signed-off-by: Charlene Yang <[email protected]>

fix merge conflict

b737066

Signed-off-by: Charlene Yang <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

834bf04

for more information, see https://pre-commit.ci

fix thd pad_between_seqs=False/True tests

801ddef

Signed-off-by: Charlene Yang <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

08078d2

for more information, see https://pre-commit.ci

cyanguwa requested a review from zlsh80826 July 30, 2024 19:30

cyanguwa mentioned this pull request Aug 2, 2024

[PyTorch] Update docs/example and benchmarks/ scripts #1075

Merged

13 tasks

zlsh80826 approved these changes Aug 6, 2024

cyanguwa merged commit 87939be into NVIDIA:main Aug 6, 2024
29 of 31 checks passed
