
[JAX] Expose cp params to jax DPA api #1292

Merged — 4 commits merged into NVIDIA:main from faysal/expose-cp-to-jax-dpa on Nov 4, 2024

Conversation

@kocchop (Collaborator) commented Oct 27, 2024

Description

Surface the context parallelism parameters to the JAX DotProductAttention API

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactor

Changes

Please list the changes introduced in this PR:

  • Added the two required CP configs to the high-level JAX DPA API (see the usage sketch below)
  • Removed the `is_context_parallel` arg from `is_fused_attn_available()`
  • Removed the attention-mask check for CP from `is_fused_attn_available()`; the `check_supported()` inside `_FusedAttnCPWithAllGatherHelper` already performs this check
  • Fixed `test_distributed_fused_attn` accordingly by removing the trailing `is_context_parallel` argument from `is_fused_attn_available()` calls
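
For illustration, a minimal sketch of how the two CP knobs might be passed through the high-level API. The parameter names below are assumptions for the sake of the example, not quoted from this diff:

```python
# Hypothetical usage sketch; the two context-parallel parameter names below
# are illustrative assumptions, not necessarily the ones added by this PR.
from transformer_engine.jax.flax import DotProductAttention

attn = DotProductAttention(
    head_dim=128,
    num_attention_heads=16,
    num_gqa_groups=16,
    attn_mask_type="causal",
    context_parallel_causal_load_balanced=True,  # reorder tokens to balance causal work across CP ranks
    context_parallel_axis="cp",                  # name of the mesh axis that shards the sequence
)
```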

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

```diff
@@ -215,11 +214,6 @@ def make_helper(attn_mask_type):
     if not make_helper(attn_mask_type).is_fused_attn_kernel_available():
         return False

-    # For context parallel need to check additional masking types
```

@mgoldfarb-nvidia (Collaborator): So there is no way to pass None from MaxText when CP is not used for the axis? I think we should keep this check here so that we can avoid exceptions from cuDNN when we don't have appropriate support.

@kocchop (Collaborator, Author): Hi @mgoldfarb-nvidia, we can definitely pass a boolean from the JAX side. However, I decided to remove this because:

  1. I wanted to make as minimal a change as possible to the high-level DotProductAttention API.
  2. The same check is performed here; it provides a meaningful error message and sits above the cuDNN level:
```
...
...
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/workspace/maxtext_workspace/context/maxtext/MaxText/train.py", line 757, in <module>
    app.run(main)
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/opt/workspace/maxtext_workspace/context/maxtext/MaxText/train.py", line 753, in main
    train_loop(config)
  File "/opt/workspace/maxtext_workspace/context/maxtext/MaxText/train.py", line 649, in train_loop
    state, metrics = p_train_step(state, example_batch, nextrng)
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: custom_partitioner: Traceback (most recent call last):
  File "/opt/workspace/maxtext_workspace/context/maxtext/MaxText/train.py", line 757, in <module>
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 308, in run
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 254, in _run_main
  File "/opt/workspace/maxtext_workspace/context/maxtext/MaxText/train.py", line 753, in main
  File "/opt/workspace/maxtext_workspace/context/maxtext/MaxText/train.py", line 649, in train_loop
  File "/opt/jax/jax/_src/traceback_util.py", line 180, in reraise_with_filtered_traceback
  File "/opt/jax/jax/_src/pjit.py", line 337, in cache_miss
  File "/opt/jax/jax/_src/pjit.py", line 187, in _python_pjit_helper
  File "/opt/jax/jax/_src/core.py", line 2820, in bind
  File "/opt/jax/jax/_src/core.py", line 442, in bind_with_trace
  File "/opt/jax/jax/_src/core.py", line 955, in process_primitive
  File "/opt/jax/jax/_src/pjit.py", line 1736, in _pjit_call_impl
  File "/opt/jax/jax/_src/pjit.py", line 1712, in call_impl_cache_miss
  File "/opt/jax/jax/_src/pjit.py", line 1642, in _pjit_call_impl_python
  File "/opt/jax/jax/_src/interpreters/pxla.py", line 2346, in compile
  File "/opt/jax/jax/_src/interpreters/pxla.py", line 2855, in from_hlo
  File "/opt/jax/jax/_src/interpreters/pxla.py", line 2667, in _cached_compilation
  File "/opt/jax/jax/_src/compiler.py", line 434, in compile_or_get_cached
  File "/opt/jax/jax/_src/compiler.py", line 662, in _compile_and_write_cache
  File "/opt/jax/jax/_src/profiler.py", line 333, in wrapper
  File "/opt/jax/jax/_src/compiler.py", line 267, in backend_compile
  File "/opt/jax/jax/_src/custom_partitioning.py", line 155, in _custom_partitioning_partition
  File "/opt/transformer-engine/transformer_engine/jax/cpp_extensions/attention.py", line 1202, in partition
  File "/opt/transformer-engine/transformer_engine/jax/cpp_extensions/attention.py", line 1046, in check_supported
ValueError: Context parallel fused attention only supports masking types:  NVTE_Mask_Type.NVTE_NO_MASK,NVTE_Mask_Type.NVTE_CAUSAL_MASK got: NVTE_Mask_Type.NVTE_PADDING_CAUSAL_MASK
```

Collaborator: Okay, I have a proposal, although it might not be the cleanest. We use this method in our unit tests to know whether a configuration is supported before running, i.e. to skip invalid configs we know to be failing.

We could add the `is_context_parallel` argument back, but as an `Optional[bool]`: if it is passed as `None`, we don't attempt the more specific check. That way we can still query the full support at the top level (a sketch follows).
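
A minimal sketch of that proposal, with stand-in names; `_base_kernel_available` and the mask-type strings are illustrative, not the real Transformer Engine internals:

```python
from typing import Optional

def _base_kernel_available(attn_mask_type: str) -> bool:
    # Stand-in for the real cuDNN capability query.
    return True

def is_fused_attn_kernel_available(
    attn_mask_type: str,
    is_context_parallel: Optional[bool] = None,
) -> bool:
    if not _base_kernel_available(attn_mask_type):
        return False
    if is_context_parallel is None:
        # Caller could not resolve the CP axis (e.g. outside jit):
        # skip the CP-specific check and report base availability only.
        return True
    if is_context_parallel:
        # CP fused attention currently supports only these mask types.
        return attn_mask_type in ("no_mask", "causal")
    return True
```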

Collaborator: Can we infer `is_context_parallel` from `cp_axis`? I noticed there is similar logic here. If so, we can pass `is_context_parallel` here in the DotProductAttention module without adding an additional argument.
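
One way that inference could look if the mesh is passed in explicitly; the helper name and signature below are assumptions for illustration, not code from this PR:

```python
from jax.sharding import Mesh

def infer_is_context_parallel(mesh: Mesh, cp_axis: str) -> bool:
    # CP is active iff the named mesh axis exists and spans more than one device.
    return dict(mesh.shape).get(cp_axis, 1) > 1
```

The catch, as the next reply explains, is that this axis information may not be in scope at the DPA API level.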

Collaborator: The challenge @kocchop ran into is that this check only works once the code is being transformed by JAX jit. He found that in MaxText the axis information is not available at the DPA API level, and thus the implicit axis check fails.

@zlsh80826 (Collaborator) commented Oct 30, 2024:

Got it. Then how about this: we keep `is_fused_attn_kernel_available` context-parallelism-agnostic, meaning no `is_context_parallel` argument, but we call `is_fused_attn_kernel_available` again in `_FusedAttnCPWithAllGatherHelper.check_supported` if `attn_mask_type == AttnMaskType.CAUSAL_MASK`. Unlike other configs, where we can still fall back to the unfused attn, if we don't have the fused-attn kernels with CP we can just raise a ValueError.
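
A rough sketch of that flow, following the names visible in the traceback above (`_FusedAttnCPWithAllGatherHelper.check_supported`); the enum values and the availability query are simplified stand-ins:

```python
from enum import Enum, auto

class AttnMaskType(Enum):
    NO_MASK = auto()
    CAUSAL_MASK = auto()
    PADDING_CAUSAL_MASK = auto()

def is_fused_attn_kernel_available(attn_mask_type: AttnMaskType) -> bool:
    # CP-agnostic stand-in for the real capability query.
    return attn_mask_type in (AttnMaskType.NO_MASK, AttnMaskType.CAUSAL_MASK)

class _FusedAttnCPWithAllGatherHelper:
    def __init__(self, attn_mask_type: AttnMaskType):
        self.attn_mask_type = attn_mask_type

    def check_supported(self) -> None:
        # Re-query availability here rather than in the top-level API;
        # with CP there is no unfused fallback, so fail loudly.
        if not is_fused_attn_kernel_available(self.attn_mask_type):
            raise ValueError(
                "Context parallel fused attention does not support "
                f"{self.attn_mask_type}"
            )
```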

Collaborator: That works, and is maybe the best compromise here. The one place we then need to update is the unit test, which needs to skip unsupported configs; we currently rely on `is_fused_attn_kernel_available` to do this check.

We can call into `_FusedAttnCPWithAllGatherHelper.check_supported` as you suggest here from the unit test to properly skip as needed (see the sketch below).
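
A hypothetical pytest-style skip along those lines, reusing the sketch above; the test name and parameter wiring are assumptions, not the actual test change in this PR:

```python
import pytest

def test_distributed_fused_attn_cp(attn_mask_type):
    helper = _FusedAttnCPWithAllGatherHelper(attn_mask_type)
    try:
        helper.check_supported()
    except ValueError as reason:
        # Config is known-unsupported for CP fused attention: skip, don't fail.
        pytest.skip(f"Unsupported CP config: {reason}")
    # ... run the distributed fused-attention test body ...
```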

@phu0ngng phu0ngng requested a review from zlsh80826 October 29, 2024 18:37
@denera denera requested review from denera and phu0ngng October 29, 2024 18:39
(Resolved review thread on transformer_engine/jax/flax/transformer.py.)
kocchop and others added 2 commits October 30, 2024 20:43
Signed-off-by: Md Fahim Faysal Khan <[email protected]>
Signed-off-by: Michael Goldfarb <[email protected]>
@mgoldfarb-nvidia force-pushed the faysal/expose-cp-to-jax-dpa branch from b0c5c06 to 0b302d4 on October 30, 2024 20:43
@phu0ngng phu0ngng removed their request for review October 31, 2024 15:20
@kocchop kocchop requested a review from zlsh80826 October 31, 2024 22:17
@mgoldfarb-nvidia (Collaborator): /te-ci jax

@mgoldfarb-nvidia (Collaborator) left a review: LGTM!

@zlsh80826 zlsh80826 changed the title expose cp params to jax DPA api [JAX] Expose cp params to jax DPA api Nov 1, 2024
@zlsh80826 (Collaborator): LGTM, I will approve it once all CI passes.

@phu0ngng (Collaborator) commented Nov 1, 2024: Please rebase on the main branch to include the fix introduced in #1304, to avoid unresolved failures caused by FFI, and rerun the L1 CI, @kocchop. Thanks!

@mgoldfarb-nvidia (Collaborator): /te-ci jax L1

@mgoldfarb-nvidia merged commit d725686 into NVIDIA:main on Nov 4, 2024 — 21 checks passed
huanghua1994 pushed a commit to huanghua1994/TransformerEngine that referenced this pull request Nov 4, 2024
Exposed context parallel params to DPA api

Signed-off-by: Md Fahim Faysal Khan <[email protected]>
Signed-off-by: Michael Goldfarb <[email protected]>

---------

Signed-off-by: Md Fahim Faysal Khan <[email protected]>
Signed-off-by: Michael Goldfarb <[email protected]>
Co-authored-by: Michael Goldfarb <[email protected]>