[JAX] Bug fix for distributed normalization #1366

Merged
merged 4 commits into from
Dec 12, 2024

phu0ngng
Collaborator

Description

  • Add CudnnHandlerInit to the prepare phases of the normalization custom calls.
  • Make NormalizationPlanRegistry thread_local.

These changes should fix the failures in test_distributed_layernorm.py.
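
For readers unfamiliar with the second change, here is a minimal sketch of what a thread_local registry looks like in C++. It is not the code from this PR: `Plan`, `PlanRegistry`, `get_or_create`, and `plans_seen_by_new_thread` are hypothetical names chosen for illustration, and the real NormalizationPlanRegistry may be structured differently. The sketch only shows the general effect of `thread_local`, namely that each thread gets its own cache instance instead of sharing one.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <unordered_map>

// Hypothetical stand-in for a cached normalization plan (not the real type).
struct Plan {
  std::string key;
};

// Hypothetical registry. The thread_local storage in instance() means every
// thread lazily constructs and owns a separate PlanRegistry.
class PlanRegistry {
 public:
  static PlanRegistry& instance() {
    thread_local PlanRegistry registry;  // one registry per thread
    return registry;
  }

  // Return the cached plan for `key`, building it on first use.
  Plan& get_or_create(const std::string& key) {
    auto it = plans_.find(key);
    if (it == plans_.end()) {
      it = plans_.emplace(key, Plan{key}).first;
    }
    return it->second;
  }

  std::size_t size() const { return plans_.size(); }

 private:
  PlanRegistry() = default;
  std::unordered_map<std::string, Plan> plans_;
};

// Start a fresh thread, create one plan there, and report how many plans
// that thread's registry holds. Plans cached by other threads are not counted
// because they live in different registry instances.
std::size_t plans_seen_by_new_thread(const std::string& key) {
  std::size_t seen = 0;
  std::thread worker([&] {
    PlanRegistry::instance().get_or_create(key);
    seen = PlanRegistry::instance().size();
  });
  worker.join();
  return seen;
}
```

In this sketch, plans created on the calling thread stay invisible to the worker thread and vice versa, so no locking is needed around the map. The trade-off is that each thread rebuilds its own plans rather than reusing ones built elsewhere.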

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactor

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@phu0ngng phu0ngng requested a review from denera December 11, 2024 22:54
@phu0ngng
Collaborator Author

/te-ci jax L1

@phu0ngng phu0ngng requested a review from zlsh80826 December 12, 2024 00:42
Collaborator

@zlsh80826 zlsh80826 left a comment

LGTM!

@phu0ngng phu0ngng merged commit 0e1d9fa into NVIDIA:main Dec 12, 2024
22 checks passed
@phu0ngng phu0ngng deleted the distributed_norm_fixes branch December 12, 2024 13:00

2 participants