Update the Dockerfile with necessary changes from NeMo for H100 self attention accuracy. #550

Open
wants to merge 2 commits into
base: main

Conversation

jstjohn
Collaborator

@jstjohn jstjohn commented Dec 19, 2024

  • Noticed poor numerical accuracy on H100, relative to the same reference model, in runs involving flash attention.
  • Was able to replicate the reference accuracy on H100 after making these changes to the Dockerfile.
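
For anyone who wants to repeat the comparison, here is a minimal sketch of how outputs from an H100 run could be checked against reference values. It is not the check used in this PR. The function names `max_errors` and `matches_reference` and the default tolerances are hypothetical, chosen only for illustration. It assumes both outputs have already been flattened into plain lists of floats.

```python
import math
from typing import Sequence, Tuple


def max_errors(reference: Sequence[float], candidate: Sequence[float]) -> Tuple[float, float]:
    """Return (max absolute error, max relative error) for two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    max_abs = 0.0
    max_rel = 0.0
    for ref, cand in zip(reference, candidate):
        abs_err = abs(ref - cand)
        max_abs = max(max_abs, abs_err)
        # Guard against division by zero when a reference value is exactly 0.
        denom = max(abs(ref), 1e-12)
        max_rel = max(max_rel, abs_err / denom)
    return max_abs, max_rel


def matches_reference(
    reference: Sequence[float],
    candidate: Sequence[float],
    rel_tol: float = 1e-3,  # hypothetical tolerance, tune per dtype and model
    abs_tol: float = 1e-5,  # hypothetical tolerance, tune per dtype and model
) -> bool:
    """True if every candidate value is close to its reference value."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


if __name__ == "__main__":
    ref = [1.0, 2.0, 3.0]
    out = [1.0005, 2.0, 3.0]
    print(max_errors(ref, out))
    print(matches_reference(ref, out))
```

Reporting both the maximum absolute and the maximum relative error helps separate a uniform precision loss from a few badly wrong elements. The tolerances should be set to match the precision the model runs in.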

…ve good H100 numerical accuracy in flash attention with TE
@jstjohn
Collaborator Author

jstjohn commented Dec 19, 2024

/build-ci

@jstjohn jstjohn requested a review from cspades December 19, 2024 01:19
Dockerfile: review thread outdated and resolved
Signed-off-by: John St. John <[email protected]>
pstjohn added a commit that referenced this pull request Dec 23, 2024
Cherry picks some additional commits into #550 to see if these also fix
CI for 24.10

---------

Signed-off-by: John St. John <[email protected]>
Co-authored-by: Farhad Ramezanghorbani <[email protected]>
Co-authored-by: John St John <[email protected]>
Co-authored-by: John St. John <[email protected]>

2 participants