Update ddp config to improve ESM-2 15B MFU #520

Open
sichu2023 wants to merge 9 commits into main from sichu/update-ddp-config

Conversation

sichu2023
Collaborator

@sichu2023 sichu2023 commented Dec 11, 2024

Update the DDP config to speed up ESM-2 15B pretraining. Turn off grad_reduce_in_fp32 in the mixed precision plugin (default is True) to reduce memory consumption, and enable overlap_grad_reduce and average_in_collective to improve performance.

Hold off on setting overlap_param_gather=True until NeMo's fix is available.
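
As a rough illustration of the settings described above, here is a minimal sketch. `DdpSettings`, `esm2_15b_settings`, and `changed_fields` are hypothetical stand-ins written for this example, not the NeMo or Megatron classes the PR actually touches. Only the four option names and the `grad_reduce_in_fp32` default of True come from the description; the other defaults are assumptions, and the sketch assumes the description means that `overlap_grad_reduce` and `average_in_collective` are switched on.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class DdpSettings:
    """Hypothetical stand-in for the options named in the PR description."""

    grad_reduce_in_fp32: bool = True     # default stated in the description
    overlap_grad_reduce: bool = False    # assumed default, for illustration
    average_in_collective: bool = False  # assumed default, for illustration
    overlap_param_gather: bool = False   # left off until NeMo's fix is available


def esm2_15b_settings(base: DdpSettings = DdpSettings()) -> DdpSettings:
    """Return a copy of `base` with the changes described in the PR applied."""
    return replace(
        base,
        grad_reduce_in_fp32=False,   # reduce gradients in lower precision to save memory
        overlap_grad_reduce=True,    # overlap gradient reduction with the backward pass
        average_in_collective=True,  # average inside the collective operation
    )


def changed_fields(before: DdpSettings, after: DdpSettings) -> dict:
    """Map each option whose value differs to its (old, new) pair."""
    return {
        f.name: (getattr(before, f.name), getattr(after, f.name))
        for f in fields(before)
        if getattr(before, f.name) != getattr(after, f.name)
    }


if __name__ == "__main__":
    defaults = DdpSettings()
    tuned = esm2_15b_settings(defaults)
    for name, (old, new) in changed_fields(defaults, tuned).items():
        print(f"{name}: {old} -> {new}")
```

Listing the differences against the defaults this way is one simple way to see which options a change like this one overrides.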

@sichu2023 sichu2023 self-assigned this Dec 11, 2024
@sichu2023 sichu2023 requested a review from pstjohn December 11, 2024 12:49
@sichu2023 sichu2023 force-pushed the sichu/update-ddp-config branch from ed04b88 to 090828c Compare December 12, 2024 09:35
@sichu2023 sichu2023 force-pushed the sichu/update-ddp-config branch from 090828c to a550b51 Compare December 12, 2024 14:06
@sichu2023 sichu2023 marked this pull request as ready for review December 12, 2024 14:06
@sichu2023
Collaborator Author

/build-ci

@sichu2023 sichu2023 enabled auto-merge (squash) December 12, 2024 14:07
@sichu2023 sichu2023 force-pushed the sichu/update-ddp-config branch from ecd102a to cfa66d3 Compare December 12, 2024 17:28
@sichu2023
Collaborator Author

/build-ci

@sichu2023 sichu2023 requested a review from pstjohn December 12, 2024 17:29
@sichu2023 sichu2023 force-pushed the sichu/update-ddp-config branch from cfa66d3 to e1e9037 Compare December 16, 2024 16:59
@sichu2023 sichu2023 disabled auto-merge December 16, 2024 17:21
Collaborator

@pstjohn pstjohn left a comment

Does this change any of the current defaults? Might be good to call those out in the PR body if so

@sichu2023
Collaborator Author

> Does this change any of the current defaults? Might be good to call those out in the PR body if so

Thanks. Just added.

@sichu2023 sichu2023 force-pushed the sichu/update-ddp-config branch from dbf2ea7 to b944e26 Compare December 24, 2024 07:13