Support CUDA Graph for MoE models #1233
Merged
16 commits
9951d78 Align RNG tracker with megatron (buptzyb)
b5f7cdf Fix module_params order and warmup bug in cudagraph (buptzyb)
0ede5cb Add fp8_group argument and fix fp8 accuracy issue for cudagraph (buptzyb)
5596437 Add TE modules and weights filters to support MoE models (buptzyb)
1d0759e Revert self.fp8 (buptzyb)
73a22e2 Use hooks to filter module params (buptzyb)
cd56618 Filter all TE modules in hooks (buptzyb)
2a7f54b Format code (buptzyb)
c6dddaf [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
a01602e Update graph.py (yaox12)
8017b6d Revert CudaRNGStatesTracker (buptzyb)
41d6100 Format Update (yifeis-nv)
c82faa2 Revert "Use hooks to filter module params" (yifeis-nv)
938f325 Remove filtering module params (buptzyb)
1522ecc Merge branch 'main' into cudagraph_moe (yaox12)
4487315 Merge branch 'main' into cudagraph_moe (timmoon10)
This change seems correct to me, but it's odd if the Mcore integration was working before. @ksivaman Have we run this with Mcore, or did we run with `num_microbatches=1`?

This changes the interpretation of `per_callable_module_params` from `(num_chunks, layers_per_chunk, num_microbatches)` to `(num_chunks, num_microbatches, layers_per_chunk)`. This matches the interpretation of `per_callable_*` lists when capturing graphs:

TransformerEngine/transformer_engine/pytorch/graph.py
Lines 237 to 239 in 3b89c36
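
To make the ordering difference concrete, here is a minimal sketch of how the two interpretations map a (chunk, layer, microbatch) triple to a position in a flat per-callable list. This is not the code in graph.py; the helper names `index_layer_major`, `index_microbatch_major`, and `mismatches` are hypothetical and exist only for illustration. It assumes a plain row-major flattening of each shape.

```python
def index_layer_major(chunk, layer, microbatch, layers_per_chunk, num_microbatches):
    """Flat index under the (num_chunks, layers_per_chunk, num_microbatches) reading."""
    return (chunk * layers_per_chunk + layer) * num_microbatches + microbatch


def index_microbatch_major(chunk, layer, microbatch, layers_per_chunk, num_microbatches):
    """Flat index under the (num_chunks, num_microbatches, layers_per_chunk) reading."""
    return (chunk * num_microbatches + microbatch) * layers_per_chunk + layer


def mismatches(num_chunks, layers_per_chunk, num_microbatches):
    """Return the (chunk, layer, microbatch) triples whose flat index differs
    between the two readings (hypothetical helper, illustration only)."""
    differing = []
    for chunk in range(num_chunks):
        for microbatch in range(num_microbatches):
            for layer in range(layers_per_chunk):
                args = (chunk, layer, microbatch, layers_per_chunk, num_microbatches)
                if index_layer_major(*args) != index_microbatch_major(*args):
                    differing.append((chunk, layer, microbatch))
    return differing


if __name__ == "__main__":
    # With a single microbatch the two readings coincide.
    print(mismatches(num_chunks=2, layers_per_chunk=3, num_microbatches=1))
    # With more than one microbatch and more than one layer per chunk they diverge.
    print(mismatches(num_chunks=2, layers_per_chunk=2, num_microbatches=2))
```

Under this sketch the two readings agree whenever `num_microbatches` is 1, which would be consistent with an ordering mismatch going unnoticed in a run that used a single microbatch.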