-
Notifications
You must be signed in to change notification settings - Fork 337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving communication overlap for the case of multi kernel queue usage #1308
Conversation
@erhoo82 Hi Sangkug, this is a PR for launch ordering work. Could you please assign a reviewer? |
transformer_engine/common/comm_gemm_overlap/comm_gemm_overlap.cpp
Outdated
Show resolved
Hide resolved
transformer_engine/common/comm_gemm_overlap/comm_gemm_overlap.cpp
Outdated
Show resolved
Hide resolved
transformer_engine/common/comm_gemm_overlap/comm_gemm_overlap.cpp
Outdated
Show resolved
Hide resolved
@youngeunkwon0405 The TP overlap unit tests explicitly set Also please launch the L1 tests with |
Hi @denera, I have updated the Also, could you please elaborate on more details about the following? I am new to writing a test and also new to the ci process.
I have tested the modified test case only and the following was a new result. ../lustre/fsw/coreai_dlalgo_llm/youngeunk/nemo/nemo.dev/mount/TransformerEngine-youngeunk/tests/pytorch/distributed/test_comm_gemm_overlap.py::test_bulk_overlaps[ALL-GATHER - BF16 - 1 connections] PASSED ========================= 6 passed in 93.35s (0:01:33) ========================= |
/te-ci pytorch L1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, pending rebase on latest TE/main and clean CI results.
@denera Rebased with the main. Could you please let me know what the next step would be? |
/te-ci pytorch L1 |
2 similar comments
/te-ci pytorch L1 |
/te-ci pytorch L1 |
/te-ci pytorch L0 L1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! We can merge, pending clean CI results. Your CI permissions might not have taken effect yet because I don't see any pipelines for this PR from your trigger. I triggered it again and the pass/fail should show up on GitHub when it's done.
/te-ci pytorch L0 L1 |
Signed-off-by: Youngeun Kwon <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>
for more information, see https://pre-commit.ci
Signed-off-by: Youngeun Kwon <[email protected]>
for more information, see https://pre-commit.ci
Signed-off-by: Youngeun Kwon <[email protected]>
for more information, see https://pre-commit.ci
Signed-off-by: Youngeun Kwon <[email protected]>
for more information, see https://pre-commit.ci
Signed-off-by: Youngeun Kwon <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>
…age (NVIDIA#1308) * draft implementation Signed-off-by: Youngeun Kwon <[email protected]> * compile error fix Signed-off-by: Youngeun Kwon <[email protected]> * fix compile error Signed-off-by: Youngeun Kwon <[email protected]> * remove print Signed-off-by: Youngeun Kwon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Edit comments Signed-off-by: Youngeun Kwon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * edit the bulk-overlap test case Signed-off-by: Youngeun Kwon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add version guard Signed-off-by: Youngeun Kwon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add runtime version guard Signed-off-by: Youngeun Kwon <[email protected]> * fix the version guard Signed-off-by: Youngeun Kwon <[email protected]> --------- Signed-off-by: Youngeun Kwon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Description
The current TP-overlap relay is on a single kernel queue to configure launch ordering to control compute-communication overlap, which fails to overlap when multi kernel queue is used.
This PR enforces launch ordering using the LaunchCompletionEvent feature between the communication kernel and the compute kernel to ensure the overlap.
This feature is specific to Hopper and applies only to bulk overlap cases.
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: