Update rocshmem and enable GDA #134
|
Thanks for the contribution. @erieaton-amd @drprajap Is this PR ready? |
|
We have updated the code. If you have any problems in refactoring your PR, feel free to leave comments : ) |
Signed-off-by: Eric Eaton <[email protected]>
This updates the initialization code for test_ag_gemm_intra_node.py. Signed-off-by: Eric Eaton <[email protected]>
|
I refactored the PR. However, I am having some trouble getting even the main branch to work now. |
Signed-off-by: Eric Eaton <[email protected]>
|
I have the tests |
|
@erieaton-amd @drprajap Seems CI failed, can you please take a look? |
|
The code currently works only on a machine that has a Mellanox card set up, because the rocshmem backend is hard-coded into the bitcode. That has to be cleaned up so the bitcode can also use the IPC backend. Right now I'm trying to figure out why the tests I wrote are failing on the machine that does have a Mellanox card. |
Also adjusted test_put_signal.py to be more consistent with the original test. Signed-off-by: Eric Eaton <[email protected]>
Passes now with no errors, with and without ROCSHMEM_DISABLE_MIXED_IPC=1 Signed-off-by: Eric Eaton <[email protected]>
Signed-off-by: Eric Eaton <[email protected]>
Signed-off-by: Eric Eaton <[email protected]>
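For anyone who wants to repeat the "with and without ROCSHMEM_DISABLE_MIXED_IPC=1" check from the commit above, here is a minimal sketch. The helper names `env_variants` and `run_both` are hypothetical and not part of this PR, and the test command is left to the caller because the exact launcher invocation is not given in this thread.

```python
import os
import subprocess
import sys


def env_variants(base_env=None):
    """Return (label, env) pairs: variable unset, then variable set to "1"."""
    base = dict(os.environ if base_env is None else base_env)
    # Start from a clean state so the first run really is "without" the flag.
    base.pop("ROCSHMEM_DISABLE_MIXED_IPC", None)
    with_flag = dict(base, ROCSHMEM_DISABLE_MIXED_IPC="1")
    return [("unset", base), ("ROCSHMEM_DISABLE_MIXED_IPC=1", with_flag)]


def run_both(cmd, base_env=None):
    """Run cmd once per variant and return {label: exit code}."""
    results = {}
    for label, env in env_variants(base_env):
        results[label] = subprocess.run(cmd, env=env).returncode
    return results


if __name__ == "__main__":
    # Pass the real test command on the command line (assumption: it exits
    # non-zero on failure).
    print(run_both(sys.argv[1:] or [sys.executable, "-c", "pass"]))
```

A zero exit code under both labels corresponds to "passes with and without" the variable.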
|
OK, the tests seem to be working again. I've enabled a dispatch feature in rocshmem that should select the right backend in the bitcode, so maybe the CI will work now. |
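To illustrate the idea of choosing a backend at run time instead of hard-coding one, here is a host-side sketch. It is not how rocshmem's bitcode dispatch is implemented; the function names and the "gda"/"ipc" labels are hypothetical. It assumes a Linux host, where RDMA devices appear under /sys/class/infiniband and Mellanox MLX5 devices have names beginning with "mlx5".

```python
import os


def has_mlx5_device(sysfs_root="/sys/class/infiniband"):
    """Return True if any RDMA device name under sysfs_root starts with "mlx5"."""
    try:
        names = os.listdir(sysfs_root)
    except OSError:
        # Directory missing or unreadable: treat as "no RDMA devices".
        return False
    return any(name.startswith("mlx5") for name in names)


def pick_backend(sysfs_root="/sys/class/infiniband"):
    """Illustrative choice only: "gda" when an MLX5 device is visible, else "ipc"."""
    return "gda" if has_mlx5_device(sysfs_root) else "ipc"


if __name__ == "__main__":
    print(pick_backend())
```

A check like this could also be used to skip GDA-only tests on CI machines without a Mellanox card.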
|
@wenlei-bao I am investigating an issue that the function |
Signed-off-by: Eric Eaton <[email protected]>
|
The bitcode was missing a file; please try the CI again.
This enables the RDMA/GDA support that was added to rocshmem. It's working on a single node with an MLX5 card. This WIP has not yet been tested on separate machines.