Skip to content

TEST: add buffers for onesided tests #1100

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

wfaderhold21
Copy link
Collaborator

What

Adds extra src/dst/work buffers for onesided algorithms in the MPI test.

Why ?

These are necessary to ensure RDMA writes are not performed on the next test before all processes have completed the initial test.

@swx-jenkins3
Copy link

Can one of the admins verify this patch?

@janjust janjust force-pushed the topic/fix-onesided-mpi-test branch from 8e1e817 to bd64bab Compare March 26, 2025 17:09
Copy link
Collaborator

@samnordmann samnordmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wfaderhold21
We briefly discussed this PR during our WG. Can you please explain what precisely this patch aims to fix and why we need both a barrier and doubling the buffers?

@wfaderhold21
Copy link
Collaborator Author

Thanks @wfaderhold21 We briefly discussed this PR during our WG. Can you please explain what precisely this patch aims to fix and why we need both a barrier and doubling the buffers?

Sure.

The aim of this patch is to fix a potential issue when running the MPI test when multiple transports (i.e., RC + XPMEM) are being used. This could allow for one process (e.g., rank 0) to finish the collective and move to the next collective before another process (e.g., rank 1) has finished the check on its destination buffer.

The use of multiple buffers was to allow for processes to alternate between the buffers and prevent overwriting of a destination buffer because onesided operations are being used. However, this could be simply reduced to a barrier before collective start, which will prevent a process from entering the next collective while another process is checking the result of the previous collective.

@wfaderhold21
Copy link
Collaborator Author

@janjust are we including this for 1.5?

@janjust
Copy link
Collaborator

janjust commented Jul 11, 2025

yes - but we have several critical PRs to go through before, we will do one rebase at a time until we can merge them all.

@Sergei-Lebedev Sergei-Lebedev force-pushed the topic/fix-onesided-mpi-test branch from 3484cd8 to 21dd1db Compare July 12, 2025 11:40
@Sergei-Lebedev
Copy link
Contributor

ok to test

@Sergei-Lebedev Sergei-Lebedev merged commit d4b3300 into openucx:master Jul 12, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants