
TL/MLX5: ctx global status check #1113

Merged

Sergei-Lebedev
Contributor

What

Added checks for successful creation of the rcache and topology initialization.

Why?

All resource initialization must happen in the context create epilog. Otherwise, if context creation fails on one rank, the other ranks will hang in the broadcast performed in the context create epilog.

samnordmann (Collaborator) left a comment

Thank you for the patch!
Not a request, but for readability I would suggest moving the topo and rcache init out of
ucc_tl_mlx5_context_ib_ctx_pd_setup and placing it directly in ucc_tl_mlx5_context_create_epilog.
But if this causes too much refactoring, we probably shouldn't bother.

janjust force-pushed the topic/mlx5_do_global_status_check branch from 5cd1a47 to 20e48c3 on April 22, 2025 03:15
B-a-S (Collaborator) commented Apr 23, 2025

/bot:retest

janjust force-pushed the topic/mlx5_do_global_status_check branch from 20e48c3 to b12664e on April 24, 2025 19:34
janjust merged commit 3373b2a into openucx:master on Apr 25, 2025
9 checks passed
MamziB pushed a commit to MamziB/ucc-forked that referenced this pull request Jul 9, 2025

4 participants