[core][gpu-objects] Driver should order all collective calls to avoid deadlock #51264
Labels: core, enhancement, gpu-objects, P0
Description
Similar to compiled graphs, the driver should order all collective calls to avoid deadlocks.
Example 1:
Example 2: Both actors are single-threaded and synchronous. Suppose `t1_1` is the input for `t2_2`, `t1_2` is the input for `t2_1`, and both transfers use NCCL. In this case, we should call NCCL recv for `t2_2` before `t2_1` to avoid deadlock.

Note: Check whether this still works if we only have one CUDA stream.
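Below is a minimal sketch of Example 2. It assumes that the `t1_*` tasks run on one actor (`A1`), that the `t2_*` tasks run on another (`A2`), and that the driver submits the transfer out of `t1_1` before the one out of `t1_2`. The names `Transfer`, `naive_schedule`, `ordered_schedule`, and `runs_to_completion` are hypothetical and are not Ray APIs. Each transfer is modeled as a blocking send/recv rendezvous that completes only when both ops are at the head of their actor's queue. This is a simplification of NCCL's actual point-to-point semantics. The ordered schedule walks the transfers in one global order and appends each send and its matching recv to both actors in the same step. That step illustrates the idea of the driver ordering collective calls.

```python
from collections import namedtuple

# Hypothetical model: one task's output moves from src_actor to dst_actor over NCCL.
Transfer = namedtuple("Transfer", ["src_task", "dst_task", "src_actor", "dst_actor"])


def naive_schedule(actor_tasks, transfers):
    """Each actor walks its own tasks in order: recv a task's inputs, then send its output."""
    ops = {actor: [] for actor in actor_tasks}
    for actor, tasks in actor_tasks.items():
        for task in tasks:
            ops[actor] += [("recv", t) for t in transfers if t.dst_task == task]
            ops[actor] += [("send", t) for t in transfers if t.src_task == task]
    return ops


def ordered_schedule(transfers):
    """Driver-side ordering (sketch): walk transfers in one global order and append
    the send and its matching recv to both actors in the same step."""
    ops = {}
    for t in transfers:
        ops.setdefault(t.src_actor, []).append(("send", t))
        ops.setdefault(t.dst_actor, []).append(("recv", t))
    return ops


def runs_to_completion(ops):
    """Blocking rendezvous model: a send and its recv complete together only when both
    are at the head of their actor's queue. Returns False if no op can make progress."""
    pos = {actor: 0 for actor in ops}

    def head(actor):
        return ops[actor][pos[actor]] if pos[actor] < len(ops[actor]) else None

    while any(head(actor) is not None for actor in ops):
        progressed = False
        for actor in ops:
            op = head(actor)
            if op is not None and op[0] == "send" and head(op[1].dst_actor) == ("recv", op[1]):
                pos[actor] += 1
                pos[op[1].dst_actor] += 1
                progressed = True
        if not progressed:
            return False  # deadlock: every head op waits on a peer that is blocked elsewhere
    return True


# Example 2: t1_1 feeds t2_2 and t1_2 feeds t2_1.
actor_tasks = {"A1": ["t1_1", "t1_2"], "A2": ["t2_1", "t2_2"]}
transfers = [
    Transfer("t1_1", "t2_2", "A1", "A2"),  # submitted first
    Transfer("t1_2", "t2_1", "A1", "A2"),
]

print(runs_to_completion(naive_schedule(actor_tasks, transfers)))  # False
print(runs_to_completion(ordered_schedule(transfers)))  # True
print([t.dst_task for _, t in ordered_schedule(transfers)["A2"]])  # ['t2_2', 't2_1']
```

In the ordered schedule, every actor's op list is a subsequence of the same global sequence. The earliest unfinished transfer therefore always sits at the head of both its sender's and its receiver's queues, so this model cannot deadlock. The sketch does not address the single-CUDA-stream question raised in the note.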