[core][gpu-objects] Driver should order all collective calls to avoid deadlock #51264

kevin85421 · 2025-03-11T18:58:48Z

Description

Similar to compiled graphs, the driver should order all collective calls to avoid deadlocks.

Example 1:

Avoid passing tensors within the same actor using NCCL. Instead, we should access the in-actor store directly.
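A minimal sketch of the check this implies, assuming the driver knows the source and destination actor of each transfer. `Transfer` and `choose_transport` are hypothetical names used only for illustration; they are not part of Ray's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """Hypothetical description of one tensor hand-off between tasks."""
    src_actor: str
    dst_actor: str
    tensor_id: str


def choose_transport(transfer: Transfer) -> str:
    """Return "local" when producer and consumer live on the same actor.

    A same-actor transfer can read the in-actor store directly, so no
    NCCL send/recv pair is needed. Everything else goes over NCCL.
    """
    if transfer.src_actor == transfer.dst_actor:
        return "local"
    return "nccl"
```

With this check, only cross-actor transfers would ever enter the set of collective calls that the driver has to order.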

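Example 2: Both actors are single-threaded and synchronous. The output of t1_1 is the input of t2_2, and the output of t1_2 is the input of t2_1; both transfers use NCCL. Because Actor 1 produces (and sends) the output of t1_1 first, Actor 2 should issue the NCCL recv for t2_2 before the recv for t2_1. Otherwise Actor 2 blocks on a recv whose matching send has not been issued yet, and the two actors deadlock.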

Actor 1: t1_1, t1_2
Actor 2: t2_1, t2_2
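The ordering in Example 2 can be sketched as follows. This is a simplified model, not Ray's implementation: it assumes each actor issues blocking send/recv calls strictly in order, and that a send completes only when the peer's current recv is for the same tensor. `order_recvs` and `deadlocks` are hypothetical helper names.

```python
from collections import deque


def order_recvs(sender_task_order, edges):
    """Return consumer tasks in the order their recvs should be issued.

    sender_task_order: tasks on the sending actor, in execution order.
    edges: (producer_task, consumer_task) pairs transferred over NCCL.
    The recv order simply mirrors the sender's send order.
    """
    position = {task: i for i, task in enumerate(sender_task_order)}
    ordered = sorted(edges, key=lambda edge: position[edge[0]])
    return [consumer for _, consumer in ordered]


def deadlocks(send_tags, recv_tags):
    """Simulate in-order blocking sends and recvs between two actors.

    Each tag names the tensor being transferred. A mismatch at the head
    of the two queues means both actors wait forever.
    """
    sends, recvs = deque(send_tags), deque(recv_tags)
    while sends and recvs:
        if sends[0] != recvs[0]:
            return True
        sends.popleft()
        recvs.popleft()
    # Leftover unmatched operations would also block forever.
    return bool(sends or recvs)


# Example 2: t1_1 -> t2_2 and t1_2 -> t2_1.
edges = [("t1_2", "t2_1"), ("t1_1", "t2_2")]
recv_order = order_recvs(["t1_1", "t1_2"], edges)
```

Here `recv_order` places t2_2 ahead of t2_1, matching the send order on Actor 1. In this model, issuing the recvs in Actor 2's task order instead (t2_1 first) leaves the head send and head recv mismatched, which is the deadlock described above. Real NCCL calls are enqueued on CUDA streams, so the single-stream question in the note below still needs to be checked separately.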

Note: Check whether this still works when there is only one CUDA stream.

Use case

No response


kevin85421 added the enhancement, triage, P0, and gpu-objects labels and removed the triage label Mar 11, 2025

kevin85421 self-assigned this Mar 11, 2025

kevin85421 added the core label Mar 11, 2025
