Skip to content

Use ncclx::Hints for NCCLx-specific config#1017

Open
pavanbalaji wants to merge 4 commits intometa-pytorch:mainfrom
pavanbalaji:export-D95972679
Open

Use ncclx::Hints for NCCLx-specific config#1017
pavanbalaji wants to merge 4 commits intometa-pytorch:mainfrom
pavanbalaji:export-D95972679

Conversation

@pavanbalaji
Copy link
Contributor

Summary:
In populateNcclConfigFromHints, NCCLx-specific parameters
(ncclAllGatherAlgo, lazySetupChannels, fastInitMode) were being
set directly on the ncclConfig_t struct fields. This bypasses the
ncclx::Hints API which is the intended mechanism for passing
NCCLx-specific configuration.

Change upstream NCCL config fields (blocking, cgaClusterSize,
minCTAs, etc.) to continue being set directly on the config
struct, while NCCLx-specific fields are now passed via an
ncclx::Hints object attached to config.hints.

Also update isFastInitEnable/useFastInit to check the hints
object for fastInitMode since it is no longer set directly on
the config struct.

Differential Revision: D95972679

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 11, 2026
@meta-codesync
Copy link
Contributor

meta-codesync bot commented Mar 11, 2026

@pavanbalaji has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95972679.

pavanbalaji added a commit to pavanbalaji/torchcomms-1 that referenced this pull request Mar 11, 2026
Summary:
Pull Request resolved: meta-pytorch#1017

In populateNcclConfigFromHints, NCCLx-specific parameters
(ncclAllGatherAlgo, lazySetupChannels, fastInitMode) were being
set directly on the ncclConfig_t struct fields.  This bypasses the
ncclx::Hints API which is the intended mechanism for passing
NCCLx-specific configuration.

Change upstream NCCL config fields (blocking, cgaClusterSize,
minCTAs, etc.) to continue being set directly on the config
struct, while NCCLx-specific fields are now passed via an
ncclx::Hints object attached to config.hints.

Also update isFastInitEnable/useFastInit to check the hints
object for fastInitMode since it is no longer set directly on
the config struct.

Reviewed By: minsii

Differential Revision: D95972679
pavanbalaji added a commit to pavanbalaji/torchcomms-1 that referenced this pull request Mar 11, 2026
Summary:
Pull Request resolved: meta-pytorch#1017

In populateNcclConfigFromHints, NCCLx-specific parameters
(ncclAllGatherAlgo, lazySetupChannels, fastInitMode) were being
set directly on the ncclConfig_t struct fields.  This bypasses the
ncclx::Hints API which is the intended mechanism for passing
NCCLx-specific configuration.

Change upstream NCCL config fields (blocking, cgaClusterSize,
minCTAs, etc.) to continue being set directly on the config
struct, while NCCLx-specific fields are now passed via an
ncclx::Hints object attached to config.hints.

Also update isFastInitEnable/useFastInit to check the hints
object for fastInitMode since it is no longer set directly on
the config struct.

Reviewed By: minsii

Differential Revision: D95972679
Summary:
Each public comm-creation API now immediately creates a
guaranteed non-null internalConfigPtr (using the user-passed
config or a default) and calls ncclxParseCommConfig on it.
This ensures each communicator gets a freshly allocated
ncclx::Config with sole ownership, preventing a double-free
when callers reuse the same ncclConfig_t for multiple comms.

Public APIs updated: ncclCommInitRank, ncclCommInitAll,
ncclCommInitRankConfig, ncclCommInitRankScalable,
ncclCommSplit, ncclCommShrink, and ncclCommGrow (v2_29).

Inside ncclxParseCommConfig, assert that config is non-null
and ncclxConfig is still at NCCL_CONFIG_UNDEF_PTR, rather
than silently returning or doing an idempotency check.

Removed ncclxParseCommConfig call from ncclCommInitChildComm
(private helper) since the public caller already did it.

Reviewed By: minsii

Differential Revision: D95714601
Summary:
Add unit tests exercising ncclxParseCommConfig, covering
flat ncclConfig_t fields, hints-based configuration, default
values, idempotency, and conflict detection between flat
fields and hints.

Applied identically to v2_27, v2_28, and v2_29.

Reviewed By: minsii

Differential Revision: D95714600
Summary:
Strip the "ncclx::" prefix in Hints::set() and Hints::get()
so that callers can use either "fastInitMode" or
"ncclx::fastInitMode" interchangeably.  The prefix is
stripped before key matching, storage, and lookup, so the
internal map always uses bare key names (compatible with
ncclxParseCommConfig).

Reviewed By: minsii

Differential Revision: D95972681
Summary:
In populateNcclConfigFromHints, NCCLx-specific parameters
(ncclAllGatherAlgo, lazySetupChannels, fastInitMode) were being
set directly on the ncclConfig_t struct fields.  This bypasses the
ncclx::Hints API which is the intended mechanism for passing
NCCLx-specific configuration.

Change upstream NCCL config fields (blocking, cgaClusterSize,
minCTAs, etc.) to continue being set directly on the config
struct, while NCCLx-specific fields are now passed via an
ncclx::Hints object attached to config.hints.

Also update isFastInitEnable/useFastInit to check the hints
object for fastInitMode since it is no longer set directly on
the config struct.

Reviewed By: minsii

Differential Revision: D95972679
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant