Add get_nvlink_address() to device window API and Triton bindings #1012

Open
siyengar wants to merge 1 commit into meta-pytorch:main from
siyengar:export-D95908486

@siyengar
Contributor

Summary:
Add get_nvlink_address() method to TorchCommWindowNCCLX that returns the
NVLink-mapped device pointer for a peer's window memory. This calls
ncclGetPeerDevicePointer (host-side, NCCLX 2.29+ when available) to resolve
the LSA flat address for NVLink-accessible peers, returning nullptr for peers
not reachable via NVLink. The method is compiled in for NCCLX 2.28+, but the
host API call sits behind a nested 2.29+ check, so on 2.28.x the method
always returns nullptr.
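
The nested version guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `SKETCH_*` macros, `sketch_host_peer_lookup`, and `sketch_get_nvlink_address` are hypothetical names, and the stub lookup only models the behavior of returning either an address or nullptr.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical version encoding for this sketch; a real build would take
// its version macros from the NCCLX headers instead.
#define SKETCH_VERSION(major, minor, patch) \
  ((major) * 10000 + (minor) * 100 + (patch))

// Pretend we are compiling against 2.28.x unless the build overrides it.
#ifndef SKETCH_NCCLX_VERSION
#define SKETCH_NCCLX_VERSION SKETCH_VERSION(2, 28, 0)
#endif

// Hypothetical stand-in for the host-side lookup that is only called on
// 2.29+. A null base models a peer that is not reachable via NVLink.
inline void* sketch_host_peer_lookup(void* peer_base, std::size_t offset) {
  if (peer_base == nullptr) {
    return nullptr;
  }
  return static_cast<char*>(peer_base) + offset;
}

#if SKETCH_NCCLX_VERSION >= SKETCH_VERSION(2, 28, 0)
// Outer guard: the method exists from 2.28 onward.
inline void* sketch_get_nvlink_address(void* peer_base, std::size_t offset) {
#if SKETCH_NCCLX_VERSION >= SKETCH_VERSION(2, 29, 0)
  // Inner guard: only 2.29+ performs the actual lookup.
  return sketch_host_peer_lookup(peer_base, offset);
#else
  // On 2.28.x the symbol is present but always reports "no address".
  (void)peer_base;
  (void)offset;
  return nullptr;
#endif
}
#endif

// Small usage check that holds for whichever branch was compiled in.
inline void sketch_usage() {
  char window[16] = {};
  void* p = sketch_get_nvlink_address(window, 4);
#if SKETCH_NCCLX_VERSION >= SKETCH_VERSION(2, 29, 0)
  assert(p == window + 4);
#else
  assert(p == nullptr);
#endif
  (void)p;
}
```

One consequence worth noting for callers: with this layout, a nullptr result on 2.28.x does not by itself say whether the peer is reachable via NVLink.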

Also add torchcomms_get_nvlink_address() to the Triton device bindings,
which calls ncclGetPeerPointer() (device-side, available in 2.28+) to
compute the NVLink address directly on the GPU.

Changes:

  • NcclxApi: Add winGetPeerDevicePointer() virtual method (v2.28+ guard, v2.29+ actual call)
  • TorchCommWindowNCCLX: Add get_nvlink_address(peer, offset) method
  • torchcomms_device.h/.cu: Add torchcomms_get_nvlink_address extern
  • Triton Python wrapper: Add get_nvlink_address() function
  • Pybind: Register get_nvlink_address (v2.28+ only)
  • Type stubs: Add get_nvlink_address signature
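
Because get_nvlink_address(peer, offset) returns nullptr for peers without an NVLink mapping, a caller will likely want to check the result before using it. The sketch below shows one way to partition peers on that basis. Everything in it is an assumption for illustration: `SketchPeerSplit`, `sketch_split_peers`, and `sketch_fake_lookup` are hypothetical, and the fake lookup simply pretends that ranks 0 through 3 are NVLink-reachable.

```cpp
#include <cstddef>
#include <vector>

// Result of probing every peer once: which ranks returned an address and
// which returned nullptr.
struct SketchPeerSplit {
  std::vector<int> nvlink;  // lookup returned a non-null address
  std::vector<int> other;   // lookup returned nullptr
};

// Probes each peer at offset 0 with a caller-supplied lookup that has the
// shape (peer, offset) -> pointer, mirroring get_nvlink_address(peer, offset).
template <typename Lookup>
SketchPeerSplit sketch_split_peers(int world_size, Lookup&& lookup) {
  SketchPeerSplit split;
  for (int peer = 0; peer < world_size; ++peer) {
    if (lookup(peer, std::size_t{0}) != nullptr) {
      split.nvlink.push_back(peer);
    } else {
      split.other.push_back(peer);
    }
  }
  return split;
}

// Hypothetical lookup for demonstration only: ranks 0..3 share a fake
// buffer, all other ranks are treated as not NVLink-reachable.
inline void* sketch_fake_lookup(int peer, std::size_t offset) {
  static char fake_window[64];
  return peer < 4 ? static_cast<void*>(fake_window + offset) : nullptr;
}
```

Calling `sketch_split_peers(8, sketch_fake_lookup)` puts ranks 0 through 3 in `nvlink` and ranks 4 through 7 in `other`; a real caller would pass a wrapper around the window method instead of the fake lookup.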

Differential Revision: D95908486

@meta-cla meta-cla bot added the CLA Signed label Mar 10, 2026
@meta-codesync
Contributor

meta-codesync bot commented Mar 10, 2026

@siyengar has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95908486.

siyengar added a commit to siyengar/torchcomms-2 that referenced this pull request Mar 10, 2026
…ta-pytorch#1012)

Summary:

Add get_nvlink_address() method to TorchCommWindowNCCLX that returns the
NVLink-mapped device pointer for a peer's window memory. This calls
ncclGetPeerDevicePointer (host-side, NCCLX 2.29+ when available) to resolve
the LSA flat address for NVLink-accessible peers, returning nullptr for peers
not reachable via NVLink. The API guard is set at NCCLX 2.28+ with a nested
2.29+ check for the actual host API call, falling back to returning nullptr
on 2.28.x.

Also add torchcomms_get_nvlink_address() to the Triton device bindings,
which calls ncclGetPeerPointer() (device-side, available in 2.28+) to
compute the NVLink address directly on the GPU.

Changes:
- NcclxApi: Add winGetPeerDevicePointer() virtual method (v2.28+ guard, v2.29+ actual call)
- TorchCommWindowNCCLX: Add get_nvlink_address(peer, offset) method
- torchcomms_device.h/.cu: Add torchcomms_get_nvlink_address extern
- Triton Python wrapper: Add get_nvlink_address() function
- Pybind: Register get_nvlink_address (v2.28+ only)
- Type stubs: Add get_nvlink_address signature

Reviewed By: goelayu

Differential Revision: D95908486
siyengar added a commit to siyengar/torchcomms-2 that referenced this pull request Mar 11, 2026
…ta-pytorch#1012)


Labels

CLA Signed, fb-exported, meta-exported

1 participant