
skip storing window buffer tensor if in graph mode #1004

Closed
tianfengfrank wants to merge 1 commit into main from export-D95841874

@tianfengfrank
Contributor

Summary:
In CUDA graph capture mode, the window object no longer holds a reference to the registered tensor (buf_tensor_), allowing the physical memory buffer to be reused. The same physical buffer is still retrieved when the window is used since the NCCL window registration (commWindowRegister) already tracks it independently (as discussed with siyengar).

Changes:

  • TorchCommWindowNCCLX.cpp: Skip buf_tensor_ storage when getGraphCaptureMode() is true, with a TC_LOG(WARNING) to note get_tensor() will return nullopt.
  • TorchCommWindow.hpp: Updated get_tensor() comment to document possible nullopt in graph capture mode.
  • TorchCommWindowNCCLXTest.cpp: Added two unit tests with mocked CUDA stream capture to verify buf_tensor_ is skipped in graph mode and stored in normal mode.
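
The behavior described above can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: `WindowSketch`, `FakeTensor`, `tensor_register`, and `registered_ptr` are hypothetical names, a `std::shared_ptr` stands in for the tensor's reference-counted storage, and a raw pointer stands in for whatever the NCCL window registration tracks. Only `buf_tensor_` and `get_tensor()` are names taken from the summary.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical stand-in for a tensor: shared ownership of a buffer.
using FakeTensor = std::shared_ptr<std::vector<float>>;

// Hypothetical sketch of a window that skips holding the tensor
// reference when it was created in graph capture mode.
class WindowSketch {
 public:
  explicit WindowSketch(bool graph_capture_mode)
      : graph_capture_mode_(graph_capture_mode) {}

  void tensor_register(const FakeTensor& tensor) {
    assert(tensor != nullptr);
    // Stand-in for the registration bookkeeping: the buffer address is
    // tracked independently of any tensor reference held by the window.
    registered_ptr_ = tensor->data();
    if (!graph_capture_mode_) {
      // Normal mode: keep a reference so get_tensor() can return it.
      buf_tensor_ = tensor;
    }
    // Graph capture mode: buf_tensor_ stays empty, so the window does
    // not extend the buffer's lifetime.
  }

  // May be std::nullopt when the window was registered in graph capture mode.
  std::optional<FakeTensor> get_tensor() const { return buf_tensor_; }

  const float* registered_ptr() const { return registered_ptr_; }

 private:
  bool graph_capture_mode_;
  std::optional<FakeTensor> buf_tensor_;
  const float* registered_ptr_ = nullptr;
};
```

In this sketch, a caller that registers a tensor with `WindowSketch(true)` gets an empty optional from `get_tensor()` and must check `has_value()` before use, while the registered address remains available through `registered_ptr()`.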

Differential Revision: D95841874

@meta-cla meta-cla bot added the CLA Signed label Mar 9, 2026
@meta-codesync
Contributor

meta-codesync bot commented Mar 9, 2026

@tianfengfrank has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95841874.


Reviewed By: siyengar

@meta-codesync
Contributor

meta-codesync bot commented Mar 11, 2026

This pull request has been merged in 808b79a.


Labels

CLA Signed, fb-exported, Merged, meta-exported
