samnordmann (Collaborator) commented Nov 13, 2025

greptile-apps bot (Contributor) commented Nov 13, 2025

Greptile Summary

This PR implements SymmetricContiguousView, a new Host IR operation that "unshards" symmetric memory tensors by creating a contiguous virtual address mapping across all ranks using CUDA VMM and IPC handles.

Key changes:

  • Added SymmetricContiguousView IR node that transforms sharded tensors (with DIDx parallelization) into unsharded contiguous views
  • Implemented SymMemForContiguousView handle class that manages IPC handle exchange and contiguous memory mapping
  • Added caching support in SymmetricMemoryHandleCache to avoid recreating expensive VMM mappings
  • Added a comprehensive test that validates correct data placement across ranks in the contiguous view
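
The caching bullet above can be sketched as a get-or-create map keyed by the input buffer and the IR expression. This is an illustrative sketch, not the actual nvFuser API: `ContiguousViewHandle`, `HandleKey`-style pairing, and `getOrCreate` are hypothetical names standing in for `SymmetricMemoryHandleCache` internals.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-in for the expensive VMM-backed handle.
struct ContiguousViewHandle {
  explicit ContiguousViewHandle(std::uintptr_t data_ptr) : data_ptr(data_ptr) {}
  std::uintptr_t data_ptr;  // base address of the mapped view
};

// Cache keyed by (input buffer address, IR expression id), so the expensive
// IPC handle exchange + virtual-memory mapping happens once per pair.
class SymmetricMemoryHandleCacheSketch {
 public:
  using Key = std::pair<std::uintptr_t, std::int64_t>;

  std::shared_ptr<ContiguousViewHandle> getOrCreate(
      std::uintptr_t data_ptr, std::int64_t expr_id) {
    Key key{data_ptr, expr_id};
    auto it = cache_.find(key);
    if (it != cache_.end()) {
      return it->second;  // reuse the existing mapping
    }
    auto handle = std::make_shared<ContiguousViewHandle>(data_ptr);
    cache_.emplace(key, handle);
    return handle;
  }

  std::size_t size() const { return cache_.size(); }

 private:
  std::map<Key, std::shared_ptr<ContiguousViewHandle>> cache_;
};
```

Keying on both the tensor and the expression means two different `SymmetricContiguousView` ops over the same buffer get distinct handles, while repeated evaluation of the same op reuses one mapping.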

How it works:
The operation takes a sharded input tensor [1, N] with symmetric memory and produces an unsharded output [world_size, N]. At runtime, it exchanges IPC handles between ranks and uses CUDA VMM to create a contiguous virtual address space where all ranks' data is accessible, effectively making the distributed tensor appear as a single contiguous tensor.
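
At the shape level, the layout described above reduces to fixed offset arithmetic: if each of `world_size` ranks contributes a `[1, N]` slab and the slabs are mapped back to back, element `(rank, col)` of the unsharded `[world_size, N]` view sits at a predictable byte offset. The helpers below are illustrative and not part of the nvFuser sources.

```cpp
#include <cstdint>

// Byte offset of element (rank, col) in a contiguous view built from
// world_size back-to-back slabs, each holding n elements of elem_size bytes.
constexpr std::int64_t contiguousViewOffset(
    std::int64_t rank, std::int64_t col,
    std::int64_t n, std::int64_t elem_size) {
  return (rank * n + col) * elem_size;
}

// Total bytes the contiguous virtual-address reservation must cover.
constexpr std::int64_t reservationSize(
    std::int64_t world_size, std::int64_t n, std::int64_t elem_size) {
  return world_size * n * elem_size;
}
```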

Issue found:

  • evaluator.cpp:826 checks handle->tensor().size(1) without first verifying that the tensor has at least 2 dimensions, which could cause an out-of-bounds dimension access

Confidence Score: 4/5

  • Safe to merge with one bounds check fix needed
  • The implementation is well-structured with proper caching, comprehensive testing, and clear separation of concerns. One logical error exists where dimension bounds aren't validated before accessing size(1), which could cause crashes with malformed tensors. The core VMM/IPC logic reuses existing SymmetricTensor infrastructure and follows established patterns in the codebase.
  • csrc/host_ir/evaluator.cpp needs bounds validation fix before dimension access

Important Files Changed

File Analysis

| Filename | Score | Overview |
|----------|-------|----------|
| csrc/host_ir/host_ir.cpp | 5/5 | Implements constructor and string methods for SymmetricContiguousView; validates that the input has symmetric memory type |
| csrc/host_ir/evaluator.cpp | 4/5 | Implements runtime evaluation of SymmetricContiguousView; creates the contiguous view from the cache and squeezes a dimension; has a dimension-check issue |
| csrc/multidevice/ipc_handle.cpp | 5/5 | Implements the SymMemForContiguousView constructor with symmetric tensor setup and dimension squeezing; adds a cache lookup for the contiguous view |

Sequence Diagram

sequenceDiagram
    participant User
    participant HostIrEvaluator
    participant SymMemHandleCache
    participant SymMemForContiguousView
    participant SymmetricTensor
    participant CUDA_VMM as CUDA VMM/IPC
    
    User->>HostIrEvaluator: handle(SymmetricContiguousView)
    HostIrEvaluator->>HostIrEvaluator: getKnownConcreteValue(in_tv)
    Note over HostIrEvaluator: Get sharded input tensor [1, N]
    
    HostIrEvaluator->>SymMemHandleCache: get({in_tensor, expr})
    
    alt Handle not in cache
        SymMemHandleCache->>SymMemForContiguousView: new SymMemForContiguousView(in_tensor, expr)
        SymMemForContiguousView->>SymmetricTensor: new SymmetricTensor(in_tensor)
        SymMemForContiguousView->>SymmetricTensor: setupContiguousView(tag)
        
        SymmetricTensor->>CUDA_VMM: cuMemAddressReserve(total_size)
        Note over SymmetricTensor,CUDA_VMM: Reserve contiguous virtual address space<br/>for all ranks
        
        loop For each rank
            SymmetricTensor->>CUDA_VMM: Exchange IPC handles
            SymmetricTensor->>CUDA_VMM: cuMemMap(region, getAllocHandle(rank))
            SymmetricTensor->>CUDA_VMM: cuMemSetAccess(region, READWRITE)
        end
        
        SymmetricTensor->>SymmetricTensor: Create tensor from mapped memory
        Note over SymmetricTensor: Shape: [world_size, ...local_shape]
        SymmetricTensor-->>SymMemForContiguousView: Return contiguous_view
        
        SymMemForContiguousView->>SymMemForContiguousView: squeeze(0) if size(0)==1
        SymMemForContiguousView-->>SymMemHandleCache: Return handle
        SymMemHandleCache->>SymMemHandleCache: Cache handle
    end
    
    SymMemHandleCache-->>HostIrEvaluator: Return cached handle
    HostIrEvaluator->>SymMemForContiguousView: tensor()
    SymMemForContiguousView-->>HostIrEvaluator: Return contiguous tensor [world_size, 1, N]
    HostIrEvaluator->>HostIrEvaluator: squeeze(1)
    Note over HostIrEvaluator: Final shape: [world_size, N]
    HostIrEvaluator->>HostIrEvaluator: bind(out_tv, contiguous_tensor)
    HostIrEvaluator-->>User: Return unsharded tensor
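
The VMM steps in the diagram can be illustrated with a CPU analogue using POSIX mmap: reserving a contiguous PROT_NONE region plays the role of cuMemAddressReserve, and a MAP_FIXED read/write remap of each rank's slab plays the role of cuMemMap + cuMemSetAccess. This is only an analogue under stated assumptions: the real operation uses the CUDA driver API with IPC-imported per-rank allocation handles, whereas here local anonymous pages stand in for each rank's data, and all helper names are hypothetical.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Reserve a contiguous, initially inaccessible virtual address range
// (analogue of cuMemAddressReserve).
inline unsigned char* reserveRegion(std::size_t bytes) {
  void* p = mmap(nullptr, bytes, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : static_cast<unsigned char*>(p);
}

// Map one rank's slab into the reserved range at a fixed offset and make it
// read/write (analogue of cuMemMap + cuMemSetAccess).
inline bool mapSlab(unsigned char* base, std::size_t offset, std::size_t bytes) {
  void* p = mmap(base + offset, bytes, PROT_READ | PROT_WRITE,
                 MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p != MAP_FAILED;
}

// Build a world_size-slab contiguous view; each slab is one page here.
// Writes rank r's id into slab r's first byte to mimic "rank r's data".
inline unsigned char* buildContiguousView(int world_size) {
  const std::size_t slab = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  unsigned char* base = reserveRegion(slab * world_size);
  if (base == nullptr) return nullptr;
  for (int r = 0; r < world_size; ++r) {
    if (!mapSlab(base, r * slab, slab)) return nullptr;
    base[r * slab] = static_cast<unsigned char>(r);
  }
  return base;
}
```

After the loop, a single pointer spans all slabs, which is exactly the property the contiguous view relies on: indexing past one rank's slab lands in the next rank's data with no gaps.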

greptile-apps bot left a comment


8 files reviewed, 1 comment


Comment on lines +826 to +827
handle->tensor().size(1) == 1,
"Contiguous view must have size 1 on sharded dimension");

logic: checking size(1) assumes the tensor has at least 2 dimensions, but the input has shape [1, N], where dim 0 is the sharded dimension. After getContiguousView() returns [world_size, 1, N], checking size(1) == 1 is correct, but the code should verify the tensor has enough dimensions first.

Suggested change:

    - handle->tensor().size(1) == 1,
    - "Contiguous view must have size 1 on sharded dimension");
    + NVF_ERROR(
    +     handle->tensor().dim() >= 2 && handle->tensor().size(1) == 1,
    +     "Contiguous view must have at least 2 dimensions with size 1 on second dimension");

samnordmann added a commit that referenced this pull request Nov 25, 2025
Code reference for using the CUDA VMM and multicast driver APIs, which will be used in the Symmetric Memory implementation.

Replaces the already-approved #5385

- **Here ==> #5516**
- #5517
- #5518
- #5519
- #5520 

Full branch for reference: #5515
samnordmann added a commit that referenced this pull request Nov 27, 2025
- #5516
- **Here ==> #5517**
- #5518
- #5519
- #5520 

Full branch for reference: #5515