samnordmann (Collaborator) commented Nov 13, 2025

greptile-apps bot (Contributor) commented Nov 13, 2025

Greptile Summary

This PR implements SymmetricContiguousView, a new Host IR operation that "unshards" symmetric memory tensors by creating a contiguous virtual address mapping across all ranks using CUDA VMM and IPC handles.

Key changes:

  • Added SymmetricContiguousView IR node that transforms sharded tensors (with DIDx parallelization) into unsharded contiguous views
  • Implemented SymMemForContiguousView handle class that manages IPC handle exchange and contiguous memory mapping
  • Added caching support in SymmetricMemoryHandleCache to avoid recreating expensive VMM mappings
  • Added a comprehensive test that validates correct data placement across ranks in the contiguous view
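
The caching bullet above can be sketched as a get-or-create map keyed by the input buffer and the IR expression. This is an illustrative sketch, not the actual nvFuser API: `ContiguousViewHandle`, `HandleKey`-style pairing, and `getOrCreate` are hypothetical names standing in for `SymmetricMemoryHandleCache` internals.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-in for the expensive VMM-backed handle.
struct ContiguousViewHandle {
  explicit ContiguousViewHandle(std::uintptr_t data_ptr) : data_ptr(data_ptr) {}
  std::uintptr_t data_ptr;  // base address of the mapped view
};

// Cache keyed by (input buffer address, IR expression id), so the expensive
// IPC handle exchange + virtual-memory mapping happens once per pair.
class SymmetricMemoryHandleCacheSketch {
 public:
  using Key = std::pair<std::uintptr_t, std::int64_t>;

  std::shared_ptr<ContiguousViewHandle> getOrCreate(
      std::uintptr_t data_ptr, std::int64_t expr_id) {
    Key key{data_ptr, expr_id};
    auto it = cache_.find(key);
    if (it != cache_.end()) {
      return it->second;  // reuse the existing mapping
    }
    auto handle = std::make_shared<ContiguousViewHandle>(data_ptr);
    cache_.emplace(key, handle);
    return handle;
  }

  std::size_t size() const { return cache_.size(); }

 private:
  std::map<Key, std::shared_ptr<ContiguousViewHandle>> cache_;
};
```

Keying on both the tensor and the expression means two different `SymmetricContiguousView` ops over the same buffer get distinct handles, while repeated evaluation of the same op reuses one mapping.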

How it works:
The operation takes a sharded input tensor [1, N] with symmetric memory and produces an unsharded output [world_size, N]. At runtime, it exchanges IPC handles between ranks and uses CUDA VMM to create a contiguous virtual address space where all ranks' data is accessible, effectively making the distributed tensor appear as a single contiguous tensor.
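
At the shape level, the layout described above reduces to fixed offset arithmetic: if each of `world_size` ranks contributes a `[1, N]` slab and the slabs are mapped back to back, element `(rank, col)` of the unsharded `[world_size, N]` view sits at a predictable byte offset. The helpers below are illustrative and not part of the nvFuser sources.

```cpp
#include <cstdint>

// Byte offset of element (rank, col) in a contiguous view built from
// world_size back-to-back slabs, each holding n elements of elem_size bytes.
constexpr std::int64_t contiguousViewOffset(
    std::int64_t rank, std::int64_t col,
    std::int64_t n, std::int64_t elem_size) {
  return (rank * n + col) * elem_size;
}

// Total bytes the contiguous virtual-address reservation must cover.
constexpr std::int64_t reservationSize(
    std::int64_t world_size, std::int64_t n, std::int64_t elem_size) {
  return world_size * n * elem_size;
}
```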

Issue found:

  • evaluator.cpp:826 checks handle->tensor().size(1) without first verifying that the tensor has at least 2 dimensions, which could cause an out-of-bounds dimension access

Confidence Score: 4/5

  • Safe to merge with one bounds check fix needed
  • The implementation is well-structured with proper caching, comprehensive testing, and clear separation of concerns. One logical error exists where dimension bounds aren't validated before accessing size(1), which could cause crashes with malformed tensors. The core VMM/IPC logic reuses existing SymmetricTensor infrastructure and follows established patterns in the codebase.
  • csrc/host_ir/evaluator.cpp needs bounds validation fix before dimension access

Important Files Changed

File Analysis

| Filename | Score | Overview |
|----------|-------|----------|
| csrc/host_ir/host_ir.cpp | 5/5 | Implements constructor and string methods for SymmetricContiguousView; validates that the input has symmetric memory type |
| csrc/host_ir/evaluator.cpp | 4/5 | Implements runtime evaluation of SymmetricContiguousView; creates the contiguous view from the cache and squeezes a dimension; has a dimension-check issue |
| csrc/multidevice/ipc_handle.cpp | 5/5 | Implements the SymMemForContiguousView constructor with symmetric tensor setup and dimension squeezing; adds a cache lookup for the contiguous view |

Sequence Diagram

sequenceDiagram
    participant User
    participant HostIrEvaluator
    participant SymMemHandleCache
    participant SymMemForContiguousView
    participant SymmetricTensor
    participant CUDA_VMM as CUDA VMM/IPC
    
    User->>HostIrEvaluator: handle(SymmetricContiguousView)
    HostIrEvaluator->>HostIrEvaluator: getKnownConcreteValue(in_tv)
    Note over HostIrEvaluator: Get sharded input tensor [1, N]
    
    HostIrEvaluator->>SymMemHandleCache: get({in_tensor, expr})
    
    alt Handle not in cache
        SymMemHandleCache->>SymMemForContiguousView: new SymMemForContiguousView(in_tensor, expr)
        SymMemForContiguousView->>SymmetricTensor: new SymmetricTensor(in_tensor)
        SymMemForContiguousView->>SymmetricTensor: setupContiguousView(tag)
        
        SymmetricTensor->>CUDA_VMM: cuMemAddressReserve(total_size)
        Note over SymmetricTensor,CUDA_VMM: Reserve contiguous virtual address space<br/>for all ranks
        
        loop For each rank
            SymmetricTensor->>CUDA_VMM: Exchange IPC handles
            SymmetricTensor->>CUDA_VMM: cuMemMap(region, getAllocHandle(rank))
            SymmetricTensor->>CUDA_VMM: cuMemSetAccess(region, READWRITE)
        end
        
        SymmetricTensor->>SymmetricTensor: Create tensor from mapped memory
        Note over SymmetricTensor: Shape: [world_size, ...local_shape]
        SymmetricTensor-->>SymMemForContiguousView: Return contiguous_view
        
        SymMemForContiguousView->>SymMemForContiguousView: squeeze(0) if size(0)==1
        SymMemForContiguousView-->>SymMemHandleCache: Return handle
        SymMemHandleCache->>SymMemHandleCache: Cache handle
    end
    
    SymMemHandleCache-->>HostIrEvaluator: Return cached handle
    HostIrEvaluator->>SymMemForContiguousView: tensor()
    SymMemForContiguousView-->>HostIrEvaluator: Return contiguous tensor [world_size, 1, N]
    HostIrEvaluator->>HostIrEvaluator: squeeze(1)
    Note over HostIrEvaluator: Final shape: [world_size, N]
    HostIrEvaluator->>HostIrEvaluator: bind(out_tv, contiguous_tensor)
    HostIrEvaluator-->>User: Return unsharded tensor
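
The VMM steps in the diagram can be illustrated with a CPU analogue using POSIX mmap: reserving a contiguous PROT_NONE region plays the role of cuMemAddressReserve, and a MAP_FIXED read/write remap of each rank's slab plays the role of cuMemMap + cuMemSetAccess. This is only an analogue under stated assumptions: the real operation uses the CUDA driver API with IPC-imported per-rank allocation handles, whereas here local anonymous pages stand in for each rank's data, and all helper names are hypothetical.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Reserve a contiguous, initially inaccessible virtual address range
// (analogue of cuMemAddressReserve).
inline unsigned char* reserveRegion(std::size_t bytes) {
  void* p = mmap(nullptr, bytes, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : static_cast<unsigned char*>(p);
}

// Map one rank's slab into the reserved range at a fixed offset and make it
// read/write (analogue of cuMemMap + cuMemSetAccess).
inline bool mapSlab(unsigned char* base, std::size_t offset, std::size_t bytes) {
  void* p = mmap(base + offset, bytes, PROT_READ | PROT_WRITE,
                 MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p != MAP_FAILED;
}

// Build a world_size-slab contiguous view; each slab is one page here.
// Writes rank r's id into slab r's first byte to mimic "rank r's data".
inline unsigned char* buildContiguousView(int world_size) {
  const std::size_t slab = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  unsigned char* base = reserveRegion(slab * world_size);
  if (base == nullptr) return nullptr;
  for (int r = 0; r < world_size; ++r) {
    if (!mapSlab(base, r * slab, slab)) return nullptr;
    base[r * slab] = static_cast<unsigned char>(r);
  }
  return base;
}
```

After the loop, a single pointer spans all slabs, which is exactly the property the contiguous view relies on: indexing past one rank's slab lands in the next rank's data with no gaps.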

greptile-apps bot left a comment


8 files reviewed, 1 comment


Comment on lines +826 to +827
handle->tensor().size(1) == 1,
"Contiguous view must have size 1 on sharded dimension");

logic: checking size(1) assumes the tensor has at least 2 dimensions, but the input has shape [1, N], where dim 0 is the sharded dimension. After getContiguousView() returns [world_size, 1, N], checking size(1) == 1 is correct, but the code should verify the tensor has enough dimensions first.

Suggested change:

    - handle->tensor().size(1) == 1,
    - "Contiguous view must have size 1 on sharded dimension");
    + NVF_ERROR(
    +     handle->tensor().dim() >= 2 && handle->tensor().size(1) == 1,
    +     "Contiguous view must have at least 2 dimensions with size 1 on second dimension");

samnordmann added a commit that referenced this pull request Nov 25, 2025
Code reference for using the CUDA VMM and multicast driver APIs, which will be used in the Symmetric Memory implementation.

Replaces the already-approved #5385

- **Here ==> #5516**
- #5517
- #5518
- #5519
- #5520 

Full branch for reference: #5515
samnordmann added a commit that referenced this pull request Nov 27, 2025
- #5516
- **Here ==> #5517**
- #5518
- #5519
- #5520 

Full branch for reference: #5515