Skip to content

Conversation

@samnordmann
Copy link
Collaborator

@samnordmann samnordmann commented Nov 13, 2025

@samnordmann samnordmann changed the base branch from main to sym/symmetric_tensor November 13, 2025 17:48
@samnordmann samnordmann marked this pull request as ready for review November 13, 2025 17:48
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Nov 13, 2025

Greptile Overview

Greptile Summary

This PR adds MemoryType::Symmetric to support symmetric memory allocations, which enable efficient multi-device operations through CUDA VMM and NVLS multicast. The implementation adds the new memory type enum, preserves it during fusion input/output setup and host IR lowering, and implements allocation logic using SymmetricTensor::allocate() with a contiguity requirement.

Key changes:

  • Added MemoryType::Symmetric enum value and string representation
  • Modified Fusion::addInput, addOutput, and host IR lowering to preserve symmetric memory type instead of forcing global
  • Added symmetric allocation path in HostIrEvaluator::handle(kir::Allocate) with contiguity validation
  • Updated allocation sites to use tv->getMemoryType() instead of hardcoding MemoryType::Global
  • Added test case validating symmetric memory allocation and properties

Issue found:

  • Fusion::replaceOutput still forces MemoryType::Global (acknowledged by TODO comment)

Confidence Score: 4/5

  • Safe to merge with one known gap in replaceOutput (documented in TODO)
  • The implementation is well-structured and consistent across most of the codebase. However, Fusion::replaceOutput doesn't preserve symmetric memory type, which could cause issues if a symmetric tensor is replaced. The TODO comment acknowledges this gap, suggesting it's a known limitation being tracked for future work.
  • csrc/fusion.cpp - replaceOutput needs symmetric memory support

Important Files Changed

File Analysis

Filename Score Overview
csrc/fusion.cpp 4/5 Preserved symmetric memory type when adding inputs/outputs, but replaceOutput still forces global memory
csrc/host_ir/lowering.cpp 5/5 Changed allocations to respect the tensor's existing memory type rather than forcing global
csrc/host_ir/evaluator.cpp 5/5 Added symmetric memory allocation path using SymmetricTensor::allocate with contiguity check

Sequence Diagram

sequenceDiagram
    participant User
    participant Fusion
    participant TensorView
    participant HostIrLower
    participant HostIrEvaluator
    participant SymmetricTensor
    
    User->>Fusion: addInput(tv) / addOutput(tv)
    Fusion->>TensorView: getMemoryType()
    alt MemoryType != Symmetric
        Fusion->>TensorView: setMemoryType(Global)
    end
    
    User->>HostIrLower: lower(Fusion)
    HostIrLower->>TensorView: getMemoryType()
    alt MemoryType != Symmetric
        HostIrLower->>TensorView: setMemoryType(Global)
    end
    HostIrLower->>HostIrLower: Create Allocate HIR nodes
    Note over HostIrLower: Uses tv->getMemoryType()
    
    User->>HostIrEvaluator: runWithInput()
    HostIrEvaluator->>HostIrEvaluator: handle(kir::Allocate)
    HostIrEvaluator->>TensorView: getMemoryType()
    
    alt MemoryType == Symmetric
        HostIrEvaluator->>HostIrEvaluator: isTvContiguous(tv)
        Note over HostIrEvaluator: Error if not contiguous
        HostIrEvaluator->>SymmetricTensor: allocate(sizes, dtype, device)
        SymmetricTensor->>SymmetricTensor: Check VMM support
        SymmetricTensor->>SymmetricTensor: cuMemCreate with granularity
        SymmetricTensor->>SymmetricTensor: cuMemAddressReserve & cuMemMap
        SymmetricTensor-->>HostIrEvaluator: at::Tensor
    else MemoryType != Symmetric
        HostIrEvaluator->>HostIrEvaluator: empty_strided_cuda()
    end
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines +329 to +330
MemoryType::Global); // TODO: may need a patch to support symmetric
// memory type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The TODO is valid - replaceOutput should preserve symmetric memory like addInput and addOutput do

Suggested change
MemoryType::Global); // TODO: may need a patch to support symmetric
// memory type
if (tv->getMemoryType() != MemoryType::Symmetric) {
tv->setMemoryType(MemoryType::Global);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants