Skip to content

Conversation

@wujingyue
Copy link
Collaborator

@wujingyue wujingyue commented Nov 15, 2025

Add cloneWithNewOperands so #5536 can reuse a single path for replacing Expr inputs/outputs based on a Val replacement map.

@github-actions
Copy link

Description

  • Added cloneWithNewOperands to handle Expr operand replacement

  • Utilized cloneWithNewOperands in lowerSegment to simplify logic

  • Avoided redundant IR node creation when operands unchanged

Changes walkthrough

Relevant files
Enhancement
lowering.cpp
Refactor operand replacement with helper function               

csrc/host_ir/lowering.cpp

  • Introduced cloneWithNewOperands function
  • Replaced manual operand replacement in lowerSegment with
    cloneWithNewOperands
  • Optimized to skip reconstructing Exprs with unchanged operands
  • +31/-18 

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    ⚡ Recommended focus areas for review
    Performance Impact

    The new function cloneWithNewOperands and its usage in lowerSegment should be evaluated for performance impact, especially in scenarios where no replacements are made.

    // Returns a new Expr with the inputs and outputs replaced by the replacement
    // map. If none of the inputs or outputs are replaced, returns the original
    // Expr.
    Expr* cloneWithNewOperands(
        Expr* e,
        const std::unordered_map<Val*, Val*>& replacement_map) {
      auto maybe_replace = [&](Val*& x) -> bool {
        Val* new_x = getOrDefault(replacement_map, x);
        if (new_x == nullptr) {
          return false;
        }
        x = new_x;
        return true;
      };
    
      int64_t replaced = 0;
    
      std::vector<Val*> new_ins = e->inputs();
      replaced += std::ranges::count_if(new_ins, maybe_replace);
    
      std::vector<Val*> new_outs = e->outputs();
      replaced += std::ranges::count_if(new_outs, maybe_replace);
    
      if (replaced == 0) {
        return e;
      }
    
      return e->newObjectFunc()(e->container(), new_ins, new_outs, e->attributes());
    }
    Code Clarity

    The use of std::ranges::count_if and lambda functions might reduce code readability. Consider adding comments or breaking down the logic into smaller, more understandable functions.

    replaced += std::ranges::count_if(new_ins, maybe_replace);
    
    std::vector<Val*> new_outs = e->outputs();
    replaced += std::ranges::count_if(new_outs, maybe_replace);
    
    Error Handling

    Ensure that e->newObjectFunc() handles all possible cases and does not throw exceptions or return null, which could lead to undefined behavior.

      return e->newObjectFunc()(e->container(), new_ins, new_outs, e->attributes());
    }

    Expr* cloneWithNewOperands(
    Expr* e,
    const std::unordered_map<Val*, Val*>& replacement_map) {
    auto maybe_replace = [&](Val*& x) -> bool {
    Copy link
    Collaborator Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    I also tried to use OptOutMutator (#5538) but ran into

    [ RUN      ] StreamTest.Matmul
    unknown file: Failure
    C++ exception with description " INTERNAL ASSERT FAILED at /opt/pytorch/nvfuser/csrc/mutator.cpp:90, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. 
    Expected !DependencyCheck::isDependencyOf(val, mutation) . Attempted to replace a val, T1_g_float[istreamIdx7{3}, iS11{i2}, iS8{( ceilDiv(i4, 3) )}], with a dependent val, T3_l_float[istreamIdx14{3}, iS12{i2}, iS15{( ceilDiv(i4, 3) )}] (T3_l_float[istreamIdx14{3}, iS12{i2}, iS15{( ceilDiv(i4, 3) )}]), which is not allowed as it would result in a recursive definition of T3_l_float[istreamIdx14{3}, iS12{i2}, iS15{( ceilDiv(i4, 3) )}]
    Exception raised from registerMutation at /opt/pytorch/nvfuser/csrc/mutator.cpp:90 (most recent call first):
    

    Let me know if there's an easier way.

    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Nov 15, 2025

    Greptile Overview

    Greptile Summary

    Refactored host IR lowering to use a shared helper function cloneWithNewOperands for replacing Expr operands based on a replacement map. The new implementation avoids creating redundant IR nodes when no replacements occur, improving efficiency in non-sharded cases.

    Key changes:

    • Added cloneWithNewOperands helper that takes an Expr and replacement map, returning either a new Expr with replaced operands or the original Expr if no replacements occurred
    • Replaced inline transformation logic in the ExprEval case with a call to the new helper
    • Optimization: skips calling newObjectFunc when replaced == 0, preventing unnecessary IR node creation

    Confidence Score: 5/5

    • This PR is safe to merge - it's a clean refactoring that consolidates duplicate logic without changing behavior
    • The refactoring extracts common operand replacement logic into a reusable helper function with an optimization to avoid redundant IR nodes. The logic is equivalent to the original implementation (verified by analyzing both code paths), and the change is well-contained within a single file. The optimization of returning the original Expr when no replacements occur is correct since the input Expr is already a clone from ir_cloner.
    • No files require special attention

    Important Files Changed

    File Analysis

    Filename Score Overview
    csrc/host_ir/lowering.cpp 5/5 Refactored Expr operand replacement logic into a reusable helper function that avoids redundant IR node creation when no sharding occurs

    Sequence Diagram

    sequenceDiagram
        participant LowerSegment as lowerSegment
        participant CloneHelper as cloneWithNewOperands
        participant ReplacementMap as replacement_map
        participant Expr as Expression
        participant NewObjectFunc as newObjectFunc
    
        LowerSegment->>LowerSegment: Build replacement_map<br/>(sharded TensorViews)
        loop For each Expr in segment
            LowerSegment->>CloneHelper: cloneWithNewOperands(e, replacement_map)
            CloneHelper->>CloneHelper: Copy inputs vector
            CloneHelper->>CloneHelper: Apply maybe_replace to inputs<br/>(count replacements)
            CloneHelper->>CloneHelper: Copy outputs vector
            CloneHelper->>CloneHelper: Apply maybe_replace to outputs<br/>(count replacements)
            alt replaced > 0
                CloneHelper->>NewObjectFunc: Create new Expr with<br/>replaced inputs/outputs
                NewObjectFunc-->>CloneHelper: Return new Expr
                CloneHelper-->>LowerSegment: Return new Expr
            else replaced == 0
                CloneHelper-->>LowerSegment: Return original Expr<br/>(avoid redundant IR node)
            end
            LowerSegment->>LowerSegment: Add Expr to loop body
        end
    
    Loading

    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    1 file reviewed, no comments

    Edit Code Review Agent Settings | Greptile

    @wujingyue
    Copy link
    Collaborator Author

    !test

    Copy link
    Collaborator

    @Priya2698 Priya2698 left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Can we add a test here to track any unintended changes in behavior by later modifications?

    @wujingyue
    Copy link
    Collaborator Author

    Can we add a test here to track any unintended changes in behavior by later modifications?

    What do you mean? This change doesn't affect behavior AFAIK. cc @Priya2698

    @Priya2698
    Copy link
    Collaborator

    Can we add a test here to track any unintended changes in behavior by later modifications?

    What do you mean? This change doesn't affect behavior AFAIK. cc @Priya2698

    The purpose of the PR is to not have redundant IR nodes, so we should have a different fusion IR before and after this PR. I am indicating that this behavior, that is, the updated fusion IR can be tested. Is this not the case?

    @wujingyue
    Copy link
    Collaborator Author

    The purpose of the PR is to not have redundant IR nodes, so we should have a different fusion IR before and after this PR.

    Got it -- thanks for clarifying! I updated the PR description. The purpose is to have a common helper that both SchedulerType::ExprEval and SchedulerType::Communication can call.

    Not having redundant IR nodes is a side benefit, and is an implementation detail that I'd rather not test (https://testing.googleblog.com/2013/08/testing-on-toilet-test-behavior-not.html)

    cc @Priya2698

    @wujingyue wujingyue merged commit 2c0687b into main Nov 17, 2025
    60 checks passed
    @wujingyue wujingyue deleted the wjy/replace branch November 17, 2025 16:10
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    3 participants