Factor Expr operand replacement in host IR lowering #5539
Conversation
Description
PR Type: Enhancement
PR Reviewer Guide
Here are some key observations to aid the review process:
🧪 No relevant tests

⚡ Recommended focus areas for review

**Performance Impact**

`cloneWithNewOperands` and its usage in `lowerSegment` should be evaluated for performance impact, especially in scenarios where no replacements are made.
```cpp
Expr* cloneWithNewOperands(
    Expr* e,
    const std::unordered_map<Val*, Val*>& replacement_map) {
  auto maybe_replace = [&](Val*& x) -> bool {
```
I also tried to use OptOutMutator (#5538) but ran into:

```
[ RUN ] StreamTest.Matmul
unknown file: Failure
C++ exception with description " INTERNAL ASSERT FAILED at /opt/pytorch/nvfuser/csrc/mutator.cpp:90, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues.
Expected !DependencyCheck::isDependencyOf(val, mutation) . Attempted to replace a val, T1_g_float[istreamIdx7{3}, iS11{i2}, iS8{( ceilDiv(i4, 3) )}], with a dependent val, T3_l_float[istreamIdx14{3}, iS12{i2}, iS15{( ceilDiv(i4, 3) )}] (T3_l_float[istreamIdx14{3}, iS12{i2}, iS15{( ceilDiv(i4, 3) )}]), which is not allowed as it would result in a recursive definition of T3_l_float[istreamIdx14{3}, iS12{i2}, iS15{( ceilDiv(i4, 3) )}]
Exception raised from registerMutation at /opt/pytorch/nvfuser/csrc/mutator.cpp:90 (most recent call first):
```
Let me know if there's an easier way.
Greptile Overview

Greptile Summary

Refactored host IR lowering to use a shared helper function, `cloneWithNewOperands`, for replacing `Expr` operands.

Key changes:

- Added `cloneWithNewOperands`, which swaps an `Expr`'s inputs/outputs according to a `Val` replacement map and returns the original `Expr` when nothing is replaced.
- `lowerSegment` builds the replacement map once and routes segment expressions through the shared helper.
Confidence Score: 5/5
Important Files Changed

File Analysis
Sequence Diagram

```mermaid
sequenceDiagram
    participant LowerSegment as lowerSegment
    participant CloneHelper as cloneWithNewOperands
    participant ReplacementMap as replacement_map
    participant Expr as Expression
    participant NewObjectFunc as newObjectFunc

    LowerSegment->>LowerSegment: Build replacement_map<br/>(sharded TensorViews)
    loop For each Expr in segment
        LowerSegment->>CloneHelper: cloneWithNewOperands(e, replacement_map)
        CloneHelper->>CloneHelper: Copy inputs vector
        CloneHelper->>CloneHelper: Apply maybe_replace to inputs<br/>(count replacements)
        CloneHelper->>CloneHelper: Copy outputs vector
        CloneHelper->>CloneHelper: Apply maybe_replace to outputs<br/>(count replacements)
        alt replaced > 0
            CloneHelper->>NewObjectFunc: Create new Expr with<br/>replaced inputs/outputs
            NewObjectFunc-->>CloneHelper: Return new Expr
            CloneHelper-->>LowerSegment: Return new Expr
        else replaced == 0
            CloneHelper-->>LowerSegment: Return original Expr<br/>(avoid redundant IR node)
        end
        LowerSegment->>LowerSegment: Add Expr to loop body
    end
```
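For concreteness, here is a minimal C++ sketch of the helper as the diagram describes it. This is not the PR's actual diff: the `newObjectFunc()`, `container()`, and `attributes()` calls are assumptions about nvFuser's `Expr` cloning API, and the exact signatures may differ.

```cpp
#include <unordered_map>
#include <vector>

// Sketch only: Expr and Val come from nvFuser's IR headers.
Expr* cloneWithNewOperands(
    Expr* e,
    const std::unordered_map<Val*, Val*>& replacement_map) {
  // Replace x in place when the map has an entry for it; report whether
  // a replacement happened so the caller can count them.
  auto maybe_replace = [&](Val*& x) -> bool {
    auto it = replacement_map.find(x);
    if (it == replacement_map.end()) {
      return false;
    }
    x = it->second;
    return true;
  };

  // Copy the operand vectors and apply the replacement map to each entry.
  std::vector<Val*> inputs = e->inputs();
  std::vector<Val*> outputs = e->outputs();
  int64_t num_replaced = 0;
  for (Val*& in : inputs) {
    num_replaced += maybe_replace(in) ? 1 : 0;
  }
  for (Val*& out : outputs) {
    num_replaced += maybe_replace(out) ? 1 : 0;
  }

  // Nothing changed: return the original Expr instead of creating a
  // redundant IR node.
  if (num_replaced == 0) {
    return e;
  }

  // Assumed cloning hook: a per-type factory that rebuilds the Expr
  // with the replaced operands.
  return e->newObjectFunc()(
      e->container(), inputs, outputs, e->attributes());
}
```

The early return for `num_replaced == 0` is what the reviewer guide's performance note is about: expressions untouched by the map incur only the hash lookups, not a new IR allocation.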
1 file reviewed, no comments
!test
Priya2698 left a comment
Can we add a test here to catch any unintended behavior changes introduced by later modifications?
What do you mean? This change doesn't affect behavior AFAIK. cc @Priya2698
The purpose of the PR is to avoid redundant IR nodes, so the fusion IR should differ before and after this PR. I'm suggesting that this behavior, that is, the updated fusion IR, can be tested. Is this not the case?
Got it -- thanks for clarifying! I updated the PR description. The purpose is to have a common helper that both `SchedulerType::ExprEval` and `SchedulerType::Communication` can call. Not having redundant IR nodes is a side benefit and an implementation detail that I'd rather not test (https://testing.googleblog.com/2013/08/testing-on-toilet-test-behavior-not.html). cc @Priya2698
Add cloneWithNewOperands so #5536 can reuse a single path for replacing Expr inputs/outputs based on a Val replacement map.
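As a usage illustration, a hypothetical call site mirroring the `lowerSegment` flow described above; all names below (`original_tv`, `sharded_tv`, `segment_exprs`, `loop_body`) are stand-ins, not identifiers from this PR:

```cpp
// Map each original TensorView to its per-stream shard, then clone every
// Expr in the segment against that map.
std::unordered_map<Val*, Val*> replacement_map;
replacement_map[original_tv] = sharded_tv; // hypothetical Vals

for (Expr* e : segment_exprs) { // hypothetical segment expression list
  Expr* lowered = cloneWithNewOperands(e, replacement_map);
  // If none of e's operands appear in the map, lowered == e and no
  // redundant IR node is created.
  loop_body.push_back(lowered); // hypothetical host-IR loop body
}
```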