Skip to content

Conversation

@wujingyue
Copy link
Collaborator

@wujingyue wujingyue commented Nov 15, 2025

Fixes #5307

@wujingyue
Copy link
Collaborator Author

!test

@github-actions
Copy link

github-actions bot commented Nov 15, 2025

Review updated until commit b859b07

Description

  • Added outputIsPreallocated checks in HostIrEvaluator.

  • Enhanced cloneWithNewOperands to handle preallocated outputs.

  • Updated lowerSegment to manage preallocated outputs correctly.

  • Introduced withOutputPreallocated method in Expr.

Changes walkthrough

Relevant files
Enhancement
5 files
evaluator.cpp
Added checks for preallocated outputs in `handle` methods.
+10/-10 
lowering.cpp
Updated cloneWithNewOperands and lowerSegment for preallocated
outputs.
+78/-31 
stream_parallel_type.cpp
Marked expressions with preallocated outputs.                       
+1/-0     
base_nodes.cpp
Added `withOutputPreallocated` and `outputIsPreallocated` methods.
+8/-1     
base_nodes.h
Added outputIsPreallocated and withOutputPreallocated declarations.
+8/-0     
Tests
3 files
test_host_ir_evaluator.cpp
Marked `MatmulOp` with preallocated output in tests.         
+2/-1     
test_host_irs.cpp
Marked `MatmulOp` and `LinearOp` with preallocated output in tests.
+4/-2     
test_overlap.py
Added test for row-parallel linear forward with preallocated outputs.
+43/-0   
Miscellaneous
1 files
communicator.h
Changed return types of `size` and `local_size` methods. 
+3/-3     
Configuration changes
1 files
CMakeLists.txt
Reordered test sources in `MULTIDEVICE_TEST_SRCS`.             
+2/-2     
Additional files
3 files
fusion_kernel_runtime.cpp +2/-2     
test_host_ir_stream_lowering.cpp +0/-3     
test_multidevice_stream_parallel_type.cpp +0/-3     

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review
Possible Issue

The unhandled(matmul) call in handle(MatmulOp* matmul) is moved inside the if block, which might lead to unhandled cases where matmul->outputIsPreallocated() is true. Ensure that all cases are correctly handled.

if (!matmul->outputIsPreallocated()) {
  unhandled(matmul);
  return;
Code Cleanup

The FIXME comments in lowerSegment function suggest potential cleanup and improvements. Address these comments to improve code quality.

LoopInfo innermost;
if (!loop_nest.empty()) {
  innermost = loop_nest.innermost();
}
Code Cleanup

The FIXME comment in the Expr::Expr(const Expr* src, IrCloner* ir_cloner) constructor suggests that the output_is_preallocated_ flag should be copied. Ensure that this flag is correctly handled during cloning.

std::vector<Statement*> attributes)

@wujingyue
Copy link
Collaborator Author

!test

@wujingyue wujingyue force-pushed the wjy/comm branch 2 times, most recently from c5adf62 to e56aeea Compare November 16, 2025 16:28
wujingyue added a commit that referenced this pull request Nov 17, 2025
Add cloneWithNewOperands so #5536 can reuse a single path for replacing
Expr inputs/outputs based on a Val replacement map.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Lower stream-parallelized matmul+allreduce to a host IR loop

2 participants