[DO NOT SUBMIT] Testing multi-use fusion + collapse
The tweak to dimension collapsing prevents a compilation timeout, but it
badly hurts runtime performance. When a dispatch contains multiple
reduction ops and takes the warp reduction path, it has to be in a very
specific state to perform well. Otherwise, compilation times out or the
compiled dispatch is very slow (3x total sdxl runtime).

See: iree-org#19868
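
To make the change in the collapse check concrete, here is a minimal,
self-contained sketch that contrasts the removed "possibly softmax"
heuristic with the new reduction-loop-count check. It does not use MLIR:
`OperandModel`, `GenericOpModel`, `rejectedByOldCheck`, and
`rejectedByNewCheck` are hypothetical stand-ins invented for illustration,
and the fields only mirror the properties queried in the diff.

```cpp
#include <vector>

// Hypothetical model of one input operand of a linalg.generic-like op.
// These fields are assumptions that mirror the queries made in the diff.
struct OperandModel {
  bool producerIsGeneric;           // operand is defined by a generic op
  unsigned producerReductionLoops;  // reduction loops in that producer
  bool mapIsPermutation;            // indexing map is a full permutation
  bool mapIsProjectedPermutation;   // indexing map is a projected permutation
};

// Hypothetical model of the op being considered for dimension collapsing.
struct GenericOpModel {
  unsigned numReductionLoops;
  std::vector<OperandModel> inputs;
};

// Old behavior: reject when any input is produced by a reduction and is
// read through a projected permutation that is not a full permutation.
bool rejectedByOldCheck(const GenericOpModel &op) {
  for (const OperandModel &in : op.inputs) {
    if (!in.producerIsGeneric)
      continue;
    if (in.producerReductionLoops == 0)
      continue;
    if (!in.mapIsPermutation && in.mapIsProjectedPermutation)
      return true;
  }
  return false;
}

// New behavior: reject only when the op has more than one reduction loop.
bool rejectedByNewCheck(const GenericOpModel &op) {
  return op.numReductionLoops > 1;
}
```

In this model, a softmax-like consumer with a single reduction loop is
rejected by the old check but not by the new one, while an op with two
reduction loops and no reduction-producing inputs is rejected only by the
new check.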

I found a few sdxl instances of the following chain:

1 = op with multiple uses
2 = consumer of "1" (transpose)
3 = consumer of "2" (bit extend)

However, a reshape gets stuck between 1 and 2 or between 2 and 3,
depending on which pass you look at (maybe always between 2 and 3).
1 and 2 could be fused with multi-use fusion.

Signed-off-by: Ian Wood <[email protected]>
IanWood1 committed Feb 6, 2025
1 parent e7473ff commit d1905f8
Showing 2 changed files with 5 additions and 15 deletions.
@@ -201,21 +201,7 @@ static bool isEligibleForCollapse(Operation *op) {
 return false;
 }

-// TODO(#17948) GPU codegen fails when we collapse the dimensions of softmax.
-auto isPossiblySoftmax = [&](OpOperand *operand) -> bool {
-auto genericOperand = operand->get().getDefiningOp<linalg::GenericOp>();
-if (!genericOperand) {
-return false;
-}
-
-if (genericOperand.getNumReductionLoops() == 0) {
-return false;
-}
-
-auto map = genericOp.getMatchingIndexingMap(operand);
-return !map.isPermutation() && map.isProjectedPermutation();
-};
-if (llvm::any_of(genericOp.getDpsInputOperands(), isPossiblySoftmax)) {
+if (genericOp.getNumReductionLoops() > 1) {
 return false;
 }

4 changes: 4 additions & 0 deletions compiler/src/iree/compiler/DispatchCreation/Passes.cpp
@@ -158,6 +158,10 @@ void addDispatchRegionCreationPreprocessingPasses(OpPassManager &passManager) {
 .addPass(IREE::Flow::createCanonicalizerPass)
 .addPass(mlir::createCSEPass)

+.addPass(DispatchCreation::createFuseMultiUseElementwiseProducerPass)
+.addPass(IREE::Flow::createCanonicalizerPass)
+.addPass(mlir::createCSEPass)
+
 // 4. After elementwise operation fusion sink reshapes that block
 // producer-consumer fusion.
 .addPass(DispatchCreation::createSinkReshapesPass)