[MemCpyOpt] fix incorrect handling of lifetime markers #143782
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis

Author: Jameson Nash (vtjnash)

Changes

Having lifetime markers should only increase the information available to LLVM, but it was stopping the MemSSA walk, unable to see that these were extraneous. Refactored hadUndefContentsBefore to try to DRY the code slightly, so that we are doing that walk inside it, instead of in every caller.

Secondly, it would ignore a lifetime marker that wasn't full size (relying on the callback to forbid the optimization entirely), but sub-optimal lifetime markers are not supposed to forbid optimizations that would otherwise apply if they were either absent or optimal. This pass wasn't tracking GEP offsets either, so that case wasn't quite correctly handled (although earlier sub-optimal checks that this size is the same as the alloca's made this safe in the past).

Lastly, the stack-move test seemed to have been providing the lifetime size in bits instead of bytes, although now we optimize it regardless of that, so we didn't really have to fix it.

Full diff: https://github.com/llvm/llvm-project/pull/143782.diff

5 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
index 960001bf880c6..fc439be40521a 100644
--- a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
+++ b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
@@ -1366,56 +1366,65 @@ bool MemCpyOptPass::processMemSetMemCpyDependence(MemCpyInst *MemCpy,
/// Determine whether the pointer V had only undefined content (due to Def) up
/// to the given Size, either because it was freshly alloca'd or started its
-/// lifetime.
-static bool hasUndefContents(MemorySSA *MSSA, BatchAAResults &AA, Value *V,
- MemoryDef *Def, Value *Size) {
- if (MSSA->isLiveOnEntryDef(Def))
- return isa<AllocaInst>(getUnderlyingObject(V));
-
- if (auto *II = dyn_cast_or_null<IntrinsicInst>(Def->getMemoryInst())) {
- if (II->getIntrinsicID() == Intrinsic::lifetime_start) {
- auto *LTSize = cast<ConstantInt>(II->getArgOperand(0));
-
- if (auto *CSize = dyn_cast<ConstantInt>(Size)) {
- if (AA.isMustAlias(V, II->getArgOperand(1)) &&
- LTSize->getZExtValue() >= CSize->getZExtValue())
- return true;
- }
+/// lifetime by walking the MSSA graph.
+static bool hadUndefContentsBefore(MemorySSA *MSSA, BatchAAResults &BAA, Value *V,
+ MemoryAccess *Clobber, MemoryLocation Loc, Value *Size) {
+ while (1) {
+ Clobber = MSSA->getWalker()->getClobberingMemoryAccess(Clobber, Loc, BAA);
+ MemoryDef *Def = dyn_cast<MemoryDef>(Clobber);
+ if (!Def)
+ return false;
+
+ if (MSSA->isLiveOnEntryDef(Def))
+ return isa<AllocaInst>(getUnderlyingObject(V));
+
+ if (auto *II = dyn_cast_or_null<IntrinsicInst>(Def->getMemoryInst())) {
+ if (II->getIntrinsicID() == Intrinsic::lifetime_start) {
+ auto *LTSize = cast<ConstantInt>(II->getArgOperand(0));
- // If the lifetime.start covers a whole alloca (as it almost always
- // does) and we're querying a pointer based on that alloca, then we know
- // the memory is definitely undef, regardless of how exactly we alias.
- // The size also doesn't matter, as an out-of-bounds access would be UB.
- if (auto *Alloca = dyn_cast<AllocaInst>(getUnderlyingObject(V))) {
- if (getUnderlyingObject(II->getArgOperand(1)) == Alloca) {
- const DataLayout &DL = Alloca->getDataLayout();
- if (std::optional<TypeSize> AllocaSize =
- Alloca->getAllocationSize(DL))
- if (*AllocaSize == LTSize->getValue())
+ if (Size)
+ if (auto CSize = dyn_cast<ConstantInt>(Size))
+ if (BAA.isMustAlias(V, II->getArgOperand(1)) &&
+ LTSize->getZExtValue() >= CSize->getZExtValue())
return true;
+
+ // If the lifetime.start covers a whole alloca (as it almost always
+ // does) and we're querying a pointer based on that alloca, then we know
+ // the memory is definitely undef, regardless of how exactly we alias.
+ // The size also doesn't matter, as an out-of-bounds access would be UB.
+ if (auto *Alloca = dyn_cast<AllocaInst>(getUnderlyingObject(V))) {
+ if (getUnderlyingObject(II->getArgOperand(1)) == Alloca) {
+ const DataLayout &DL = Alloca->getDataLayout();
+ if (std::optional<TypeSize> AllocaSize =
+ Alloca->getAllocationSize(DL))
+ if (*AllocaSize == LTSize->getValue())
+ return true;
+ }
}
+ Clobber = Def->getDefiningAccess();
+ continue;
+ } else if (II->getIntrinsicID() == Intrinsic::lifetime_end) {
+ Clobber = Def->getDefiningAccess();
+ continue;
}
}
- }
- return false;
+ return false;
+ }
}
// If the memcpy is larger than the previous, but the memory was undef prior to
// that, we can just ignore the tail. Technically we're only interested in the
// bytes from 0..MemSrcOffset and MemSrcLength+MemSrcOffset..CopySize here, but
-// as we can't easily represent this location (hasUndefContents uses mustAlias
+// as we can't easily represent this location (hadUndefContentsBefore uses mustAlias
// which cannot deal with offsets), we use the full 0..CopySize range.
static bool overreadUndefContents(MemorySSA *MSSA, MemCpyInst *MemCpy,
MemIntrinsic *MemSrc, BatchAAResults &BAA) {
Value *CopySize = MemCpy->getLength();
- MemoryLocation MemCpyLoc = MemoryLocation::getForSource(MemCpy);
- MemoryUseOrDef *MemSrcAccess = MSSA->getMemoryAccess(MemSrc);
- MemoryAccess *Clobber = MSSA->getWalker()->getClobberingMemoryAccess(
- MemSrcAccess->getDefiningAccess(), MemCpyLoc, BAA);
- if (auto *MD = dyn_cast<MemoryDef>(Clobber))
- if (hasUndefContents(MSSA, BAA, MemCpy->getSource(), MD, CopySize))
- return true;
+ MemoryLocation LoadLoc = MemoryLocation::getForSource(MemCpy);
+ MemoryAccess *MemSrcAccess = MSSA->getMemoryAccess(MemSrc)->getDefiningAccess();
+ if (hadUndefContentsBefore(MSSA, BAA, MemCpy->getSource(), MemSrcAccess, LoadLoc, CopySize))
+ return true;
return false;
}
@@ -1573,11 +1582,14 @@ bool MemCpyOptPass::performStackMoveOptzn(Instruction *Load, Instruction *Store,
// since both llvm.lifetime.start and llvm.lifetime.end intrinsics
// practically fill all the bytes of the alloca with an undefined
// value, although conceptually marked as alive/dead.
- int64_t Size = cast<ConstantInt>(UI->getOperand(0))->getSExtValue();
- if (Size < 0 || Size == DestSize) {
- LifetimeMarkers.push_back(UI);
- continue;
- }
+ // We don't currently track GEP offsets and sizes, so we don't have
+ // a way to check whether this lifetime marker affects the relevant
+ // memory regions.
+ // While we only really need to delete lifetime.end from Src and
+ // lifetime.begin from Dst, those are often implied by the memcpy
+ // anyways so hopefully not much is lost by removing all of them.
+ LifetimeMarkers.push_back(UI);
+ continue;
}
AAMetadataInstrs.insert(UI);
@@ -1594,9 +1606,8 @@ bool MemCpyOptPass::performStackMoveOptzn(Instruction *Load, Instruction *Store,
return true;
};
- // Check that dest has no Mod/Ref, from the alloca to the Store, except full
- // size lifetime intrinsics. And collect modref inst for the reachability
- // check.
+ // Check that dest has no Mod/Ref, from the alloca to the Store. And collect
+ // modref inst for the reachability check.
ModRefInfo DestModRef = ModRefInfo::NoModRef;
MemoryLocation DestLoc(DestAlloca, LocationSize::precise(Size));
SmallVector<BasicBlock *, 8> ReachabilityWorklist;
@@ -1779,8 +1790,9 @@ bool MemCpyOptPass::processMemCpy(MemCpyInst *M, BasicBlock::iterator &BBI) {
if (processMemSetMemCpyDependence(M, MDep, BAA))
return true;
+ MemoryLocation SrcLoc = MemoryLocation::getForSource(M);
MemoryAccess *SrcClobber = MSSA->getWalker()->getClobberingMemoryAccess(
- AnyClobber, MemoryLocation::getForSource(M), BAA);
+ AnyClobber, SrcLoc, BAA);
// There are five possible optimizations we can do for memcpy:
// a) memcpy-memcpy xform which exposes redundance for DSE.
@@ -1820,7 +1832,7 @@ bool MemCpyOptPass::processMemCpy(MemCpyInst *M, BasicBlock::iterator &BBI) {
}
}
- if (hasUndefContents(MSSA, BAA, M->getSource(), MD, M->getLength())) {
+ if (hadUndefContentsBefore(MSSA, BAA, M->getSource(), AnyClobber, SrcLoc, M->getLength())) {
LLVM_DEBUG(dbgs() << "Removed memcpy from undef\n");
eraseInstruction(M);
++NumMemCpyInstr;
diff --git a/llvm/test/Analysis/ScopedNoAliasAA/alias-scope-merging.ll b/llvm/test/Analysis/ScopedNoAliasAA/alias-scope-merging.ll
index 989049ab67a0b..3b08e4404dde2 100644
--- a/llvm/test/Analysis/ScopedNoAliasAA/alias-scope-merging.ll
+++ b/llvm/test/Analysis/ScopedNoAliasAA/alias-scope-merging.ll
@@ -1,25 +1,27 @@
-; RUN: opt < %s -S -passes=memcpyopt | FileCheck --match-full-lines %s
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -S -passes=memcpyopt | FileCheck %s
; Alias scopes are merged by taking the intersection of domains, then the union of the scopes within those domains
define i8 @test(i8 %input) {
+; CHECK-LABEL: define i8 @test(
+; CHECK-SAME: i8 [[INPUT:%.*]]) {
+; CHECK-NEXT: [[SRC:%.*]] = alloca i8, align 1
+; CHECK-NEXT: store i8 [[INPUT]], ptr [[SRC]], align 1
+; CHECK-NEXT: [[RET_VALUE:%.*]] = load i8, ptr [[SRC]], align 1
+; CHECK-NEXT: ret i8 [[RET_VALUE]]
+;
%tmp = alloca i8
%dst = alloca i8
%src = alloca i8
-; CHECK: call void @llvm.memcpy.p0.p0.i64(ptr align 8 %dst, ptr align 8 %src, i64 1, i1 false), !alias.scope ![[SCOPE:[0-9]+]]
- call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %src), !noalias !4
+ call void @llvm.lifetime.start.p0(i64 1, ptr nonnull %src), !noalias !4
store i8 %input, ptr %src
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %tmp, ptr align 8 %src, i64 1, i1 false), !alias.scope !0
- call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %src), !noalias !4
+ call void @llvm.lifetime.end.p0(i64 1, ptr nonnull %src), !noalias !4
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %dst, ptr align 8 %tmp, i64 1, i1 false), !alias.scope !4
%ret_value = load i8, ptr %dst
ret i8 %ret_value
}
-; Merged scope contains "callee0: %a" and "callee0 : %b"
-; CHECK-DAG: ![[CALLEE0_A:[0-9]+]] = distinct !{!{{[0-9]+}}, !{{[0-9]+}}, !"callee0: %a"}
-; CHECK-DAG: ![[CALLEE0_B:[0-9]+]] = distinct !{!{{[0-9]+}}, !{{[0-9]+}}, !"callee0: %b"}
-; CHECK-DAG: ![[SCOPE]] = !{![[CALLEE0_A]], ![[CALLEE0_B]]}
-
declare void @llvm.lifetime.start.p0(i64, ptr nocapture)
declare void @llvm.lifetime.end.p0(i64, ptr nocapture)
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
diff --git a/llvm/test/Transforms/MemCpyOpt/callslot_badaa.ll b/llvm/test/Transforms/MemCpyOpt/callslot_badaa.ll
index efdbdce401b76..fb5d675402eba 100644
--- a/llvm/test/Transforms/MemCpyOpt/callslot_badaa.ll
+++ b/llvm/test/Transforms/MemCpyOpt/callslot_badaa.ll
@@ -1,9 +1,10 @@
-; RUN: opt < %s -S -passes=memcpyopt | FileCheck --match-full-lines %s
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -S -passes=memcpyopt | FileCheck %s
; Make sure callslot optimization merges alias.scope metadata correctly when it merges instructions.
; Merging here naively generates:
; call void @llvm.memcpy.p0.p0.i64(ptr align 8 %dst, ptr align 8 %src, i64 1, i1 false), !alias.scope !3
-; call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %src), !noalias !0
+; call void @llvm.lifetime.end.p0(i64 1, ptr nonnull %src), !noalias !0
; ...
; !0 = !{!1}
; !1 = distinct !{!1, !2, !"callee1: %a"}
@@ -13,15 +14,20 @@
; !5 = distinct !{!5, !"callee0"}
; Which is incorrect because the lifetime.end of %src will now "noalias" the above memcpy.
define i8 @test(i8 %input) {
+; CHECK-LABEL: define i8 @test(
+; CHECK-SAME: i8 [[INPUT:%.*]]) {
+; CHECK-NEXT: [[SRC:%.*]] = alloca i8, align 1
+; CHECK-NEXT: store i8 [[INPUT]], ptr [[SRC]], align 1
+; CHECK-NEXT: [[RET_VALUE:%.*]] = load i8, ptr [[SRC]], align 1
+; CHECK-NEXT: ret i8 [[RET_VALUE]]
+;
%tmp = alloca i8
%dst = alloca i8
%src = alloca i8
-; NOTE: we're matching the full line and looking for the lack of !alias.scope here
-; CHECK: call void @llvm.memcpy.p0.p0.i64(ptr align 8 %dst, ptr align 8 %src, i64 1, i1 false)
- call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %src), !noalias !3
+ call void @llvm.lifetime.start.p0(i64 1, ptr nonnull %src), !noalias !3
store i8 %input, ptr %src
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %tmp, ptr align 8 %src, i64 1, i1 false), !alias.scope !0
- call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %src), !noalias !3
+ call void @llvm.lifetime.end.p0(i64 1, ptr nonnull %src), !noalias !3
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %dst, ptr align 8 %tmp, i64 1, i1 false), !alias.scope !3
%ret_value = load i8, ptr %dst
ret i8 %ret_value
diff --git a/llvm/test/Transforms/MemCpyOpt/memcpy-undef.ll b/llvm/test/Transforms/MemCpyOpt/memcpy-undef.ll
index 2f1ce37ea2256..a463fed3ac58c 100644
--- a/llvm/test/Transforms/MemCpyOpt/memcpy-undef.ll
+++ b/llvm/test/Transforms/MemCpyOpt/memcpy-undef.ll
@@ -96,7 +96,6 @@ define void @test_lifetime_partial_alias_3(ptr noalias %dst) {
; CHECK-NEXT: [[A:%.*]] = alloca [16 x i8], align 1
; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 12, ptr [[A]])
; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[A]], i64 8
-; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr [[DST:%.*]], ptr [[GEP]], i64 4, i1 false)
; CHECK-NEXT: ret void
;
%a = alloca [16 x i8]
@@ -112,7 +111,6 @@ define void @test_lifetime_partial_alias_4(ptr noalias %dst) {
; CHECK-NEXT: [[A:%.*]] = alloca [16 x i8], align 1
; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 12, ptr [[A]])
; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[A]], i64 8
-; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr [[DST:%.*]], ptr [[GEP]], i64 8, i1 false)
; CHECK-NEXT: ret void
;
%a = alloca [16 x i8]
diff --git a/llvm/test/Transforms/MemCpyOpt/stack-move.ll b/llvm/test/Transforms/MemCpyOpt/stack-move.ll
index 4a75c5eea2499..31e255b83eb9e 100644
--- a/llvm/test/Transforms/MemCpyOpt/stack-move.ll
+++ b/llvm/test/Transforms/MemCpyOpt/stack-move.ll
@@ -1649,20 +1649,13 @@ loop_exit:
ret void
}
-; Tests that failure because partial-sized lifetimes are counted as mod.
+; Tests that partial-sized lifetimes are not inhibiting the optimizer
define void @partial_lifetime() {
; CHECK-LABEL: define void @partial_lifetime() {
-; CHECK-NEXT: [[SRC:%.*]] = alloca [[STRUCT_FOO:%.*]], align 4
-; CHECK-NEXT: [[DEST:%.*]] = alloca [[STRUCT_FOO]], align 4
-; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 12, ptr captures(none) [[SRC]])
-; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 3, ptr captures(none) [[DEST]])
-; CHECK-NEXT: store [[STRUCT_FOO]] { i32 10, i32 20, i32 30 }, ptr [[SRC]], align 4
-; CHECK-NEXT: [[TMP1:%.*]] = call i32 @use_nocapture(ptr captures(none) [[SRC]])
-; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[DEST]], ptr align 4 [[SRC]], i64 12, i1 false)
-; CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 3, ptr captures(none) [[SRC]])
+; CHECK-NEXT: [[DEST:%.*]] = alloca [[STRUCT_FOO:%.*]], align 4
+; CHECK-NEXT: store [[STRUCT_FOO]] { i32 10, i32 20, i32 30 }, ptr [[DEST]], align 4
+; CHECK-NEXT: [[TMP1:%.*]] = call i32 @use_nocapture(ptr captures(none) [[DEST]])
; CHECK-NEXT: [[TMP2:%.*]] = call i32 @use_nocapture(ptr captures(none) [[DEST]])
-; CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 12, ptr captures(none) [[SRC]])
-; CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 12, ptr captures(none) [[DEST]])
; CHECK-NEXT: ret void
;
%src = alloca %struct.Foo, align 4
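To make the behavioral change concrete, here is a minimal IR sketch (mine, distilled from the partial_lifetime test above, not part of the PR) of the pattern the second fix unblocks; previously the undersized lifetime.start on %dest counted as a blocking mod/ref and the two allocas were not merged:

declare void @llvm.lifetime.start.p0(i64, ptr nocapture)
declare void @llvm.lifetime.end.p0(i64, ptr nocapture)
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
declare i32 @use_nocapture(ptr nocapture)

%struct.Foo = type { i32, i32, i32 }

define void @partial_lifetime_sketch() {
  %src = alloca %struct.Foo, align 4
  %dest = alloca %struct.Foo, align 4
  call void @llvm.lifetime.start.p0(i64 12, ptr %src)
  ; undersized marker: covers only 3 of the 12 bytes of %dest
  call void @llvm.lifetime.start.p0(i64 3, ptr %dest)
  store %struct.Foo { i32 10, i32 20, i32 30 }, ptr %src, align 4
  %1 = call i32 @use_nocapture(ptr %src)
  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dest, ptr align 4 %src, i64 12, i1 false)
  call void @llvm.lifetime.end.p0(i64 12, ptr %src)
  %2 = call i32 @use_nocapture(ptr %dest)
  call void @llvm.lifetime.end.p0(i64 12, ptr %dest)
  ret void
}

With this patch the markers are dropped and %src and %dest collapse into a single alloca, matching the updated CHECK lines in stack-move.ll.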
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed from 16189f1 to a2f0414.
This is a dependent commit of vtjnash@645f247, which removes the function's requirement of having this constant size be computed and available.
As pointed out in #143782, these tests were specifying the size in bits instead of bytes. In order to preserve the intent of the tests, add a use of %src, which prevents stack-move optimization. These are supposed to test the handling of scoped alias metadata in call slot optimization.
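The mistake being fixed, in miniature (my sketch, not one of the actual tests): the size operand of the lifetime intrinsics is in bytes, so i64 8 on a one-byte alloca declares an eight-byte lifetime rather than the intended eight bits:

declare void @llvm.lifetime.start.p0(i64, ptr nocapture)

define void @size_in_bytes(i8 %input) {
  %src = alloca i8                                   ; one byte of storage
  ; wrong: claims an 8-byte lifetime; the fixed tests pass i64 1 instead
  call void @llvm.lifetime.start.p0(i64 8, ptr %src)
  store i8 %input, ptr %src
  ret void
}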
In practice, all lifetime markers should have an alloca argument and a size matching the alloca and be handled by the "whole alloca" code path that is independent of the passed size. In theory, lifetimes can be used in other ways, but this is the only practically relevant case. If I ever find the time, I'll drop the size argument from these intrinsics.
I do see some diffs from this change on dtcxzyw/llvm-opt-benchmark#2428, but I'm having a hard time understanding where they are actually coming from. None of the cases I looked at have mismatched lifetimes.
I think what actually happens is that your new implementation of hadUndefContentsBefore does not bail out if none of the cases match, which means that it will just continue the walk from the last found clobber -- which can only improve things if the clobber was the result of hitting the MSSA walk limit, rather than an actual clobber. So effectively, all this patch is really doing is adding a bypass for the walk limit.
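A schematic of that theory (my sketch, not a reduced test case from the PR): with enough unrelated MemoryDefs between the marker and the memcpy, the walker gives up at its walk limit (the -memssa-check-limit option, 100 by default) and reports whatever def it stopped on. If that def happens to be a lifetime marker for some other object, the old code returned false, while the new loop resumes the walk above it:

declare void @llvm.lifetime.start.p0(i64, ptr nocapture)
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)

define void @walk_limit_bypass(ptr noalias %dst) {
  %src = alloca [16 x i8]
  %tmp = alloca [16 x i8]
  call void @llvm.lifetime.start.p0(i64 16, ptr %src)
  ; ... imagine >100 intervening MemoryDefs here, so the clobber walk
  ; hits the limit and stops on the unrelated marker below ...
  call void @llvm.lifetime.start.p0(i64 16, ptr %tmp)
  ; old code: gives up (the marker is for %tmp, not %src)
  ; new code: continues the walk and finds %src's own lifetime.start
  call void @llvm.memcpy.p0.p0.i64(ptr %dst, ptr %src, i64 16, i1 false)
  ret void
}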
The changes here break the intent of the test. I've adjusted it in a8c6fb4 to preserve the existing behavior with the lifetimes fixed.
Thank you
@@ -1820,7 +1835,8 @@ bool MemCpyOptPass::processMemCpy(MemCpyInst *M, BasicBlock::iterator &BBI) {
     }
   }

-  if (hasUndefContents(MSSA, BAA, M->getSource(), MD, M->getLength())) {
+  if (hadUndefContentsBefore(MSSA, BAA, M->getSource(), AnyClobber, SrcLoc,
+                             M->getLength())) {
Isn't this going to recompute the clobbering access now, even though we already know it?
Yes. I assumed it was inexpensive since it is a caching walker, but I could add a bool parameter if necessary to optimize that.
        continue;
      } else if (II->getIntrinsicID() == Intrinsic::lifetime_end) {
        Clobber = Def->getDefiningAccess();
        continue;
Lack of break below here is bypassing the MSSA clobber walk limit.
There is a return false immediately below here.
Oh yeah, there is. I was confused by the diff rendering. In that case I'm not sure what is going on.
I think your theory might still be correct about the proximal cause; it just only skips the limit when the walk ended on a completely unrelated lifetime instruction. A local fix for that is:
diff --git a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
index 877600034f84..759b16206d5c 100644
--- a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
+++ b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
@@ -1370,6 +1370,7 @@ bool MemCpyOptPass::processMemSetMemCpyDependence(MemCpyInst *MemCpy,
static bool hadUndefContentsBefore(MemorySSA *MSSA, BatchAAResults &BAA,
Value *V, MemoryAccess *Clobber,
MemoryLocation Loc, Value *Size) {
+ Value *VBase = getUnderlyingObject(V);
while (1) {
Clobber = MSSA->getWalker()->getClobberingMemoryAccess(Clobber, Loc, BAA);
MemoryDef *Def = dyn_cast<MemoryDef>(Clobber);
@@ -1377,12 +1378,18 @@ static bool hadUndefContentsBefore(MemorySSA *MSSA, BatchAAResults &BAA,
return false;
if (MSSA->isLiveOnEntryDef(Def))
- return isa<AllocaInst>(getUnderlyingObject(V));
+ return isa<AllocaInst>(VBase);
if (auto *II = dyn_cast_or_null<IntrinsicInst>(Def->getMemoryInst())) {
if (II->getIntrinsicID() == Intrinsic::lifetime_start) {
auto *LTSize = cast<ConstantInt>(II->getArgOperand(0));
+ // Check if the SSA Walk ended early due to heuristics or actually
+ // reached a lifetime instruction for this pointer.
+ Value *IIBase = getUnderlyingObject(II->getArgOperand(1));
+ if (VBase != IIBase)
+ return false;
+
if (Size)
if (auto CSize = dyn_cast<ConstantInt>(Size))
if (BAA.isMustAlias(V, II->getArgOperand(1)) &&
@@ -1393,8 +1400,8 @@ static bool hadUndefContentsBefore(MemorySSA *MSSA, BatchAAResults &BAA,
// does) and we're querying a pointer based on that alloca, then we know
// the memory is definitely undef, regardless of how exactly we alias.
// The size also doesn't matter, as an out-of-bounds access would be UB.
- if (auto *Alloca = dyn_cast<AllocaInst>(getUnderlyingObject(V))) {
- if (getUnderlyingObject(II->getArgOperand(1)) == Alloca) {
+ if (auto *Alloca = dyn_cast<AllocaInst>(VBase)) {
+ if (IIBase == Alloca) {
const DataLayout &DL = Alloca->getDataLayout();
if (std::optional<TypeSize> AllocaSize =
Alloca->getAllocationSize(DL))
@@ -1405,6 +1412,11 @@ static bool hadUndefContentsBefore(MemorySSA *MSSA, BatchAAResults &BAA,
Clobber = Def->getDefiningAccess();
continue;
} else if (II->getIntrinsicID() == Intrinsic::lifetime_end) {
+ // Check if the SSA Walk ended early due to heuristics or actually
+ // reached a lifetime instruction for this pointer.
+ Value *IIBase = getUnderlyingObject(II->getArgOperand(1));
+ if (VBase != IIBase)
+ return false;
Clobber = Def->getDefiningAccess();
continue;
}
It looks like the getClobberingMemoryAccess code also could be easily adjusted to expose the walk limit (it already does, just not accessible from the header file)
But since it seems you're implying this function shouldn't gain any benefit from looking past the lifetime, I could also just drop this loop.
The tests however look better to me, so I also am wondering if there is some way to improve MemorySSA to handle simple cases like this, where there is a noalias base pointer (such as an alloca or malloc) with only a few uses in the dominator tree before the current MemoryDef location, none of which were capturing?
> The tests however look better to me, so I also am wondering if there is some way to improve MemorySSA to handle simple cases like this, where there is a noalias base pointer (such as an alloca or malloc) with only a few uses in the dominator tree before the current MemoryDef location, none of which were capturing?
At least for the basic case (memcpy from fully undef alloca), the simplest and most robust way to handle it might be to do this in InstCombine. We already have this transform there:
static bool isAllocSiteRemovable(Instruction *AI,
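For reference, the basic case under discussion (my sketch, not from the PR): %a is never written, so the copy reads undef and both the memcpy and the dead alloca could be deleted outright:

declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)

define void @copy_from_fresh_alloca(ptr noalias %dst) {
  %a = alloca [16 x i8]
  ; no store to %a: the source bytes are undef, so the copy is removable
  call void @llvm.memcpy.p0.p0.i64(ptr %dst, ptr %a, i64 16, i1 false)
  ret void
}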
I didn't read the tests closely, so I didn't realize it was often reads from undef that it was removing. I thought that might have been GVN's job, to detect a memcpy from undef contents and remove it?
I see at least one place there that mentions it shouldn't fail to optimize when encountering lifetime there:
llvm-project/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp, lines 4916 to 4925 in ce747a1:

  while (!AllocaUsers.empty()) {
    auto *UserI = cast<Instruction>(AllocaUsers.pop_back_val());
    if (isa<GetElementPtrInst>(UserI) || isa<AddrSpaceCastInst>(UserI)) {
      pushUsers(*UserI);
      continue;
    }
    if (UserI == CB)
      continue;
    // TODO: support lifetime.start/end here
    return false;
I've trimmed this PR down to just the part I needed for vtjnash@645f247, pending separate discussion of whether it is even worthwhile optimizing those tests for which lifetime.start does not cover the whole alloca.
Additionally, I made a separate PR as you suggested (#143958) to handle those simple cases in InstCombine instead, and to see how that affects the benchmarks.
Having lifetime markers should only increase the information available to LLVM, but the pass would instead rely on the callback to give up entirely if it encountered a lifetime marker that wasn't full size; sub-optimal lifetime markers are not supposed to forbid optimizations that would otherwise apply if they were either absent or optimal. This pass wasn't tracking GEP offsets either, so that case wasn't quite correctly handled, although the earlier sub-optimal check that this size is the same as the alloca's made this safe in the past, and it is unlikely to have encountered anything else.
Force-pushed from a2f0414 to 015e077.