[MemCpyOpt] fix incorrect handling of lifetime markers #143782
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis

Author: Jameson Nash (vtjnash)

Changes

Having lifetime markers should only increase the information available to LLVM, but it was stopping the MemSSA walk, unable to see that these were extraneous. Refactored hadUndefContentsBefore to try to DRY the code slightly, so that we are doing that walk inside it, instead of in every caller.

Secondly, it would ignore a lifetime marker that wasn't full size (relying on the callback to forbid the optimization entirely), but sub-optimal lifetime markers are not supposed to forbid optimizations that would otherwise apply if they were either absent or optimal. This pass wasn't tracking GEP offsets either, so that case wasn't quite correctly handled (although earlier sub-optimal checks that this size is the same as the alloca's made this safe in the past).

Lastly, the stack-move test seemed to have been providing the lifetime size in bits instead of bytes, although now we optimize it regardless of that, so we didn't really have to fix it.

Full diff: https://github.com/llvm/llvm-project/pull/143782.diff

5 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
index 960001bf880c6..fc439be40521a 100644
--- a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
+++ b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
@@ -1366,56 +1366,65 @@ bool MemCpyOptPass::processMemSetMemCpyDependence(MemCpyInst *MemCpy,
/// Determine whether the pointer V had only undefined content (due to Def) up
/// to the given Size, either because it was freshly alloca'd or started its
-/// lifetime.
-static bool hasUndefContents(MemorySSA *MSSA, BatchAAResults &AA, Value *V,
- MemoryDef *Def, Value *Size) {
- if (MSSA->isLiveOnEntryDef(Def))
- return isa<AllocaInst>(getUnderlyingObject(V));
-
- if (auto *II = dyn_cast_or_null<IntrinsicInst>(Def->getMemoryInst())) {
- if (II->getIntrinsicID() == Intrinsic::lifetime_start) {
- auto *LTSize = cast<ConstantInt>(II->getArgOperand(0));
-
- if (auto *CSize = dyn_cast<ConstantInt>(Size)) {
- if (AA.isMustAlias(V, II->getArgOperand(1)) &&
- LTSize->getZExtValue() >= CSize->getZExtValue())
- return true;
- }
+/// lifetime by walking the MSSA graph.
+static bool hadUndefContentsBefore(MemorySSA *MSSA, BatchAAResults &BAA, Value *V,
+ MemoryAccess *Clobber, MemoryLocation Loc, Value *Size) {
+ while (1) {
+ Clobber = MSSA->getWalker()->getClobberingMemoryAccess(Clobber, Loc, BAA);
+ MemoryDef *Def = dyn_cast<MemoryDef>(Clobber);
+ if (!Def)
+ return false;
+
+ if (MSSA->isLiveOnEntryDef(Def))
+ return isa<AllocaInst>(getUnderlyingObject(V));
+
+ if (auto *II = dyn_cast_or_null<IntrinsicInst>(Def->getMemoryInst())) {
+ if (II->getIntrinsicID() == Intrinsic::lifetime_start) {
+ auto *LTSize = cast<ConstantInt>(II->getArgOperand(0));
- // If the lifetime.start covers a whole alloca (as it almost always
- // does) and we're querying a pointer based on that alloca, then we know
- // the memory is definitely undef, regardless of how exactly we alias.
- // The size also doesn't matter, as an out-of-bounds access would be UB.
- if (auto *Alloca = dyn_cast<AllocaInst>(getUnderlyingObject(V))) {
- if (getUnderlyingObject(II->getArgOperand(1)) == Alloca) {
- const DataLayout &DL = Alloca->getDataLayout();
- if (std::optional<TypeSize> AllocaSize =
- Alloca->getAllocationSize(DL))
- if (*AllocaSize == LTSize->getValue())
+ if (Size)
+ if (auto CSize = dyn_cast<ConstantInt>(Size))
+ if (BAA.isMustAlias(V, II->getArgOperand(1)) &&
+ LTSize->getZExtValue() >= CSize->getZExtValue())
return true;
+
+ // If the lifetime.start covers a whole alloca (as it almost always
+ // does) and we're querying a pointer based on that alloca, then we know
+ // the memory is definitely undef, regardless of how exactly we alias.
+ // The size also doesn't matter, as an out-of-bounds access would be UB.
+ if (auto *Alloca = dyn_cast<AllocaInst>(getUnderlyingObject(V))) {
+ if (getUnderlyingObject(II->getArgOperand(1)) == Alloca) {
+ const DataLayout &DL = Alloca->getDataLayout();
+ if (std::optional<TypeSize> AllocaSize =
+ Alloca->getAllocationSize(DL))
+ if (*AllocaSize == LTSize->getValue())
+ return true;
+ }
}
+ Clobber = Def->getDefiningAccess();
+ continue;
+ } else if (II->getIntrinsicID() == Intrinsic::lifetime_end) {
+ Clobber = Def->getDefiningAccess();
+ continue;
}
}
- }
- return false;
+ return false;
+ }
}
// If the memcpy is larger than the previous, but the memory was undef prior to
// that, we can just ignore the tail. Technically we're only interested in the
// bytes from 0..MemSrcOffset and MemSrcLength+MemSrcOffset..CopySize here, but
-// as we can't easily represent this location (hasUndefContents uses mustAlias
+// as we can't easily represent this location (hadUndefContentsBefore uses mustAlias
// which cannot deal with offsets), we use the full 0..CopySize range.
static bool overreadUndefContents(MemorySSA *MSSA, MemCpyInst *MemCpy,
MemIntrinsic *MemSrc, BatchAAResults &BAA) {
Value *CopySize = MemCpy->getLength();
- MemoryLocation MemCpyLoc = MemoryLocation::getForSource(MemCpy);
- MemoryUseOrDef *MemSrcAccess = MSSA->getMemoryAccess(MemSrc);
- MemoryAccess *Clobber = MSSA->getWalker()->getClobberingMemoryAccess(
- MemSrcAccess->getDefiningAccess(), MemCpyLoc, BAA);
- if (auto *MD = dyn_cast<MemoryDef>(Clobber))
- if (hasUndefContents(MSSA, BAA, MemCpy->getSource(), MD, CopySize))
- return true;
+ MemoryLocation LoadLoc = MemoryLocation::getForSource(MemCpy);
+ MemoryAccess *MemSrcAccess = MSSA->getMemoryAccess(MemSrc)->getDefiningAccess();
+ if (hadUndefContentsBefore(MSSA, BAA, MemCpy->getSource(), MemSrcAccess, LoadLoc, CopySize))
+ return true;
return false;
}
@@ -1573,11 +1582,14 @@ bool MemCpyOptPass::performStackMoveOptzn(Instruction *Load, Instruction *Store,
// since both llvm.lifetime.start and llvm.lifetime.end intrinsics
// practically fill all the bytes of the alloca with an undefined
// value, although conceptually marked as alive/dead.
- int64_t Size = cast<ConstantInt>(UI->getOperand(0))->getSExtValue();
- if (Size < 0 || Size == DestSize) {
- LifetimeMarkers.push_back(UI);
- continue;
- }
+ // We don't currently track GEP offsets and sizes, so we don't have
+ // a way to check whether this lifetime marker affects the relevant
+ // memory regions.
+ // While we only really need to delete lifetime.end from Src and
+ // lifetime.begin from Dst, those are often implied by the memcpy
+ // anyways so hopefully not much is lost by removing all of them.
+ LifetimeMarkers.push_back(UI);
+ continue;
}
AAMetadataInstrs.insert(UI);
@@ -1594,9 +1606,8 @@ bool MemCpyOptPass::performStackMoveOptzn(Instruction *Load, Instruction *Store,
return true;
};
- // Check that dest has no Mod/Ref, from the alloca to the Store, except full
- // size lifetime intrinsics. And collect modref inst for the reachability
- // check.
+ // Check that dest has no Mod/Ref, from the alloca to the Store. And collect
+ // modref inst for the reachability check.
ModRefInfo DestModRef = ModRefInfo::NoModRef;
MemoryLocation DestLoc(DestAlloca, LocationSize::precise(Size));
SmallVector<BasicBlock *, 8> ReachabilityWorklist;
@@ -1779,8 +1790,9 @@ bool MemCpyOptPass::processMemCpy(MemCpyInst *M, BasicBlock::iterator &BBI) {
if (processMemSetMemCpyDependence(M, MDep, BAA))
return true;
+ MemoryLocation SrcLoc = MemoryLocation::getForSource(M);
MemoryAccess *SrcClobber = MSSA->getWalker()->getClobberingMemoryAccess(
- AnyClobber, MemoryLocation::getForSource(M), BAA);
+ AnyClobber, SrcLoc, BAA);
// There are five possible optimizations we can do for memcpy:
// a) memcpy-memcpy xform which exposes redundance for DSE.
@@ -1820,7 +1832,7 @@ bool MemCpyOptPass::processMemCpy(MemCpyInst *M, BasicBlock::iterator &BBI) {
}
}
- if (hasUndefContents(MSSA, BAA, M->getSource(), MD, M->getLength())) {
+ if (hadUndefContentsBefore(MSSA, BAA, M->getSource(), AnyClobber, SrcLoc, M->getLength())) {
LLVM_DEBUG(dbgs() << "Removed memcpy from undef\n");
eraseInstruction(M);
++NumMemCpyInstr;
diff --git a/llvm/test/Analysis/ScopedNoAliasAA/alias-scope-merging.ll b/llvm/test/Analysis/ScopedNoAliasAA/alias-scope-merging.ll
index 989049ab67a0b..3b08e4404dde2 100644
--- a/llvm/test/Analysis/ScopedNoAliasAA/alias-scope-merging.ll
+++ b/llvm/test/Analysis/ScopedNoAliasAA/alias-scope-merging.ll
@@ -1,25 +1,27 @@
-; RUN: opt < %s -S -passes=memcpyopt | FileCheck --match-full-lines %s
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -S -passes=memcpyopt | FileCheck %s
; Alias scopes are merged by taking the intersection of domains, then the union of the scopes within those domains
define i8 @test(i8 %input) {
+; CHECK-LABEL: define i8 @test(
+; CHECK-SAME: i8 [[INPUT:%.*]]) {
+; CHECK-NEXT: [[SRC:%.*]] = alloca i8, align 1
+; CHECK-NEXT: store i8 [[INPUT]], ptr [[SRC]], align 1
+; CHECK-NEXT: [[RET_VALUE:%.*]] = load i8, ptr [[SRC]], align 1
+; CHECK-NEXT: ret i8 [[RET_VALUE]]
+;
%tmp = alloca i8
%dst = alloca i8
%src = alloca i8
-; CHECK: call void @llvm.memcpy.p0.p0.i64(ptr align 8 %dst, ptr align 8 %src, i64 1, i1 false), !alias.scope ![[SCOPE:[0-9]+]]
- call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %src), !noalias !4
+ call void @llvm.lifetime.start.p0(i64 1, ptr nonnull %src), !noalias !4
store i8 %input, ptr %src
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %tmp, ptr align 8 %src, i64 1, i1 false), !alias.scope !0
- call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %src), !noalias !4
+ call void @llvm.lifetime.end.p0(i64 1, ptr nonnull %src), !noalias !4
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %dst, ptr align 8 %tmp, i64 1, i1 false), !alias.scope !4
%ret_value = load i8, ptr %dst
ret i8 %ret_value
}
-; Merged scope contains "callee0: %a" and "callee0 : %b"
-; CHECK-DAG: ![[CALLEE0_A:[0-9]+]] = distinct !{!{{[0-9]+}}, !{{[0-9]+}}, !"callee0: %a"}
-; CHECK-DAG: ![[CALLEE0_B:[0-9]+]] = distinct !{!{{[0-9]+}}, !{{[0-9]+}}, !"callee0: %b"}
-; CHECK-DAG: ![[SCOPE]] = !{![[CALLEE0_A]], ![[CALLEE0_B]]}
-
declare void @llvm.lifetime.start.p0(i64, ptr nocapture)
declare void @llvm.lifetime.end.p0(i64, ptr nocapture)
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
diff --git a/llvm/test/Transforms/MemCpyOpt/callslot_badaa.ll b/llvm/test/Transforms/MemCpyOpt/callslot_badaa.ll
index efdbdce401b76..fb5d675402eba 100644
--- a/llvm/test/Transforms/MemCpyOpt/callslot_badaa.ll
+++ b/llvm/test/Transforms/MemCpyOpt/callslot_badaa.ll
@@ -1,9 +1,10 @@
-; RUN: opt < %s -S -passes=memcpyopt | FileCheck --match-full-lines %s
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -S -passes=memcpyopt | FileCheck %s
; Make sure callslot optimization merges alias.scope metadata correctly when it merges instructions.
; Merging here naively generates:
; call void @llvm.memcpy.p0.p0.i64(ptr align 8 %dst, ptr align 8 %src, i64 1, i1 false), !alias.scope !3
-; call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %src), !noalias !0
+; call void @llvm.lifetime.end.p0(i64 1, ptr nonnull %src), !noalias !0
; ...
; !0 = !{!1}
; !1 = distinct !{!1, !2, !"callee1: %a"}
@@ -13,15 +14,20 @@
; !5 = distinct !{!5, !"callee0"}
; Which is incorrect because the lifetime.end of %src will now "noalias" the above memcpy.
define i8 @test(i8 %input) {
+; CHECK-LABEL: define i8 @test(
+; CHECK-SAME: i8 [[INPUT:%.*]]) {
+; CHECK-NEXT: [[SRC:%.*]] = alloca i8, align 1
+; CHECK-NEXT: store i8 [[INPUT]], ptr [[SRC]], align 1
+; CHECK-NEXT: [[RET_VALUE:%.*]] = load i8, ptr [[SRC]], align 1
+; CHECK-NEXT: ret i8 [[RET_VALUE]]
+;
%tmp = alloca i8
%dst = alloca i8
%src = alloca i8
-; NOTE: we're matching the full line and looking for the lack of !alias.scope here
-; CHECK: call void @llvm.memcpy.p0.p0.i64(ptr align 8 %dst, ptr align 8 %src, i64 1, i1 false)
- call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %src), !noalias !3
+ call void @llvm.lifetime.start.p0(i64 1, ptr nonnull %src), !noalias !3
store i8 %input, ptr %src
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %tmp, ptr align 8 %src, i64 1, i1 false), !alias.scope !0
- call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %src), !noalias !3
+ call void @llvm.lifetime.end.p0(i64 1, ptr nonnull %src), !noalias !3
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %dst, ptr align 8 %tmp, i64 1, i1 false), !alias.scope !3
%ret_value = load i8, ptr %dst
ret i8 %ret_value
diff --git a/llvm/test/Transforms/MemCpyOpt/memcpy-undef.ll b/llvm/test/Transforms/MemCpyOpt/memcpy-undef.ll
index 2f1ce37ea2256..a463fed3ac58c 100644
--- a/llvm/test/Transforms/MemCpyOpt/memcpy-undef.ll
+++ b/llvm/test/Transforms/MemCpyOpt/memcpy-undef.ll
@@ -96,7 +96,6 @@ define void @test_lifetime_partial_alias_3(ptr noalias %dst) {
; CHECK-NEXT: [[A:%.*]] = alloca [16 x i8], align 1
; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 12, ptr [[A]])
; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[A]], i64 8
-; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr [[DST:%.*]], ptr [[GEP]], i64 4, i1 false)
; CHECK-NEXT: ret void
;
%a = alloca [16 x i8]
@@ -112,7 +111,6 @@ define void @test_lifetime_partial_alias_4(ptr noalias %dst) {
; CHECK-NEXT: [[A:%.*]] = alloca [16 x i8], align 1
; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 12, ptr [[A]])
; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[A]], i64 8
-; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr [[DST:%.*]], ptr [[GEP]], i64 8, i1 false)
; CHECK-NEXT: ret void
;
%a = alloca [16 x i8]
diff --git a/llvm/test/Transforms/MemCpyOpt/stack-move.ll b/llvm/test/Transforms/MemCpyOpt/stack-move.ll
index 4a75c5eea2499..31e255b83eb9e 100644
--- a/llvm/test/Transforms/MemCpyOpt/stack-move.ll
+++ b/llvm/test/Transforms/MemCpyOpt/stack-move.ll
@@ -1649,20 +1649,13 @@ loop_exit:
ret void
}
-; Tests that failure because partial-sized lifetimes are counted as mod.
+; Tests that partial-sized lifetimes are not inhibiting the optimizer
define void @partial_lifetime() {
; CHECK-LABEL: define void @partial_lifetime() {
-; CHECK-NEXT: [[SRC:%.*]] = alloca [[STRUCT_FOO:%.*]], align 4
-; CHECK-NEXT: [[DEST:%.*]] = alloca [[STRUCT_FOO]], align 4
-; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 12, ptr captures(none) [[SRC]])
-; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 3, ptr captures(none) [[DEST]])
-; CHECK-NEXT: store [[STRUCT_FOO]] { i32 10, i32 20, i32 30 }, ptr [[SRC]], align 4
-; CHECK-NEXT: [[TMP1:%.*]] = call i32 @use_nocapture(ptr captures(none) [[SRC]])
-; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[DEST]], ptr align 4 [[SRC]], i64 12, i1 false)
-; CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 3, ptr captures(none) [[SRC]])
+; CHECK-NEXT: [[DEST:%.*]] = alloca [[STRUCT_FOO:%.*]], align 4
+; CHECK-NEXT: store [[STRUCT_FOO]] { i32 10, i32 20, i32 30 }, ptr [[DEST]], align 4
+; CHECK-NEXT: [[TMP1:%.*]] = call i32 @use_nocapture(ptr captures(none) [[DEST]])
; CHECK-NEXT: [[TMP2:%.*]] = call i32 @use_nocapture(ptr captures(none) [[DEST]])
-; CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 12, ptr captures(none) [[SRC]])
-; CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 12, ptr captures(none) [[DEST]])
; CHECK-NEXT: ret void
;
%src = alloca %struct.Foo, align 4
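To make the behavioral change concrete, here is a minimal IR sketch (mine, distilled from the partial_lifetime test above, not part of the PR) of the pattern the second fix unblocks; previously the undersized lifetime.start on %dest counted as a blocking mod/ref and the two allocas were not merged:

declare void @llvm.lifetime.start.p0(i64, ptr nocapture)
declare void @llvm.lifetime.end.p0(i64, ptr nocapture)
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
declare i32 @use_nocapture(ptr nocapture)

%struct.Foo = type { i32, i32, i32 }

define void @partial_lifetime_sketch() {
  %src = alloca %struct.Foo, align 4
  %dest = alloca %struct.Foo, align 4
  call void @llvm.lifetime.start.p0(i64 12, ptr %src)
  ; undersized marker: covers only 3 of the 12 bytes of %dest
  call void @llvm.lifetime.start.p0(i64 3, ptr %dest)
  store %struct.Foo { i32 10, i32 20, i32 30 }, ptr %src, align 4
  %1 = call i32 @use_nocapture(ptr %src)
  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dest, ptr align 4 %src, i64 12, i1 false)
  call void @llvm.lifetime.end.p0(i64 12, ptr %src)
  %2 = call i32 @use_nocapture(ptr %dest)
  call void @llvm.lifetime.end.p0(i64 12, ptr %dest)
  ret void
}

With this patch the markers are dropped and %src and %dest collapse into a single alloca, matching the updated CHECK lines in stack-move.ll.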
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed from 16189f1 to a2f0414.
This is a dependent commit of vtjnash@645f247, which removes the function's requirement of having this constant size be computed and available.
As pointed out in #143782, these tests were specifying the size in bits instead of bytes. In order to preserve the intent of the tests, add a use of %src, which prevents stack-move optimization. These are supposed to test the handling of scoped alias metadata in call slot optimization.
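The mistake being fixed, in miniature (my sketch, not one of the actual tests): the size operand of the lifetime intrinsics is in bytes, so i64 8 on a one-byte alloca declares an eight-byte lifetime rather than the intended eight bits:

declare void @llvm.lifetime.start.p0(i64, ptr nocapture)

define void @size_in_bytes(i8 %input) {
  %src = alloca i8                                   ; one byte of storage
  ; wrong: claims an 8-byte lifetime; the fixed tests pass i64 1 instead
  call void @llvm.lifetime.start.p0(i64 8, ptr %src)
  store i8 %input, ptr %src
  ret void
}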
In practice, all lifetime markers should have an alloca argument and a size matching the alloca and be handled by the "whole alloca" code path that is independent of the passed size. In theory, lifetimes can be used in other ways, but this is the only practically relevant case. If I ever find the time, I'll drop the size argument from these intrinsics.
I do see some diffs from this change on dtcxzyw/llvm-opt-benchmark#2428, but I'm having a hard time understanding where they are actually coming from. None of the cases I looked at have mismatched lifetimes.
I think what actually happens is that your new implementation of hadUndefContentsBefore does not bail out if none of the cases match, which means that it will just continue the walk from the last found clobber -- which can only improve things if the clobber was the result of hitting the MSSA walk limit, rather than an actual clobber. So effectively, all this patch is really doing is adding a bypass for the walk limit.
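A schematic of that theory (my sketch, not a reduced test case from the PR): with enough unrelated MemoryDefs between the marker and the memcpy, the walker gives up at its walk limit (the -memssa-check-limit option, 100 by default) and reports whatever def it stopped on. If that def happens to be a lifetime marker for some other object, the old code returned false, while the new loop resumes the walk above it:

declare void @llvm.lifetime.start.p0(i64, ptr nocapture)
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)

define void @walk_limit_bypass(ptr noalias %dst) {
  %src = alloca [16 x i8]
  %tmp = alloca [16 x i8]
  call void @llvm.lifetime.start.p0(i64 16, ptr %src)
  ; ... imagine >100 intervening MemoryDefs here, so the clobber walk
  ; hits the limit and stops on the unrelated marker below ...
  call void @llvm.lifetime.start.p0(i64 16, ptr %tmp)
  ; old code: gives up (the marker is for %tmp, not %src)
  ; new code: continues the walk and finds %src's own lifetime.start
  call void @llvm.memcpy.p0.p0.i64(ptr %dst, ptr %src, i64 16, i1 false)
  ret void
}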
The changes here break the intent of the test. I've adjusted it in a8c6fb4 to preserve the existing behavior with the lifetimes fixed.
Thank you
@@ -1820,7 +1835,8 @@ bool MemCpyOptPass::processMemCpy(MemCpyInst *M, BasicBlock::iterator &BBI) {
     }
   }

-  if (hasUndefContents(MSSA, BAA, M->getSource(), MD, M->getLength())) {
+  if (hadUndefContentsBefore(MSSA, BAA, M->getSource(), AnyClobber, SrcLoc,
+                             M->getLength())) {
Isn't this going to recompute the clobbering access now, even though we already know it?
Yes. I assumed it was inexpensive since it is a caching walker, but I could add a bool parameter if necessary to optimize that.
        continue;
      } else if (II->getIntrinsicID() == Intrinsic::lifetime_end) {
        Clobber = Def->getDefiningAccess();
        continue;
Lack of break below here is bypassing the MSSA clobber walk limit.
There is a return false immediately below here.
Oh yeah, there is. I was confused by the diff rendering. In that case I'm not sure what is going on.
I think your theory might still be correct about the proximal cause; it just only skips the limit when the walk ended on a completely unrelated lifetime instruction. A local fix for that is:
diff --git a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
index 877600034f84..759b16206d5c 100644
--- a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
+++ b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
@@ -1370,6 +1370,7 @@ bool MemCpyOptPass::processMemSetMemCpyDependence(MemCpyInst *MemCpy,
static bool hadUndefContentsBefore(MemorySSA *MSSA, BatchAAResults &BAA,
Value *V, MemoryAccess *Clobber,
MemoryLocation Loc, Value *Size) {
+ Value *VBase = getUnderlyingObject(V);
while (1) {
Clobber = MSSA->getWalker()->getClobberingMemoryAccess(Clobber, Loc, BAA);
MemoryDef *Def = dyn_cast<MemoryDef>(Clobber);
@@ -1377,12 +1378,18 @@ static bool hadUndefContentsBefore(MemorySSA *MSSA, BatchAAResults &BAA,
return false;
if (MSSA->isLiveOnEntryDef(Def))
- return isa<AllocaInst>(getUnderlyingObject(V));
+ return isa<AllocaInst>(VBase);
if (auto *II = dyn_cast_or_null<IntrinsicInst>(Def->getMemoryInst())) {
if (II->getIntrinsicID() == Intrinsic::lifetime_start) {
auto *LTSize = cast<ConstantInt>(II->getArgOperand(0));
+ // Check if the SSA Walk ended early due to heuristics or actually
+ // reached a lifetime instruction for this pointer.
+ Value *IIBase = getUnderlyingObject(II->getArgOperand(1));
+ if (VBase != IIBase)
+ return false;
+
if (Size)
if (auto CSize = dyn_cast<ConstantInt>(Size))
if (BAA.isMustAlias(V, II->getArgOperand(1)) &&
@@ -1393,8 +1400,8 @@ static bool hadUndefContentsBefore(MemorySSA *MSSA, BatchAAResults &BAA,
// does) and we're querying a pointer based on that alloca, then we know
// the memory is definitely undef, regardless of how exactly we alias.
// The size also doesn't matter, as an out-of-bounds access would be UB.
- if (auto *Alloca = dyn_cast<AllocaInst>(getUnderlyingObject(V))) {
- if (getUnderlyingObject(II->getArgOperand(1)) == Alloca) {
+ if (auto *Alloca = dyn_cast<AllocaInst>(VBase)) {
+ if (IIBase == Alloca) {
const DataLayout &DL = Alloca->getDataLayout();
if (std::optional<TypeSize> AllocaSize =
Alloca->getAllocationSize(DL))
@@ -1405,6 +1412,11 @@ static bool hadUndefContentsBefore(MemorySSA *MSSA, BatchAAResults &BAA,
Clobber = Def->getDefiningAccess();
continue;
} else if (II->getIntrinsicID() == Intrinsic::lifetime_end) {
+ // Check if the SSA Walk ended early due to heuristics or actually
+ // reached a lifetime instruction for this pointer.
+ Value *IIBase = getUnderlyingObject(II->getArgOperand(1));
+ if (VBase != IIBase)
+ return false;
Clobber = Def->getDefiningAccess();
continue;
}
It looks like the getClobberingMemoryAccess code also could be easily adjusted to expose the walk limit (it already does, just not accessible from the header file)
But since it seems you're implying this function shouldn't gain any benefit from looking past the lifetime, I could also just drop this loop.
The tests however look better to me, so I also am wondering if there is some way to improve MemorySSA to handle simple cases like this, where there is a noalias base pointer (such as an alloca or malloc) with only a few uses in the dominator tree before the current MemoryDef location, none of which were capturing?
> The tests however look better to me, so I also am wondering if there is some way to improve MemorySSA to handle simple cases like this, where there is a noalias base pointer (such as an alloca or malloc) with only a few uses in the dominator tree before the current MemoryDef location, none of which were capturing?
At least for the basic case (memcpy from fully undef alloca), the simplest and most robust way to handle it might be to do this in InstCombine. We already have this transform there:
static bool isAllocSiteRemovable(Instruction *AI,
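For reference, the basic case under discussion (my sketch, not from the PR): %a is never written, so the copy reads undef and both the memcpy and the dead alloca could be deleted outright:

declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)

define void @copy_from_fresh_alloca(ptr noalias %dst) {
  %a = alloca [16 x i8]
  ; no store to %a: the source bytes are undef, so the copy is removable
  call void @llvm.memcpy.p0.p0.i64(ptr %dst, ptr %a, i64 16, i1 false)
  ret void
}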
I didn't read the tests closely, so I didn't realize it was often reads from undef that it was removing. I thought that might have been GVN's job, to detect a memcpy from undef contents and remove it?
I see at least one place there that mentions it shouldn't fail to optimize when encountering lifetime there:
llvm-project/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp, lines 4916 to 4925 in ce747a1:

  while (!AllocaUsers.empty()) {
    auto *UserI = cast<Instruction>(AllocaUsers.pop_back_val());
    if (isa<GetElementPtrInst>(UserI) || isa<AddrSpaceCastInst>(UserI)) {
      pushUsers(*UserI);
      continue;
    }
    if (UserI == CB)
      continue;
    // TODO: support lifetime.start/end here
    return false;
I've trimmed this PR down to just the part I needed for vtjnash@645f247, pending separate discussion of whether it is even worthwhile optimizing those tests for which lifetime.start does not cover the whole alloca.
Additionally, I made a separate PR as you suggested (#143958) to handle those simple cases in InstCombine instead, and to see how that affects the benchmarks.
Having lifetime markers should only increase the information available to LLVM, but the pass would instead rely on the callback to give up entirely if it encountered a lifetime marker that wasn't full size; sub-optimal lifetime markers are not supposed to forbid optimizations that would otherwise apply if they were either absent or optimal. This pass wasn't tracking GEP offsets either, so that case wasn't quite correctly handled, although the earlier sub-optimal check that this size is the same as the alloca's made this safe in the past, and it is unlikely to have encountered anything else.
Force-pushed from a2f0414 to 015e077.