## Description
This is related to #866 .

Since #892 , ClangIR lowers constant local variables in C/C++ to `cir.alloca` operations with a `const` flag. The presence of the `const` flag implies:

- The variable must be initialized by a following `cir.store` operation, and
- All `cir.load` operations that load the `cir.alloca` result must produce the value stored by that `cir.store` operation.
An obvious optimization here is that we could eliminate all such loads and replace the loaded values with the stored initial value. LLVM already implements similar optimizations, but we need to tweak the LLVM IR we generate to teach LLVM to apply them. I'm proposing several approaches here that could enable such optimizations in LLVM, and I hope we can choose the one that best fits our needs.
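For reference, the kind of C++ source this optimization targets can be sketched as follows. This is an illustrative example, not code from the proposal; `observe` is a hypothetical stand-in for any call the optimizer cannot see through:

```cpp
#include <cassert>

// Hypothetical opaque call: it receives the address of the const local but,
// being handed a pointer-to-const, is not allowed to modify the object.
static int observe(const int *p) { return *p; }

int test(int init) {
  const int var = init; // lowered to a `const` cir.alloca plus one cir.store
  int a = var;          // first cir.load
  observe(&var);        // opaque use of the variable's address
  int b = var;          // second cir.load: must yield the same stored value
  return a + b;
}
```

Both loads of `var` must observe the value written by the single initializing store, which is what would license replacing them with `init`.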
## Approach 1: use the `llvm.invariant.start` intrinsic

The first approach is to use the `llvm.invariant.start` and `llvm.invariant.end` intrinsics. This pair of intrinsics tells the optimizer that a specified memory location never changes within the region bounded by them. With this approach, the following CIR:
```mlir
cir.func @test(%init: !s32i) {
  %0 = cir.alloca !s32i, !cir.ptr<!s32i>, ["var", init, const]
  cir.store %init, %0 : !s32i, !cir.ptr<!s32i>
  // example uses of %0
  %1 = cir.load %0 : !cir.ptr<!s32i>, !s32i
  cir.call @clobber(%0) : (!cir.ptr<!s32i>) -> ()
  %2 = cir.load %0 : !cir.ptr<!s32i>, !s32i
  cir.return
}
```
would generate the following LLVM IR:
```llvm
define dso_local void @test(i32 %init) {
  %1 = alloca i32, align 4
  store i32 %init, ptr %1, align 4
  %inv = call ptr @llvm.invariant.start.p0(i64 4, ptr %1)
  ; example uses of %1
  %2 = load i32, ptr %1, align 4
  call void @clobber(ptr %1)
  %3 = load i32, ptr %1, align 4
  call void @llvm.invariant.end.p0(ptr %inv, i64 4, ptr %1)
  ret void
}
```
Theoretically, the optimizer should be able to at least fold `%3` into `%2`. Unfortunately, it seems that the optimizer refuses to optimize when the `llvm.invariant.end` intrinsic call is present; see https://godbolt.org/z/5dMv7T77e. To bypass this limitation, we can simply omit the call to the `llvm.invariant.end` intrinsic, and the optimizer then works as expected.
## Approach 2: use the `!invariant.load` metadata

A load instruction can have `!invariant.load` metadata attached. The LLVM language reference says:

> If a load instruction tagged with the `!invariant.load` metadata is executed, the memory location referenced by the load has to contain the same value at all points in the program where the memory location is dereferenceable; otherwise, the behavior is undefined.
With this approach, the CIR snippet listed earlier would emit the following LLVM IR:
```llvm
define dso_local void @test(i32 %init) {
  %1 = alloca i32, align 4
  store i32 %init, ptr %1, align 4
  ; example uses of %1
  %2 = load i32, ptr %1, align 4, !invariant.load !0
  call void @clobber(ptr %1)
  %3 = load i32, ptr %1, align 4, !invariant.load !0
  ret void
}

!0 = !{}
```
The optimizer could then fold both load instructions to just `%init`; see https://godbolt.org/z/Exnh85zhx.

It's worth mentioning here that the `!invariant.load` metadata is already supported by the MLIR LLVMIR dialect.
## Approach 3: use the `!invariant.group` metadata

A load or store instruction can have `!invariant.group` metadata attached. Unlike `!invariant.load`, `!invariant.group` only requires that every value loaded or stored by such instructions through the same pointer be the same. With this approach, the CIR snippet listed earlier would emit the following LLVM IR:
```llvm
define dso_local void @test(i32 %init) {
  %1 = alloca i32, align 4
  store i32 %init, ptr %1, align 4, !invariant.group !0
  ; example uses of %1
  %2 = load i32, ptr %1, align 4, !invariant.group !0
  call void @clobber(ptr %1)
  %3 = load i32, ptr %1, align 4, !invariant.group !0
  ret void
}

!0 = !{}
```
The optimizer could then fold both load instructions to just `%init`; see https://godbolt.org/z/8MsxcoqTY.
## Constant local variables in inner scopes

Let's consider a slightly more complex example:

```cpp
void test(std::vector<int> vec) {
  for (const int item : vec)
    do_something(item);
}
```

Upon each iteration, the local variable `item` reuses the same memory location, but ideally we would still like to teach LLVM that `item` is constant during a single iteration. The second approach is infeasible here since the value in the memory location changes between iterations, so only the first and the third approaches are suitable for this case.
The first approach would emit code like this:
```llvm
define dso_local void @test() {
  %item = alloca i32, align 4
  ; ...
loop.body:
  store i32 %loop.ind, ptr %item, align 4
  %inv = call ptr @llvm.invariant.start.p0(i64 4, ptr %item)
  ; loop body goes here; an example load instruction below
  %1 = load i32, ptr %item, align 4
  call void @llvm.invariant.end.p0(ptr %inv, i64 4, ptr %item)
  br label %loop.header
}
```
The third approach would emit code like this:
```llvm
define dso_local void @test() {
  %item = alloca i32, align 4
  ; ...
loop.body:
  %item.0 = phi ptr [ %item, %0 ], [ %item.launder, %loop.body ]
  store i32 %loop.ind, ptr %item.0, align 4, !invariant.group !0
  ; loop body goes here; an example load instruction below
  %1 = load i32, ptr %item.0, align 4, !invariant.group !0
  %item.launder = call ptr @llvm.launder.invariant.group.p0(ptr %item.0)
  br label %loop.body
}

!0 = !{}
```
The call to the `llvm.launder.invariant.group` intrinsic makes sure that each iteration creates a distinct invariant group. Without this intrinsic call, the optimizer could assume that the tagged load and store instructions load and store the same value across all iterations of the loop.
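As a source-level illustration of the property being modeled (a sketch; `sum_items` is a hypothetical function, not code from the proposal): `item` occupies the same stack slot on every iteration, yet within one iteration every read of it must see the value stored at the top of that iteration. Laundering the pointer between iterations is what lets the `!invariant.group` scheme express exactly this.

```cpp
#include <vector>

// `item` is const within each iteration but takes a different value, in the
// same memory location, on every iteration of the loop.
int sum_items(const std::vector<int> &vec) {
  int total = 0;
  for (const int item : vec)
    total += item; // each read of `item` is constant within its iteration
  return total;
}
```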
So what do you think about these three lowering approaches? Or do you know of any other approaches that this proposal does not mention?