
MLIR code generation from Julia SSA IR. #77

Draft · wants to merge 17 commits into main
Conversation

@jumerckx jumerckx commented Jun 9, 2024

This PR contains an experimental way to generate MLIR code from Julia code. It is similar in spirit to MLIR-python-extras and initially based on Pangoraw's Brutus example.
Currently it probably only works on Julia 1.11, as it uses some Core.Compiler functions that are in flux.

Example

# Regular Julia code:
struct Point{T}
    x::T
    y::T
end

struct Line{T}
    p1::Point{T}
    p2::Point{T}
end

sq_distance(l::Line) = (l.p1.x - l.p2.x)^2 + (l.p1.y - l.p2.y)^2

# Convert this to MLIR:
op2 = cg(Tuple{Point{i64},Point{i64}}) do a, b
    l = Line(a, b)
    d_2 = sq_distance(l)

    Point(((a.x, a.y) .- d_2)...)
end

generates:

  func.func @f(%arg0: i64, %arg1: i64, %arg2: i64, %arg3: i64) -> (i64, i64) {
    %0 = arith.subi %arg0, %arg2 : i64
    %1 = arith.muli %0, %0 : i64
    %2 = arith.subi %arg1, %arg3 : i64
    %3 = arith.muli %2, %2 : i64
    %4 = arith.addi %1, %3 : i64
    %5 = arith.subi %arg0, %4 : i64
    %6 = arith.subi %arg1, %4 : i64
    return %5, %6 : i64, i64
  }
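For reference, the same computation in plain Julia (independent of this PR, no MLIR involved) shows the arithmetic the generated IR performs; the function `f` below is a hand-written mirror of the generated body, not the PR's code:

```julia
# Plain-Julia version of the example above, using Int64 in place of the
# i64 MLIR type.
struct Point{T}
    x::T
    y::T
end

struct Line{T}
    p1::Point{T}
    p2::Point{T}
end

sq_distance(l::Line) = (l.p1.x - l.p2.x)^2 + (l.p1.y - l.p2.y)^2

# Mirrors the generated function body: subtract the squared distance
# from each coordinate of the first point.
function f(a::Point, b::Point)
    d_2 = sq_distance(Line(a, b))
    Point(a.x - d_2, a.y - d_2)
end
```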

I'm leaving out some details but the full code is included in examples/main.jl.

It is possible to customise Julia function <--> MLIR operation mappings by defining "intrinsic" functions (this is an awfully vague name, any alternatives?).
For example, an intrinsic function that maps addition onto arith.addi:

@intrinsic Base.:+(a::T, b::T) where {T<:MLIRInteger} = T(Dialects.arith.addi(a, b)|>result)

Again, more details can be found in examples/definitions.jl.

Internal Details

Julia code is first lowered to SSA IR using a custom AbstractInterpreter (Generate.MLIRInterpreter) that allows types other than Bool to be used in conditional gotos. This AbstractInterpreter also overrides the inlining policy to inline everything except calls to intrinsics. (see src/Generate/absint.jl)

With this SSA IR in hand, a source-to-source transformation replaces all control-flow statements with calls to builder functions that construct the corresponding MLIR unstructured control-flow operations. (see src/Generate/transform.jl)

Finally, the transformed IR is executed and produces an MLIR region.
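As a toy illustration of that idea (hypothetical names, not the PR's actual API): a conditional goto in the IR becomes an ordinary function call that, when the transformed IR runs, emits the corresponding MLIR op instead of branching. Here the stand-in "builder" just renders the op as text; the real builders in src/Generate/transform.jl construct actual MLIR operations:

```julia
# Stand-in builder for the op a conditional goto is rewritten into.
# `cond` is an SSA value name; `then_dest` and `else_dest` are block labels.
build_cond_br(cond, then_dest, else_dest) =
    "cf.cond_br $cond, ^$then_dest, ^$else_dest"
```

For example, `build_cond_br("%0", "bb1", "bb2")` renders the `cf.cond_br` seen in the outputs later in this thread.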

The code generation can be nested to support generating MLIR operations containing regions, the last example in examples/main.jl shows how an scf.for operation can be generated.

Some Notes

  • Since everything needs to be inlined fully, things like recursive calls or dynamic dispatch are unsupported.
  • TTFG (time-to-first-generate MLIR code) is long. I haven't worked on optimising this but I believe the biggest slowdowns have to do with the AbstractInterpreter caching. CI is way slower because of this...
  • This PR is an updated version of what I did for my Master's thesis. The internal details compared to my thesis have changed a bit (for the better), but for my thesis, I wrote a lot more examples including generating linalg operations from einsum expressions, or generating GPU kernels. The code can be found here

commit 0a3c2b7
Merge: 1638254 5c35af3
Author: jumerckx <[email protected]>
Date:   Sat Jun 1 23:10:34 2024 +0200

    Merge remote-tracking branch 'upstream/main' into jm/MLIRValueTrait

commit 1638254
Author: jumerckx <[email protected]>
Date:   Wed Mar 27 20:08:28 2024 +0100

    rename `value(::AffineExpr)`, use IR.Value type instead of API.MlirValue in jl-generator

commit 912d8f5
Author: jumerckx <[email protected]>
Date:   Wed Mar 20 21:12:01 2024 +0100

    Apply suggestions from code review

    Co-authored-by: Sergio Sánchez Ramírez <[email protected]>

commit c57464a
Author: jumerckx <[email protected]>
Date:   Wed Mar 20 21:10:51 2024 +0100

    Update deps/tblgen/jl-generators.cc

    Co-authored-by: Sergio Sánchez Ramírez <[email protected]>

commit 078f15e
Author: jumerckx <[email protected]>
Date:   Wed Mar 20 21:10:08 2024 +0100

    Update src/IR/Value.jl

    Co-authored-by: Sergio Sánchez Ramírez <[email protected]>

commit 0b5b443
Author: jumerckx <[email protected]>
Date:   Sun Mar 17 22:18:30 2024 +0100

    MLIRValueTrait changes
@vchuravy (Member)

Really nice! I like the intrinsic approach a lot!

Since everything needs to be inlined fully, things like recursive calls or dynamic dispatch are unsupported.

In the original Brutus we had a simple worklist to allow for function calls.

https://github.com/JuliaLabs/brutus/blob/c45ec5e465c0de01dc771e3facee7479fd2ac8ef/Brutus/src/codegen.jl#L70-L86

@mofeing (Collaborator)

mofeing commented Jun 10, 2024

This looks awesome!!

Is the full set of Julia SSA IR mappable to MLIR?

It is possible to customise Julia function <--> MLIR operation mappings by defining "intrinsic" functions (this is an awfully vague name, any alternatives?).

intrinsic looks nice to me. If you need other names, maybe primitive?

Since everything needs to be inlined fully, things like recursive calls or dynamic dispatch are unsupported.

But it would be okay if we added an @intrinsic for these cases right?

TTFG (time-to-first-generate MLIR code) is long. I haven't worked on optimising this but I believe the biggest slowdowns have to do with the AbstractInterpreter caching. CI is way slower because of this...

Would it be possible to run the AbstractInterpreter on some warmup code during precompilation and save the cache for later?

@jumerckx (Collaborator, Author)

In the original Brutus we had a simple worklist to allow for function calls.

Aha, thanks for the link, this should be doable. How does this handle function names? E.g., when calling +(::Int, ::Int) and +(::Float64, ::Float64) in the same function?

Is the full set of Julia SSA IR mappable to MLIR?

Not completely yet. I've ignored PhiC and Upsilon nodes, and PiNodes don't generate any MLIR.

For full Julia compilation like Brutus, the current intrinsic system also doesn't suffice because Julia builtins cannot be specialized.

julia> Base.abs_float(x) = x
ERROR: cannot add methods to a builtin function
Stacktrace:
 [1] top-level scope
   @ REPL[6]:1

Tapir.jl handles this by replacing any call to a Core.IntrinsicFunction with a call to a function that calls the builtin. (code).
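That wrapping trick is easy to reproduce in isolation (the wrapper name below is hypothetical): you can't add methods to a builtin, but a plain function that forwards to it is an ordinary generic function that dispatch, and therefore an intrinsic override, can hook into:

```julia
# Base.abs_float is a builtin and can't be specialized, but this forwarding
# wrapper is a normal generic function that additional methods can target.
wrapped_abs_float(x) = Base.abs_float(x)
```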

The current system also doesn't allow mapping existing Julia methods onto MLIR operations. For example, defining an intrinsic for Base.:+(::Int, ::Int) won't work because it literally redefines that method, so you could no longer add integers at all.

To fix this, intrinsic definitions should exist in a separate context. CassetteOverlay.jl could be used to achieve something like this, but during my thesis I depended on it and ran into strange errors and found the package to be quite fiddly.

Alternatively, we could again take inspiration from Tapir.jl and use something similar to their rrule (relevant code).
I.e., use a function like intrinsic(::Type{<:Tuple{typeof(Base.:+), Int, Int}}) = ... instead of redefining the method. The source transformation would then replace each call to a function with a call to intrinsic with the matching signature.
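A minimal sketch of that signature-based lookup (all names hypothetical; `Integer` stands in for the MLIRInteger types from the PR):

```julia
# Instead of redefining Base.:+, register the mapping by dispatching on the
# call signature as a type; the source transform would rewrite calls whose
# signature matches a registered method.
is_intrinsic(::Type) = false
is_intrinsic(::Type{<:Tuple{typeof(+),T,T}}) where {T<:Integer} = true
```

The key property is that `Base.:+` itself is untouched, so ordinary integer addition still works while code generation can recognize the call.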

Would it be possible to run the AbstractInterpreter on some warmup code during precompilation and save the cache for later?

Yes! One annoyance is that redefining intrinsics can spoil the cache because there's no backedge from e.g. Generate.is_intrinsic(::Type{<:Tuple{typeof(+), T, T}}) to Base.:+(::T, ::T). Tapir.jl faces the same limitation.
I spent some time trying to add these backedges manually but quickly found myself out of my depth.
Still, this should speed things up a lot for the initial experience. I haven't done this before, so I'd have to find out how to do it properly.

But it would be okay if we added an @intrinsic for these cases right?

Not sure I'm following here. Currently, the ability to define intrinsics can't cope with recursion and dynamic dispatch. Full inlining is required to expose all control flow that's otherwise hidden in function calls. All control flow needs to be known upfront because all basic blocks are created at the start of MLIR generation.
With a worklist algorithm that generates calls instead of inlining everything, this should be a non-issue, though.
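The worklist idea, sketched abstractly over a toy call graph (names hypothetical; real keys would be something like MethodInstances): recursion is fine because each function is emitted exactly once, and later calls simply reference the already-emitted symbol.

```julia
# Toy worklist: `callees` maps a function key to the statically resolved
# calls in its body. Each function is generated ("emitted") once; recursive
# edges are skipped because the callee is already in `emitted`.
function emit_all(entry::Symbol, callees::Dict{Symbol,Vector{Symbol}})
    worklist = [entry]
    emitted = Symbol[]
    while !isempty(worklist)
        fn = popfirst!(worklist)
        fn in emitted && continue
        push!(emitted, fn)
        append!(worklist, get(callees, fn, Symbol[]))
    end
    return emitted
end
```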

@vchuravy (Member)

Comment on lines +52 to +55
# isbitstype structs can be used as arguments or return types.
# They will be unpacked into their constituent fields that are convertible to MLIR types.
# e.g. a function taking and returning a `Point{i64}` will be converted to an
# MLIR function that takes two `i64` arguments and returns two `i64` values.
Collaborator
Maybe we can do something like toy.struct? https://mlir.llvm.org/docs/Tutorials/Toy/Ch-7/

@jumerckx (Collaborator, Author) commented Jun 29, 2024

Yes, but then we're going down the path of having a custom Julia dialect that should probably be included in MLIR_jll?
(not opposed to that though, I think it's a good next step once this becomes more useable)

@mofeing (Collaborator) commented Jun 29, 2024

mmm not necessarily. I think it was agreed that the Julia dialect should be implemented with IRDL and loaded dynamically.

I'm ok with the current solution while the Julia dialect lands. It would be nice to use tuple types instead, but it seems there are no ops for manipulating them: https://mlir.llvm.org/docs/Rationale/Rationale/#tuple-types

Member

A custom dialect is likely needed for GC allocations

function intrinsic_(expr)
    dict = splitdef(expr)
    # TODO: this can probably be fixed:
    length(dict[:kwargs]) == 0 || error("Intrinsic functions can't have keyword arguments\nDefine a regular function with kwargs that calls the intrinsic instead.")
Collaborator
Maybe we can turn the Core.kwcall method into a intrinsic instead.

@jumerckx (Collaborator, Author)

Really nice! I like the intrinisc approach a lot!

Since everything needs to be inlined fully, things like recursive calls or dynamic dispatch are unsupported.

In the original Brutus we had a simple worklist to allow for function calls.

JuliaLabs/brutus@c45ec5e/Brutus/src/codegen.jl#L70-L86

I'm implementing function calls but I'm having trouble understanding the difference between invokes and calls.
In the Brutus code, it seems like only invokes are added to the worklist. But when I look at the IRCode for some functions, I also see calls that would need to be added to the worklist.
Based on what I found online, "calls" are dynamic function calls while "invokes" are static. But why is sitofp a call instruction instead of an invoke? AFAICT, the method can be statically decided...

julia> only(Base.code_ircode(sin, Tuple{Int}))
1627 1 ─ %1 = Base.sitofp(Float64, _2)::Float64
1629 2 ─ %2 = invoke Base.Math.sin(%1::Float64)::Float64
     └──      return %2
 => Float64

@vchuravy (Member)

Based on what I found online, "calls" are dynamic function calls while "invokes" are static.

This is generally correct, but intrinsics and builtins are calls, not invokes. They don't need to be part of the worklist, though, since we can lower them directly to an instruction.

@jumerckx (Collaborator, Author)

I implemented the worklist + function invocations:

fibonacci(n) = n < 2 ? n : fibonacci(n-1) + fibonacci(n-2)
cg(fibonacci, Tuple{i64})
module {
  func.func private @"Tuple{typeof(fibonacci), i64}"(%arg0: i64) -> i64 {
    %c2_i64 = arith.constant 2 : i64
    %0 = arith.cmpi slt, %arg0, %c2_i64 : i64
    cf.cond_br %0, ^bb1, ^bb2
  ^bb1:  // pred: ^bb0
    return %arg0 : i64
  ^bb2:  // pred: ^bb0
    %c1_i64 = arith.constant 1 : i64
    %1 = arith.subi %arg0, %c1_i64 : i64
    %2 = call @"Tuple{typeof(fibonacci), i64}"(%1) : (i64) -> i64
    %c2_i64_0 = arith.constant 2 : i64
    %3 = arith.subi %arg0, %c2_i64_0 : i64
    %4 = call @"Tuple{typeof(fibonacci), i64}"(%3) : (i64) -> i64
    %5 = arith.addi %2, %4 : i64
    return %5 : i64
  }
}

@vchuravy (Member)

One thing you might want to guard against is redefinition, and other namespacing issues. Right now your function name has the potential for collision, since there may be multiple functions f, either across modules or across worlds (i.e., redefinition).
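
One hedged way to make the symbol unique (purely illustrative, not what the PR does): fold the defining module and the world age into the mangled name, so functions from different modules, or different redefinitions of the same function, get distinct MLIR symbols.

```julia
# Hypothetical mangling: module + call signature + world age. Two worlds of
# the same `f`, or `f`s from two modules, yield different symbol names.
mangle(m::Module, sig::Type, world::UInt) = string(m, ".", sig, ".world", world)
```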

p2::Point{T}
end

@inline Base.literal_pow(::typeof(^), x::MLIRInteger, ::Val{2}) = x*x
Collaborator (Author)
This definition wasn't necessary back when everything was forced to be fully inlined.
I assume constant propagation doesn't travel across function boundaries or something?

Collaborator (Author)
Enabling aggressive constant propagation in the inference parameters fixes this (9fdffa7)

@jumerckx (Collaborator, Author)

jumerckx commented Jul 16, 2024

With the worklist for function calls, I'm running into problems with captured variables in functions.
For example:

function h(x)
    f(x) + f(x+1)
end

function f(x)
    @noinline g() = x+1
    g()
end
2 1 ─ %1 = %new(var"#g#50"{Int64}, _2)::var"#g#50"{Int64}
  │   %2 = invoke %1()::Int64
  │   %3 = (Core.Intrinsics.add_int)(_2, 1)::Int64
  │   %4 = %new(var"#g#50"{Int64}, %3)::var"#g#50"{Int64}
  │   %5 = invoke %4()::Int64
  │   %6 = (Core.Intrinsics.add_int)(%2, %5)::Int64
  └──      return %6
 => Int64

When generating MLIR code for h, the two invocations of #g#50 have the same MethodInstance. The two func.call operations that are generated will therefore call the same MLIR function, leading to incorrect code.
This can be solved by generating a different MLIR func.func for each function instance, but this introduces quite some complexity.

Another approach could be to insert an extra argument x in the generated MLIR func for #g#50. At function call sites, all captured variables need to be passed explicitly.

Any more ideas on how to tackle this? I think it's important that these kinds of higher-order functions work, because they are useful for modelling more complex MLIR operations with nested regions.
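
The second option above amounts to classic closure conversion; a plain-Julia sketch of the rewrite (hypothetical names, just showing the two forms agree):

```julia
# Before: g captures x from its enclosing scope, so each call site has its
# own closure instance even though the MethodInstance is shared.
function f_captured(x)
    g() = x + 1
    g()
end

# After: the capture becomes an explicit argument, so a single generated
# MLIR func can serve every instance of g.
g_explicit(x) = x + 1
f_explicit(x) = g_explicit(x)
```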

@vchuravy (Member)

Any more ideas on how to tackle this?

This is why the Julia calling convention is that there is a hidden #self# argument as the first argument that is the called function.
If that argument is a "ghost type" it disappears, but it contains the closure.
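
The hidden #self# argument can be made visible with a callable struct (a sketch of what lowering does, not the PR's code): the closure object itself is the first argument, and captured state lives in its fields.

```julia
# A closure lowers to a struct holding its captures; calling it passes the
# struct as the hidden first (#self#) argument.
struct G
    x::Int
end
(g::G)(y) = g.x + y   # body reads the capture as #self#.x

# Passing #self# explicitly, the way a worklist-generated call would:
call_with_self(self::G, y) = self(y)
```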

src/Generate/transform.jl (outdated review thread, resolved)
@vchuravy (Member)

I would recommend passing the first argument through for now and optimizing this later.

Julia uses two calling conventions: the first passes boxed values, the second unboxed.
The first calls the latter, and only the latter removes ghost types.

* don't generate argument values for captured values in top-level functions; these are passed directly to the code generation