
MLIR code generation from Julia SSA IR. #77

Draft · wants to merge 17 commits into main
Conversation

@jumerckx jumerckx commented Jun 9, 2024

This PR contains an experimental way to generate MLIR code from Julia code. It is similar in spirit to MLIR-python-extras and initially based on Pangoraw's Brutus example.
Currently it probably only works on Julia 1.11, as it uses some Core.Compiler functions that are in flux.

Example

# Regular Julia code:
struct Point{T}
    x::T
    y::T
end

struct Line{T}
    p1::Point{T}
    p2::Point{T}
end

sq_distance(l::Line) = (l.p1.x - l.p2.x)^2 + (l.p1.y - l.p2.y)^2

# Convert this to MLIR:
op2 = cg(Tuple{Point{i64},Point{i64}}) do a, b
    l = Line(a, b)
    d_2 = sq_distance(l)

    Point(((a.x, a.y) .- d_2)...)
end

generates:

  func.func @f(%arg0: i64, %arg1: i64, %arg2: i64, %arg3: i64) -> (i64, i64) {
    %0 = arith.subi %arg0, %arg2 : i64
    %1 = arith.muli %0, %0 : i64
    %2 = arith.subi %arg1, %arg3 : i64
    %3 = arith.muli %2, %2 : i64
    %4 = arith.addi %1, %3 : i64
    %5 = arith.subi %arg0, %4 : i64
    %6 = arith.subi %arg1, %4 : i64
    return %5, %6 : i64, i64
  }
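For reference, the same computation in plain Julia (independent of this PR, no MLIR involved) shows the arithmetic the generated IR performs; the function `f` below is a hand-written mirror of the generated body, not the PR's code:

```julia
# Plain-Julia version of the example above, using Int64 in place of the
# i64 MLIR type.
struct Point{T}
    x::T
    y::T
end

struct Line{T}
    p1::Point{T}
    p2::Point{T}
end

sq_distance(l::Line) = (l.p1.x - l.p2.x)^2 + (l.p1.y - l.p2.y)^2

# Mirrors the generated function body: subtract the squared distance
# from each coordinate of the first point.
function f(a::Point, b::Point)
    d_2 = sq_distance(Line(a, b))
    Point(a.x - d_2, a.y - d_2)
end
```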

I'm leaving out some details but the full code is included in examples/main.jl.

It is possible to customise Julia function <--> MLIR operation mappings by defining "intrinsic" functions (this is an awfully vague name, any alternatives?).
For example, an intrinsic function that maps addition onto arith.addi:

@intrinsic Base.:+(a::T, b::T) where {T<:MLIRInteger} = T(Dialects.arith.addi(a, b)|>result)

Again, more details can be found in examples/definitions.jl.

Internal Details

Julia code is first lowered to SSA IR using a custom AbstractInterpreter (Generate.MLIRInterpreter) that allows types other than Bool to be used in conditional gotos. This AbstractInterpreter also overrides the inlining policy to inline everything except calls to intrinsics. (see src/Generate/absint.jl)

With this SSA IR in hand, a source-to-source transformation replaces all control-flow statements with calls to builder functions that construct the corresponding MLIR unstructured control-flow operations. (see src/Generate/transform.jl)

Finally, the transformed IR is executed and produces an MLIR region.
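As a toy illustration of that idea (hypothetical names, not the PR's actual API): a conditional goto in the IR becomes an ordinary function call that, when the transformed IR runs, emits the corresponding MLIR op instead of branching. Here the stand-in "builder" just renders the op as text; the real builders in src/Generate/transform.jl construct actual MLIR operations:

```julia
# Stand-in builder for the op a conditional goto is rewritten into.
# `cond` is an SSA value name; `then_dest` and `else_dest` are block labels.
build_cond_br(cond, then_dest, else_dest) =
    "cf.cond_br $cond, ^$then_dest, ^$else_dest"
```

For example, `build_cond_br("%0", "bb1", "bb2")` renders the `cf.cond_br` seen in the outputs later in this thread.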

The code generation can be nested to support generating MLIR operations containing regions, the last example in examples/main.jl shows how an scf.for operation can be generated.

Some Notes

  • Since everything needs to be inlined fully, things like recursive calls or dynamic dispatch are unsupported.
  • TTFG (time-to-first-generate MLIR code) is long. I haven't worked on optimising this but I believe the biggest slowdowns have to do with the AbstractInterpreter caching. CI is way slower because of this...
  • This PR is an updated version of what I did for my Master's thesis. The internal details compared to my thesis have changed a bit (for the better), but for my thesis, I wrote a lot more examples including generating linalg operations from einsum expressions, or generating GPU kernels. The code can be found here

commit 0a3c2b7
Merge: 1638254 5c35af3
Author: jumerckx <[email protected]>
Date:   Sat Jun 1 23:10:34 2024 +0200

    Merge remote-tracking branch 'upstream/main' into jm/MLIRValueTrait

commit 1638254
Author: jumerckx <[email protected]>
Date:   Wed Mar 27 20:08:28 2024 +0100

    rename `value(::AffineExpr)`, use IR.Value type instead of API.MlirValue in jl-generator

commit 912d8f5
Author: jumerckx <[email protected]>
Date:   Wed Mar 20 21:12:01 2024 +0100

    Apply suggestions from code review

    Co-authored-by: Sergio Sánchez Ramírez <[email protected]>

commit c57464a
Author: jumerckx <[email protected]>
Date:   Wed Mar 20 21:10:51 2024 +0100

    Update deps/tblgen/jl-generators.cc

    Co-authored-by: Sergio Sánchez Ramírez <[email protected]>

commit 078f15e
Author: jumerckx <[email protected]>
Date:   Wed Mar 20 21:10:08 2024 +0100

    Update src/IR/Value.jl

    Co-authored-by: Sergio Sánchez Ramírez <[email protected]>

commit 0b5b443
Author: jumerckx <[email protected]>
Date:   Sun Mar 17 22:18:30 2024 +0100

    MLIRValueTrait changes
@vchuravy (Member)

Really nice! I like the intrinsic approach a lot!

Since everything needs to be inlined fully, things like recursive calls or dynamic dispatch are unsupported.

In the original Brutus we had a simple worklist to allow for function calls.

https://github.com/JuliaLabs/brutus/blob/c45ec5e465c0de01dc771e3facee7479fd2ac8ef/Brutus/src/codegen.jl#L70-L86

@mofeing (Collaborator)

mofeing commented Jun 10, 2024

This looks awesome!!

Is the full set of Julia SSA IR mappable to MLIR?

It is possible to customise Julia function <--> MLIR operation mappings by defining "intrinsic" functions (this is an awfully vague name, any alternatives?).

intrinsic looks nice to me. If you need other names, maybe primitive?

Since everything needs to be inlined fully, things like recursive calls or dynamic dispatch are unsupported.

But it would be okay if we added an @intrinsic for these cases right?

TTFG (time-to-first-generate MLIR code) is long. I haven't worked on optimising this but I believe the biggest slowdowns have to do with the AbstractInterpreter caching. CI is way slower because of this...

Would it be possible to run the AbstractInterpreter on some warmup code during precompilation and save the cache for later?

@jumerckx (Collaborator, Author)

In the original Brutus we had a simple worklist to allow for function calls.

Aha, thanks for the link, this should be doable. How does this handle function names? E.g., when calling +(::Int, ::Int) and +(::Float64, ::Float64) in the same function?

Is the full set of Julia SSA IR mappable to MLIR?

Not completely yet. I've ignored PhiC and Upsilon nodes, and PiNodes don't generate any MLIR.

For full Julia compilation like Brutus, the current intrinsic system also doesn't suffice because Julia builtins cannot be specialized.

julia> Base.abs_float(x) = x
ERROR: cannot add methods to a builtin function
Stacktrace:
 [1] top-level scope
   @ REPL[6]:1

Tapir.jl handles this by replacing any call to a Core.IntrinsicFunction with a call to a function that calls the builtin. (code).
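That wrapping trick is easy to reproduce in isolation (the wrapper name below is hypothetical): you can't add methods to a builtin, but a plain function that forwards to it is an ordinary generic function that dispatch, and therefore an intrinsic override, can hook into:

```julia
# Base.abs_float is a builtin and can't be specialized, but this forwarding
# wrapper is a normal generic function that additional methods can target.
wrapped_abs_float(x) = Base.abs_float(x)
```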

The current system also doesn't allow mapping existing Julia methods onto MLIR operations. For example, defining an intrinsic for Base.:+(::Int, ::Int) won't work because it literally redefines that method, so you could no longer add integers at all.

To fix this, intrinsic definitions should exist in a separate context. CassetteOverlay.jl could be used to achieve something like this, but during my thesis I depended on it and ran into strange errors and found the package to be quite fiddly.

Alternatively, we could again take inspiration from Tapir.jl and use something similar to their rrule (relevant code).
I.e., use a function like intrinsic(::Type{<:Tuple{typeof(Base.:+), Int, Int}}) = ... instead of redefining the method. The source transformation would then replace each call to a function with a call to intrinsic with the matching signature.
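A minimal sketch of that signature-based lookup (all names hypothetical; `Integer` stands in for the MLIRInteger types from the PR):

```julia
# Instead of redefining Base.:+, register the mapping by dispatching on the
# call signature as a type; the source transform would rewrite calls whose
# signature matches a registered method.
is_intrinsic(::Type) = false
is_intrinsic(::Type{<:Tuple{typeof(+),T,T}}) where {T<:Integer} = true
```

The key property is that `Base.:+` itself is untouched, so ordinary integer addition still works while code generation can recognize the call.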

Would it be possible to run the AbstractInterpreter on some warmup code during precompilation and save the cache for later?

Yes! One annoyance is that redefining intrinsics can spoil the cache because there's no backedge from e.g. Generate.is_intrinsic(::Type{<:Tuple{typeof(+), T, T}}) to Base.:+(::T, ::T). Tapir.jl faces the same limitation.
I spent some time trying to add these backedges manually but quickly found myself out of my depth.
Still, this should speed things up a lot for the initial experience. I haven't done this before, so I'd have to find out how to do it properly.

But it would be okay if we added an @intrinsic for these cases right?

Not sure I'm following here. Currently, the ability to define intrinsics can't cope with recursion and dynamic dispatch. Full inlining is required to expose all control flow that's otherwise hidden in function calls. All control flow needs to be known upfront because all basic blocks are created at the start of MLIR generation.
With a worklist algorithm that generates calls instead of inlining everything, this should be a non-issue, though.
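The worklist idea, sketched abstractly over a toy call graph (names hypothetical; real keys would be something like MethodInstances): recursion is fine because each function is emitted exactly once, and later calls simply reference the already-emitted symbol.

```julia
# Toy worklist: `callees` maps a function key to the statically resolved
# calls in its body. Each function is generated ("emitted") once; recursive
# edges are skipped because the callee is already in `emitted`.
function emit_all(entry::Symbol, callees::Dict{Symbol,Vector{Symbol}})
    worklist = [entry]
    emitted = Symbol[]
    while !isempty(worklist)
        fn = popfirst!(worklist)
        fn in emitted && continue
        push!(emitted, fn)
        append!(worklist, get(callees, fn, Symbol[]))
    end
    return emitted
end
```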

@vchuravy (Member)

Comment on lines +52 to +55
# isbitstype structs can be used as arguments or return types.
# They will be unpacked into their constituent fields that are convertible to MLIR types.
# e.g. a function taking and returning a `Point{i64}` will be converted to an
# MLIR function that takes two `i64` arguments and returns two `i64` values.
Collaborator
Maybe we can do something like toy.struct? https://mlir.llvm.org/docs/Tutorials/Toy/Ch-7/

@jumerckx (Collaborator, Author) commented Jun 29, 2024

Yes, but then we're going down the path of having a custom Julia dialect that should probably be included in MLIR_jll?
(not opposed to that though, I think it's a good next step once this becomes more useable)

@mofeing (Collaborator) commented Jun 29, 2024

mmm not necessarily. I think it was agreed that the Julia dialect should be implemented with IRDL and loaded dynamically.

I'm ok with the current solution while the Julia dialect lands. It would be nice to use tuple types instead, but it seems there are no ops for manipulating them: https://mlir.llvm.org/docs/Rationale/Rationale/#tuple-types

Member

A custom dialect is likely needed for GC allocations

function intrinsic_(expr)
    dict = splitdef(expr)
    # TODO: this can probably be fixed:
    length(dict[:kwargs]) == 0 || error("Intrinsic functions can't have keyword arguments\nDefine a regular function with kwargs that calls the intrinsic instead.")
Collaborator
Maybe we can turn the Core.kwcall method into a intrinsic instead.

@jumerckx (Collaborator, Author)

Really nice! I like the intrinisc approach a lot!

Since everything needs to be inlined fully, things like recursive calls or dynamic dispatch are unsupported.

In the original Brutus we had a simple worklist to allow for function calls.

JuliaLabs/brutus@c45ec5e/Brutus/src/codegen.jl#L70-L86

I'm implementing function calls but I'm having trouble understanding the difference between invokes and calls.
In the Brutus code, it seems like only invokes are added to the worklist. But when I look at the IRCode for some functions, I also see calls that would need to be added to the worklist.
Based on what I found online, "calls" are dynamic function calls while "invokes" are static. But why is sitofp a call instruction instead of an invoke? AFAICT, the method can be statically decided...

julia> only(Base.code_ircode(sin, Tuple{Int}))
1627 1 ─ %1 = Base.sitofp(Float64, _2)::Float64
1629 2 ─ %2 = invoke Base.Math.sin(%1::Float64)::Float64
     └──      return %2
 => Float64

@vchuravy (Member)

Based on what I found online, "calls" are dynamic function calls while "invokes" are static.

This is generally correct, but intrinsics and builtins are calls, not invokes. They don't need to be part of the worklist, though, since we can lower them directly to an instruction.

@jumerckx (Collaborator, Author)

I implemented the worklist + function invocations:

fibonacci(n) = n < 2 ? n : fibonacci(n-1) + fibonacci(n-2)
cg(fibonacci, Tuple{i64})
module {
  func.func private @"Tuple{typeof(fibonacci), i64}"(%arg0: i64) -> i64 {
    %c2_i64 = arith.constant 2 : i64
    %0 = arith.cmpi slt, %arg0, %c2_i64 : i64
    cf.cond_br %0, ^bb1, ^bb2
  ^bb1:  // pred: ^bb0
    return %arg0 : i64
  ^bb2:  // pred: ^bb0
    %c1_i64 = arith.constant 1 : i64
    %1 = arith.subi %arg0, %c1_i64 : i64
    %2 = call @"Tuple{typeof(fibonacci), i64}"(%1) : (i64) -> i64
    %c2_i64_0 = arith.constant 2 : i64
    %3 = arith.subi %arg0, %c2_i64_0 : i64
    %4 = call @"Tuple{typeof(fibonacci), i64}"(%3) : (i64) -> i64
    %5 = arith.addi %2, %4 : i64
    return %5 : i64
  }
}

@vchuravy (Member)

One thing you might want to guard against is redefinition, and other namespacing issues. Right now your function name has the potential for collision, since there may be multiple functions f, either across modules or across worlds (i.e., redefinition).
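
One hedged way to make the symbol unique (purely illustrative, not what the PR does): fold the defining module and the world age into the mangled name, so functions from different modules, or different redefinitions of the same function, get distinct MLIR symbols.

```julia
# Hypothetical mangling: module + call signature + world age. Two worlds of
# the same `f`, or `f`s from two modules, yield different symbol names.
mangle(m::Module, sig::Type, world::UInt) = string(m, ".", sig, ".world", world)
```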

p2::Point{T}
end

@inline Base.literal_pow(::typeof(^), x::MLIRInteger, ::Val{2}) = x*x
Collaborator (Author)
This definition wasn't necessary back when everything was forced to be fully inlined.
I assume constant propagation doesn't travel across function boundaries or something?

Collaborator (Author)
Enabling aggressive constant propagation in the inference parameters fixes this (9fdffa7)

@jumerckx (Collaborator, Author)

jumerckx commented Jul 16, 2024

With the worklist for function calls, I'm running into problems with captured variables in functions.
For example:

function h(x)
    f(x) + f(x+1)
end

function f(x)
    @noinline g() = x+1
    g()
end
2 1 ─ %1 = %new(var"#g#50"{Int64}, _2)::var"#g#50"{Int64}
  │   %2 = invoke %1()::Int64
  │   %3 = (Core.Intrinsics.add_int)(_2, 1)::Int64
  │   %4 = %new(var"#g#50"{Int64}, %3)::var"#g#50"{Int64}
  │   %5 = invoke %4()::Int64
  │   %6 = (Core.Intrinsics.add_int)(%2, %5)::Int64
  └──      return %6
 => Int64

When generating MLIR code for h, the two invocations of #g#50 have the same MethodInstance. The two func.call operations that are generated will therefore call the same MLIR function, leading to incorrect code.
This can be solved by generating a different MLIR func.func for each function instance, but this introduces quite some complexity.

Another approach could be to insert an extra argument x in the generated MLIR func for #g#50. At function call sites, all captured variables need to be passed explicitly.

Any more ideas on how to tackle this? I think it's important that these kinds of higher-order functions work, because they are useful for modelling more complex MLIR operations with nested regions.
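
The second option above amounts to classic closure conversion; a plain-Julia sketch of the rewrite (hypothetical names, just showing the two forms agree):

```julia
# Before: g captures x from its enclosing scope, so each call site has its
# own closure instance even though the MethodInstance is shared.
function f_captured(x)
    g() = x + 1
    g()
end

# After: the capture becomes an explicit argument, so a single generated
# MLIR func can serve every instance of g.
g_explicit(x) = x + 1
f_explicit(x) = g_explicit(x)
```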

@vchuravy (Member)

Any more ideas on how to tackle this?

This is why the Julia calling convention is that there is a hidden #self# argument as the first argument that is the called function.
If that argument is a "ghost type" it disappears, but it contains the closure.
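
The hidden #self# argument can be made visible with a callable struct (a sketch of what lowering does, not the PR's code): the closure object itself is the first argument, and captured state lives in its fields.

```julia
# A closure lowers to a struct holding its captures; calling it passes the
# struct as the hidden first (#self#) argument.
struct G
    x::Int
end
(g::G)(y) = g.x + y   # body reads the capture as #self#.x

# Passing #self# explicitly, the way a worklist-generated call would:
call_with_self(self::G, y) = self(y)
```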

src/Generate/transform.jl (outdated review thread, resolved)
@vchuravy (Member)

I would recommend passing the first argument through for now and optimizing this later.

Julia uses two calling conventions: the first passes boxed values, the second unboxed.
The first calls the latter, and only the latter removes ghost types.

* don't generate argument values for captured values in top-level functions; these are passed directly to the code generation