
Expose experimental LLVM features for GPU offloading #109


Open
2 of 4 tasks
nikomatsakis opened this issue Jul 22, 2024 · 12 comments


@nikomatsakis
Contributor

nikomatsakis commented Jul 22, 2024

Metadata
Point of contact @ZuseZ4
Team(s) compiler, lang
Goal document 2025h1/GPU-Offload

Summary

Expose experimental LLVM features for GPU offloading and allow combining it with the std::autodiff feature.

Tasks and status

@nikomatsakis nikomatsakis added this to the 2024h2 milestone Jul 22, 2024
@rust-lang rust-lang locked and limited conversation to collaborators Jul 25, 2024
@nikomatsakis
Contributor Author

This issue is intended for status updates only.

For general questions or comments, please contact the owner(s) directly.

@ZuseZ4
Member

ZuseZ4 commented Aug 24, 2024

During the first month, I focused on automatic differentiation. I cleaned up my rustc fork and made my first two upstreaming PRs, for the frontend and the backend. Once they are merged, I will continue by posting PRs for the remaining middle-end. While waiting for reviews, I have been improving the docs a bit, mainly the pages about debugging Enzyme crashes. I am especially proud that, thanks to those docs, we recently got our first Enzyme core issue with a full LLVM-IR reproducer from a Rust developer, even though the person reporting it had no previous compiler/LLVM experience. Such detailed issues make fixing bugs in Enzyme core much easier.

On the GPU side, I mainly have to thank nikic, who reliably updates rustc's LLVM backend every few weeks or months. Thanks to his latest update, rustc now supports a sufficiently new LLVM, which ships most of the GPU/offloading work that I want to expose on the Rust side. Once my first two autodiff patches have settled, I'll look a bit more into setting up documentation for the GPU feature.

@ZuseZ4
Member

ZuseZ4 commented Sep 12, 2024

During the last three weeks, my first autodiff PR for the backend, which includes the Enzyme submodule and 13 additional files, got merged! I also got a ton of feedback from reviewers, especially on my frontend PR (thanks to jieyouxu). Now that the backend is merged, I have put up my third PR, covering the changes I made to rustc_codegen_llvm. I am currently at RustConf, so I won't be able to address much of the feedback this week, but I am happy to talk to everyone else attending and will try to get both PRs ready to merge next week.
Once the two open PRs are merged, my changes to ~55 of 85 files should be upstream, so we're making good progress.

On the GPU side there are again not many updates due to my current autodiff focus, but thanks to another LLVM submodule update we can now use some nicer APIs in rustc that were recently merged into LLVM.

@ZuseZ4
Member

ZuseZ4 commented Sep 16, 2024

And as another short update, my talk "When unsafe code is slow - Automatic Differentiation in Rust" was accepted as a tech talk at the LLVM Dev Meeting. There I'll present many benchmarks and some analysis comparing Rust-Enzyme with the C++ frontend of Enzyme, and show one application which we had to port from Python/JAX to Rust/Enzyme.
The full program of the dev meeting is available here.
In preparation, I spent some time trying to fix the benchmark infrastructure in Enzyme core, to make sure everyone can reproduce our benchmarks.

@ZuseZ4
Member

ZuseZ4 commented Sep 30, 2024

Thanks to some support from the bootstrap team, dist builds with autodiff support enabled now work.
That allowed us to add Rust to our autodiff fork of the compiler explorer: https://enzyme.mit.edu/explorer/
Unfortunately, we still have some dist issues with finding std in the compiler explorer build, so help here would be appreciated.
Other than that, this morning my PR to add Enzyme/autodiff support to the test infra got merged: rust-lang/rust#131044
This should allow us to add the larger frontend PR to the merge queue later today: rust-lang/rust#129458

@ZuseZ4
Member

ZuseZ4 commented Oct 22, 2024

I've been travelling a lot for the last two weeks, but hope to be able to get back to work next Monday. Since the last update we got:

  1. The Autodiff frontend got merged! This included over 2k LoC and 30 files, so the remaining diff is now much smaller.
  2. The autodiff middle-end, the last missing AD piece, is probably getting a redesign. Right now we use Enzyme as a library, which means that we must write FFI wrappers around Enzyme's C/C++ functions and have to differentiate functions one by one. If we switch to an LLVM pass-based approach instead, we can drop a lot of glue code (simplifying the review process) and get some features for free that the pass already handles for us (e.g. differentiating higher-order derivatives in the right order). Julia also just moved from the library to the pass-based approach. C/C++ has always used the pass-based approach, which had a few limitations in the past that were recently fixed. Finally, a pass-based approach improves reproducibility, since all information will now be in the LLVM-IR. In summary, this seems like a good moment to move Rust over as well.
  3. I opened a tracking issue for the GPU offload feature and made the first PR to enable LLVM's offload feature.
  4. I started working with some Enzyme and Bootstrap contributors to get a compiler explorer instance with Rust-AD to work.
  5. I am giving one tech talk and two workshop talks at the LLVM Dev Conference; I will share the slides (and videos if possible) afterwards. The three talks are about ML in Rust, GPU programming in Rust, and the performance benefits of safe over unsafe code.
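To make the ordering point in item 2 concrete: a second derivative is the derivative of a derivative, so the two differentiation layers have to be nested correctly. The sketch below only illustrates that nesting with hand-written forward-mode dual numbers; it is not how Enzyme or std::autodiff work internally, and the names `Dual`, `cube`, and `second_derivative_of_cube` are hypothetical.

```rust
use std::ops::{Add, Mul};

// Hypothetical forward-mode dual number, generic over its scalar type so it
// can be nested: Dual<Dual<f64>> carries enough information for f''(x).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dual<T> {
    val: T, // primal value
    eps: T, // tangent (derivative) part
}

impl<T: Add<Output = T>> Add for Dual<T> {
    type Output = Dual<T>;
    fn add(self, rhs: Self) -> Self {
        Dual { val: self.val + rhs.val, eps: self.eps + rhs.eps }
    }
}

impl<T: Add<Output = T> + Mul<Output = T> + Copy> Mul for Dual<T> {
    type Output = Dual<T>;
    fn mul(self, rhs: Self) -> Self {
        // Product rule: (a + a'e)(b + b'e) = ab + (a'b + ab')e
        Dual {
            val: self.val * rhs.val,
            eps: self.eps * rhs.val + self.val * rhs.eps,
        }
    }
}

// f(x) = x^3, written once and generic over the number type.
fn cube<T: Mul<Output = T> + Copy>(x: T) -> T {
    x * x * x
}

// Seeds both the inner and the outer layer with dx = 1; the mixed
// tangent component of the result is the second derivative.
fn second_derivative_of_cube(x: f64) -> f64 {
    let inner = Dual { val: x, eps: 1.0 };
    let inner_seed = Dual { val: 1.0, eps: 0.0 };
    let y = cube(Dual { val: inner, eps: inner_seed });
    y.eps.eps
}
```

For f(x) = x^3 this returns 6x, e.g. 12 at x = 2.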

Help Wanted:
I would appreciate it if someone could look into fixing our Rust dist build used in the Enzyme Compiler Explorer. I have spent quite a few hours trying different configurations, but have been unable to get rid of the error

error[E0463]: can't find crate for `std`

Any help would be appreciated, I can share more information if someone has time to investigate further.

@ZuseZ4
Member

ZuseZ4 commented Nov 27, 2024

  1. The redesign of our autodiff middle/backend which I described in the last update was implemented. This reduced the code remaining to be upstreamed from 2.5k to 1.1k LoC. I split the code into two PRs (Autodiff Upstreaming - rustc_codegen_ssa, rustc_middle rust#133429 and Autodiff Upstreaming - rustc_codegen_llvm changes rust#130060). Both are now small enough to be reviewed and got their first round of feedback, so they will hopefully land at the beginning of December. Afterwards, everything needed to run autodiff will be available on nightly (at least as an MVP), so we can discuss building and shipping it by default.

  2. The talks I gave at LLVM Dev led to some interesting follow-up discussions. Most companies still use Rust "only" for classical software engineering, but as it becomes more common I also see more interest outside of academia in using it for (scientific) computing, ML, HPC, etc., which I find exciting. I also got some offers from people in industry to help with the GPU work.

  3. The preprint of the first paper making use of std::autodiff is available on arXiv! https://arxiv.org/abs/2411.17011v1
    The code is also available here: https://github.com/ChemAI-Lab/molpipx/. It includes both Python/JAX and Rust implementations, because JAX JIT compilation times are unbearably slow for this workload. In certain configurations it takes more than a day to JIT, but only 30 minutes to compile in Rust.

  4. Once autodiff is upstreamed (especially including some small follow-up PRs which are needed to achieve the best performance), I will also publish some very promising runtime results that we have on a larger set of benchmarks.

  5. Last month I asked for help with our compiler explorer, and I'm happy that we did get the needed support, thank you! (fix rustc installation EnzymeAD/enzyme-explorer#15) Our compiler explorer for Rust with std::autodiff support is now available at https://enzyme.mit.edu/explorer/ (just select Rust).

@ZuseZ4
Member

ZuseZ4 commented Jan 3, 2025

Happy New Year everyone! After a few more rounds of feedback, the next autodiff PR recently got merged: rust-lang/rust#130060
With that, I only have one last PR open to have a fully working autodiff MVP upstream. A few features had to be removed during upstreaming to simplify the reviewing process, but they should be easier to bring back as single PRs.

Beginning next week, I will also work on an MVP for the batching feature of LLVM/Enzyme, which enables some AoS and SoA vectorization. It mostly re-uses the existing autodiff infrastructure, so I expect the PRs for it to be much smaller.
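Batching here means propagating several derivative directions in one pass instead of one pass per direction. As a rough, hand-written illustration (not Enzyme's batching implementation, and not the API it will expose in Rust), a forward-mode dual number can carry N tangent lanes in a fixed-size array, so a single evaluation of f(x, y, z) = x*y + sin(z) yields its full gradient. `BatchDual`, `f`, and `gradient` are hypothetical names.

```rust
use std::ops::{Add, Mul};

/// Hypothetical forward-mode dual number carrying N tangent lanes, so one
/// evaluation propagates N derivative directions at once.
#[derive(Clone, Copy, Debug)]
struct BatchDual<const N: usize> {
    val: f64,
    tan: [f64; N],
}

impl<const N: usize> BatchDual<N> {
    /// Seed input `i`: its tangent is the i-th unit vector.
    fn variable(val: f64, i: usize) -> Self {
        let mut tan = [0.0; N];
        tan[i] = 1.0;
        BatchDual { val, tan }
    }

    /// sin with the chain rule applied to every lane.
    fn sin(self) -> Self {
        let c = self.val.cos();
        let mut tan = self.tan;
        for t in tan.iter_mut() {
            *t *= c;
        }
        BatchDual { val: self.val.sin(), tan }
    }
}

impl<const N: usize> Add for BatchDual<N> {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        let mut tan = self.tan;
        for (t, r) in tan.iter_mut().zip(rhs.tan) {
            *t += r;
        }
        BatchDual { val: self.val + rhs.val, tan }
    }
}

impl<const N: usize> Mul for BatchDual<N> {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self {
        // Product rule, applied lane by lane.
        let mut tan = [0.0; N];
        for i in 0..N {
            tan[i] = self.tan[i] * rhs.val + self.val * rhs.tan[i];
        }
        BatchDual { val: self.val * rhs.val, tan }
    }
}

/// Example function f(x, y, z) = x*y + sin(z).
fn f<const N: usize>(x: BatchDual<N>, y: BatchDual<N>, z: BatchDual<N>) -> BatchDual<N> {
    x * y + z.sin()
}

/// One batched forward pass returns [df/dx, df/dy, df/dz].
fn gradient(x: f64, y: f64, z: f64) -> [f64; 3] {
    let out = f::<3>(
        BatchDual::variable(x, 0),
        BatchDual::variable(y, 1),
        BatchDual::variable(z, 2),
    );
    out.tan
}
```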

On the GPU side, another developer has recently pushed to add a new AMD GPU target to the Rust compiler. This is something that I would have needed for the LLVM offload project anyway, so I'm very happy to see movement here: rust-lang/compiler-team#823

@nikomatsakis nikomatsakis modified the milestones: 2024h2, 2025h1 Feb 18, 2025
@nikomatsakis
Contributor Author

This is a continuing project goal, and the updates below this comment will be for the new period 2025h1

@nikomatsakis nikomatsakis moved this to Project goals in Lang team features Feb 21, 2025
@nikomatsakis nikomatsakis changed the title Expose experimental LLVM features for automatic differentiation and GPU offloading Expose experimental LLVM features for GPU offloading Feb 26, 2025
@nikomatsakis nikomatsakis moved this to Project goal in Lang team features Mar 4, 2025
@ZuseZ4
Member

ZuseZ4 commented Mar 25, 2025

I just noticed that I missed my February update, so I'll keep this update a bit more high-level, to not make it too long.

Key developments:

  1. All key autodiff PRs got merged. So after building rust-lang/rust with the autodiff feature enabled, users can now use it without needing a custom fork.
  2. std::autodiff received its first PRs from new contributors who had not previously been involved in rustc development! My plan is to grow a team to maintain this feature, so that's a great start. The PRs are here, here and here. Over time I hope to hand over increasingly larger issues.
  3. I received an offer to join the Rust compiler team, so now I can also officially review and approve PRs! For now I'll focus on reviewing PRs in the fields I'm most comfortable with, so autodiff, batching, and soon GPU offload.
  4. I implemented a standalone batching feature. It was a bit larger (~2k LoC) and needed some (back then unmerged) autodiff PRs, since they both use the same underlying Enzyme infrastructure. I therefore did not push for merging it.
  5. I recently implemented batching as part of the autodiff macro, for people who want to use both together. I subsequently split out a first set of code improvements and refactorings, which already got merged. The remaining autodiff feature PR is only 600 LoC, so I'm currently cleaning it up for review.
  6. I spent time preparing an MCP to enable autodiff in CI (and therefore nightly). I also spent a lot of time discussing a potential MLIR backend for rustc. Please reach out if you want to be involved!

**Help wanted:**
We want to support autodiff in lib builds, not only in binaries. oli-obk and I recently figured out the underlying bug, and I started on a fix in rust-lang/rust#137570. The problem is that autodiff assumes fat-lto builds, but lib builds compile some of the library code using thin-lto, even if users specify lto=fat in their Cargo.toml. As a temporary solution, we'd want to move everything to fat-lto whenever autodiff is enabled, and later move towards embed-bc as a longer-term solution. If you have some time to help, please reach out! Some of us have already looked into it a little but got side-tracked, so it's better to first talk about which code to reuse rather than starting from scratch.

I also booked my RustWeek ticket, so I'm happy to talk about all types of scientific computing, HPC, ML, or cursed Rust(c) and LLVM internals! Please feel free to DM me if you're also going and want to meet.

@ZuseZ4
Member

ZuseZ4 commented May 25, 2025

And another round of updates. First of all, Google approved two GSoC projects for the summer, where @Sa4dUs will work on the autodiff frontend and @KMJ-007 will work on the backend. The frontend project is about improving our ABI handling to remove corner cases around specific types that we currently cannot differentiate. If time permits, he might also get to remodel our frontend to lower our autodiff macro to a proper rustc intrinsic, which should allow us to simplify our logic a lot.
The backend project will look at how Enzyme uses TypeTrees, and create those during the lowering to LLVM-IR. This should allow autodiff to become more reliable, work on debug builds, and generally compile a lot faster.
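As a rough picture of what such a type tree records, the sketch below maps offset paths inside a value to concrete types, loosely following the notation Enzyme uses when printing type trees (where -1 stands for "any offset"). Everything here is a simplified, hypothetical model: `ConcreteType`, `FieldTy`, and `type_tree_for_struct` are illustrative names, and the field offsets are passed in explicitly, standing in for the layout information rustc already has during lowering.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the concrete types Enzyme's type analysis
/// distinguishes (the real set is larger).
#[derive(Clone, Debug, PartialEq)]
enum ConcreteType {
    Integer,
    Pointer,
    Float(&'static str), // e.g. "float" or "double"
}

/// Hypothetical description of a field type as a frontend might see it.
enum FieldTy {
    F32,
    F64,
    U64,
    PtrTo(Box<FieldTy>),
}

/// Offset path -> concrete type. A path like [16, -1] reads as
/// "the memory pointed to by the field at offset 16, at any offset".
type TypeTree = BTreeMap<Vec<i64>, ConcreteType>;

fn insert_field(tree: &mut TypeTree, path: Vec<i64>, ty: &FieldTy) {
    match ty {
        FieldTy::F32 => {
            tree.insert(path, ConcreteType::Float("float"));
        }
        FieldTy::F64 => {
            tree.insert(path, ConcreteType::Float("double"));
        }
        FieldTy::U64 => {
            tree.insert(path, ConcreteType::Integer);
        }
        FieldTy::PtrTo(inner) => {
            tree.insert(path.clone(), ConcreteType::Pointer);
            // Describe the pointee; -1 marks "every offset" of the pointed-to data.
            let mut pointee = path;
            pointee.push(-1);
            insert_field(tree, pointee, inner);
        }
    }
}

/// Build a type tree for a struct whose field offsets are already known
/// (in a compiler these would come from the computed layout).
fn type_tree_for_struct(fields: &[(i64, FieldTy)]) -> TypeTree {
    let mut tree = TypeTree::new();
    for (offset, ty) in fields {
        insert_field(&mut tree, vec![*offset], ty);
    }
    tree
}
```

Because the frontend already knows these types exactly, emitting them directly avoids having Enzyme rediscover them from the LLVM-IR.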

@ZuseZ4
Member

ZuseZ4 commented May 25, 2025

The last weeks were focused on enabling autodiff in many more locations, as well as doing a lot of CI and CMake work to be able to ship it on nightly. At the same time, autodiff is gaining more and more contributors. That should help a lot with the uptick in issues that I expect once we enable autodiff in nightly builds.

Key developments:

  1. @Shourya742 added support for applying autodiff inside of inherent impl blocks. Fix auto diff failing on inherent impl blocks rust#140104
  2. @haenoe added support for applying autodiff to generic functions. fix autodiff macro on generic functions rust#140049
  3. @Shourya742 added an optimization to inline the generated function, removing one layer of indirection. That should improve performance when differentiating tiny functions. add autodiff inline rust#139308
  4. @haenoe added support for applying autodiff to inner (nested) functions. fix usage of autodiff macro with inner functions rust#138314
  5. I have found a bugfix for building rustc with both debug and autodiff enabled. This previously failed during bootstrap. This bugfix also solved the last remaining (compile time) performance regression of the autodiff feature. That means that if we now enable autodiff on nightly, it won't affect compile times for people not using it. Fix autodiff debug builds rust#140030
  6. After a hint from Onur, I also fixed autodiff check builds (skip llvm-config in autodiff check builds, when its unavailable rust#140000), which makes contributing to autodiff easier.
  7. I ran countless experiments on improving and fixing Enzyme's CMake and merged a few PRs into Enzyme. We don't fully support the macOS dist runners yet, and some of my CMake improvements only live in our Enzyme fork and haven't been accepted upstream yet, but the CI now runs longer before failing on the next bug, which should hopefully be easy to fix. I have already received a hint on how to solve it.
  8. @Shourya742 also helped with an experiment on how to bundle Enzyme with the Rust compiler. We ended up selecting a different distribution path, but the PR was helpful to discuss solutions with Infra contributors. Add enzyme distribution step rust#140244
  9. @Sa4dUs implemented a PR to split our #[autodiff] macro into autodiff_forward and autodiff_reverse. They behave quite differently in some ways that might surprise users, so I decided it's best for now to have them separated, which also will make teaching and documenting easier. Split autodiff into autodiff_forward and autodiff_reverse rust#140697
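For readers new to the two modes behind autodiff_forward and autodiff_reverse: forward mode carries a tangent alongside each value and yields one directional derivative per call, while reverse mode runs the function and then propagates an adjoint backwards, yielding the whole gradient of a scalar output in one call. The hand-written sketch below shows both on a toy function; it does not reproduce the signatures or activity annotations that std::autodiff generates, and all function names are hypothetical.

```rust
/// Toy function f(x, y) = x * y + sin(x).
fn f(x: f64, y: f64) -> f64 {
    x * y + x.sin()
}

/// Forward mode: propagate a tangent (dx, dy) alongside the primal values.
/// Returns (f(x, y), directional derivative along (dx, dy)).
fn f_forward(x: f64, y: f64, dx: f64, dy: f64) -> (f64, f64) {
    let a = x * y;
    let da = dx * y + x * dy;
    let b = x.sin();
    let db = x.cos() * dx;
    (a + b, da + db)
}

/// Reverse mode: run the primal pass, then propagate the output adjoint
/// `seed` backwards. Returns (f(x, y), [seed * df/dx, seed * df/dy]).
fn f_reverse(x: f64, y: f64, seed: f64) -> (f64, [f64; 2]) {
    // Primal pass.
    let a = x * y;
    let b = x.sin();
    let out = a + b;
    // Backward pass: out = a + b, so both adjoints equal the seed.
    let a_bar = seed;
    let b_bar = seed;
    let x_bar = a_bar * y + b_bar * x.cos();
    let y_bar = a_bar * x;
    (out, [x_bar, y_bar])
}

/// Forward mode needs one call per input to assemble the gradient.
fn gradient_forward(x: f64, y: f64) -> [f64; 2] {
    [f_forward(x, y, 1.0, 0.0).1, f_forward(x, y, 0.0, 1.0).1]
}
```

With n inputs and one output, the forward version needs n calls to assemble the gradient while the reverse version needs one, at the cost of keeping primal values around for the backward pass.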

Help Wanted:
There are two or three smaller issues remaining to distribute Enzyme/autodiff. If anyone is open to help, either with bootstrap, CI, or CMake issues, I'd appreciate any support. Please feel free to ping me on Discord, Zulip, or in rust-lang/rust#140064 to discuss what's left to do.

In general, we solved most of the distribution issues over the last weeks, and autodiff can now be applied to almost all functions. That's a pretty good base, so I will now start to look again more into the GPU support for rustc.

Projects
Status: Project goal
Development

No branches or pull requests

3 participants