-
Notifications
You must be signed in to change notification settings - Fork 124
Optimize evaluator perf via Control Flow Graph #1261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Benchmark for 210b048Click to view benchmark
|
Benchmark for 482ca5fClick to view benchmark
|
It's interesting to see that the CI benchmark benefits much more from this change (even without mimalloc yet) than my local runs, which gives me hope that the combined benefit of both changes will be even higher for older or slower machines than my local one. |
Benchmark for 65bf8e1Click to view benchmark
|
cf9f212
to
c34f677
Compare
Benchmark for a524ee6Click to view benchmark
|
Benchmark for 8cb4a94Click to view benchmark
|
Benchmark for fe9494dClick to view benchmark
|
Benchmark for c271532Click to view benchmark
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course I couldn't follow every single line but changes look reasonable and it seems pretty thoroughly covered by tests. Thanks for the perf gains!
Benchmark for c43e15eClick to view benchmark
|
Benchmark for 2832700Click to view benchmark
|
Benchmark for a31e341Click to view benchmark
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Thanks for walking me through the changes yesterday, it was really helpful and made reviewing this PR much easier.
Co-authored-by: César Zaragoza Cortés <[email protected]>
Benchmark for 12a9bdbClick to view benchmark
|
This change adds new content to FIR that captures the control flow graph for expressions. With this pre-calculated during lowering, the evaluator is refactored to follow that sequencing rather than calculating (and recalculating) the execution sequence on the fly.
In addition to the performance numbers from the CI benchmarks (which show larger gains in the Linux container), here is some data gathered from runs on my machine. This change tested in WASM shows overall performance benefit but
with a trade off of reduced performance on large array literalswith the latest change also improving performance on array literals:The performance gains in native code are more significant for those benchmarks that do large numbers of iterations through code. Both the "Main" and "PR" below include the mimalloc optimization that has already merged:
Finally, the performance gains for these combined changes on resource estimation workloads show promise in the larger cases, specifically:
*from https://arxiv.org/abs/2402.01891