Skip to content

Optimize evaluator perf via Control Flow Graph #1261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Mar 20, 2024
Merged

Conversation

swernli
Copy link
Collaborator

@swernli swernli commented Mar 13, 2024

This change adds new content to FIR that captures the control flow graph for expressions. With this pre-calculated during lowering, the evaluator is refactored to follow that sequencing rather than calculating (and recalculating) the execution sequence on the fly.

In addition to the performance numbers from the CI benchmarks (which show larger gains in the Linux container), here is some data gathered from runs on my machine. This change tested in WASM shows overall performance benefit but with a trade off of reduced performance on large array literals with the latest change also improving performance on array literals:

WASI (time in µs)      
Benchmark Main PR Perf Impact
Teleport eval 162.55 129.86 -20.11%
DJ eval 9835.53 7910.8 -19.57%
Large File eval 37323 35022 -6.17%
Array Append 1835 1265.6 -31.03%
Array Update 2089.2 1556 -25.52%
Array Literal 694.17 377.02 -45.69%
Large Nested 195790 135720 -30.68%

The performance gains in native code are more significant for those benchmarks that do large numbers of iterations through code. Both the "Main" and "PR" below include the mimalloc optimization that has already merged:

Native (time in µs)      
Benchmark Main PR Perf Impact
Teleport eval 39.034 32.269 -17.33%
DJ eval 2650.8 2220.6 -16.23%
Large File eval 25327 23411 -7.57%
Array Append 255.74 152.52 -40.36%
Array Update 254.13 170.67 -32.84%
Array Literal 117.93 79.975 -32.18%
Large Nested 25544 14966 -41.41%

Finally, the performance gains for these combined changes on resource estimation workloads show promise in the larger cases, specifically:

Resource Estimation (time in sec)      
Workload v1.2 PR Perf Impact
dfchem XVIII-cas4-fb-64e-56o 45 30 -33.33%
dfchem nitrogenase-54e-54o 123 80 -34.96%
Windowed multiplication* 548 333 -39.23%

*from https://arxiv.org/abs/2402.01891

Copy link

Benchmark for 210b048

Click to view benchmark
Test Base PR %
Array append evaluation 772.8±22.35µs 356.8±7.53µs -53.83%
Array literal evaluation 517.9±42.46µs 531.1±21.69µs +2.55%
Array update evaluation 783.4±17.86µs 427.3±4.47µs -45.46%
Deutsch-Jozsa evaluation 7.1±0.06ms 5.8±0.10ms -18.31%
Large file parity evaluation 35.2±0.10ms 33.8±1.24ms -3.98%
Large input file 40.2±0.81ms 41.5±1.07ms +3.23%
Large nested iteration 74.8±2.05ms 35.4±2.03ms -52.67%
Standard library 22.9±0.53ms 24.2±0.90ms +5.68%
Teleport evaluation 96.0±2.30µs 76.3±1.37µs -20.52%

Copy link

Benchmark for 482ca5f

Click to view benchmark
Test Base PR %
Array append evaluation 756.0±7.50µs 359.4±4.36µs -52.46%
Array literal evaluation 507.2±20.92µs 538.2±21.06µs +6.11%
Array update evaluation 759.2±36.53µs 438.1±3.06µs -42.29%
Deutsch-Jozsa evaluation 7.0±0.06ms 5.8±0.06ms -17.14%
Large file parity evaluation 35.0±0.15ms 33.8±0.08ms -3.43%
Large input file 41.4±1.25ms 41.3±1.16ms -0.24%
Large nested iteration 74.5±0.41ms 35.6±1.63ms -52.21%
Standard library 23.3±0.80ms 23.5±0.78ms +0.86%
Teleport evaluation 95.4±1.78µs 76.7±1.33µs -19.60%

@swernli
Copy link
Collaborator Author

swernli commented Mar 14, 2024

Benchmark for 482ca5f

Click to view benchmark
Test Base PR %
Array append evaluation 756.0±7.50µs 359.4±4.36µs -52.46%
Array literal evaluation 507.2±20.92µs 538.2±21.06µs +6.11%
Array update evaluation 759.2±36.53µs 438.1±3.06µs -42.29%
Deutsch-Jozsa evaluation 7.0±0.06ms 5.8±0.06ms -17.14%
Large file parity evaluation 35.0±0.15ms 33.8±0.08ms -3.43%
Large input file 41.4±1.25ms 41.3±1.16ms -0.24%
Large nested iteration 74.5±0.41ms 35.6±1.63ms -52.21%
Standard library 23.3±0.80ms 23.5±0.78ms +0.86%
Teleport evaluation 95.4±1.78µs 76.7±1.33µs -19.60%

It's interesting to see that the CI benchmark benefits much more from this change (even without mimalloc yet) than my local runs, which gives me hope that the combined benefit of both changes will be even higher for older or slower machines than my local one.

Copy link

Benchmark for 65bf8e1

Click to view benchmark
Test Base PR %
Array append evaluation 868.1±3.99µs 335.0±3.28µs -61.41%
Array literal evaluation 348.2±4.27µs 309.3±6.28µs -11.17%
Array update evaluation 856.0±15.45µs 423.7±2.99µs -50.50%
Deutsch-Jozsa evaluation 6.7±0.04ms 5.1±0.05ms -23.88%
Large file parity evaluation 35.7±0.18ms 33.8±0.40ms -5.32%
Large input file 31.2±0.99ms 31.5±1.05ms +0.96%
Large nested iteration 88.9±1.46ms 33.0±0.30ms -62.88%
Standard library 16.9±0.32ms 17.0±0.46ms +0.59%
Teleport evaluation 99.4±1.70µs 70.6±1.04µs -28.97%

@swernli swernli force-pushed the swernli/lower-control-flow branch from cf9f212 to c34f677 Compare March 17, 2024 15:12
Copy link

Benchmark for a524ee6

Click to view benchmark
Test Base PR %
Array append evaluation 742.0±14.28µs 365.6±3.69µs -50.73%
Array literal evaluation 297.2±5.78µs 195.7±5.28µs -34.15%
Array update evaluation 750.4±3.87µs 461.9±5.77µs -38.45%
Deutsch-Jozsa evaluation 6.2±0.06ms 5.1±0.04ms -17.74%
Large file parity evaluation 35.1±0.33ms 33.7±0.11ms -3.99%
Large input file 33.1±1.69ms 34.1±1.19ms +3.02%
Large nested iteration 74.6±0.14ms 35.7±0.23ms -52.14%
Standard library 18.0±0.91ms 18.1±0.95ms +0.56%
Teleport evaluation 91.0±1.71µs 71.0±7.22µs -21.98%

Copy link

Benchmark for 8cb4a94

Click to view benchmark
Test Base PR %
Array append evaluation 755.4±4.97µs 338.6±4.38µs -55.18%
Array literal evaluation 437.4±5.83µs 174.0±2.63µs -60.22%
Array update evaluation 759.8±4.36µs 424.7±4.23µs -44.10%
Deutsch-Jozsa evaluation 6.2±0.05ms 5.0±0.04ms -19.35%
Large file parity evaluation 35.3±0.12ms 33.6±0.07ms -4.82%
Large input file 39.8±2.17ms 34.5±0.91ms -13.32%
Large nested iteration 78.5±1.05ms 33.4±0.90ms -57.45%
Standard library 22.5±1.43ms 18.8±0.80ms -16.44%
Teleport evaluation 91.7±1.98µs 70.4±1.70µs -23.23%

@swernli swernli marked this pull request as ready for review March 18, 2024 16:51
Copy link

Benchmark for fe9494d

Click to view benchmark
Test Base PR %
Array append evaluation 777.7±3.09µs 343.6±2.89µs -55.82%
Array literal evaluation 327.7±3.29µs 174.5±6.76µs -46.75%
Array update evaluation 752.2±3.57µs 428.0±1.36µs -43.10%
Deutsch-Jozsa evaluation 6.2±0.03ms 5.0±0.10ms -19.35%
Large file parity evaluation 35.2±0.53ms 33.6±0.14ms -4.55%
Large input file 34.4±1.89ms 33.7±1.50ms -2.03%
Large nested iteration 77.6±0.94ms 33.5±0.37ms -56.83%
Standard library 17.3±0.58ms 17.3±0.34ms 0.00%
Teleport evaluation 91.0±2.57µs 70.4±2.61µs -22.64%

Copy link

Benchmark for c271532

Click to view benchmark
Test Base PR %
Array append evaluation 751.6±8.73µs 345.2±3.34µs -54.07%
Array literal evaluation 310.3±3.28µs 180.1±3.45µs -41.96%
Array update evaluation 761.0±11.89µs 431.8±3.49µs -43.26%
Deutsch-Jozsa evaluation 6.2±0.13ms 5.2±0.11ms -16.13%
Large file parity evaluation 35.3±0.11ms 33.6±0.07ms -4.82%
Large input file 34.6±0.85ms 35.1±0.82ms +1.45%
Large nested iteration 78.5±3.11ms 34.1±1.18ms -56.56%
Standard library 19.1±0.54ms 19.8±1.13ms +3.66%
Teleport evaluation 91.1±1.52µs 72.6±1.30µs -20.31%

Copy link
Member

@minestarks minestarks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course I couldn't follow every single line but changes look reasonable and it seems pretty thoroughly covered by tests. Thanks for the perf gains!

@swernli swernli requested a review from sezna as a code owner March 19, 2024 18:23
Copy link

Benchmark for c43e15e

Click to view benchmark
Test Base PR %
Array append evaluation 777.6±8.82µs 338.6±1.93µs -56.46%
Array literal evaluation 298.9±3.98µs 173.3±2.48µs -42.02%
Array update evaluation 784.9±8.78µs 423.2±1.84µs -46.08%
Deutsch-Jozsa evaluation 6.3±0.03ms 5.1±0.08ms -19.05%
Large file parity evaluation 35.0±0.11ms 33.5±0.10ms -4.29%
Large input file 29.6±1.04ms 29.5±0.96ms -0.34%
Large nested iteration 78.8±1.40ms 33.5±0.18ms -57.49%
Standard library 16.5±0.26ms 16.5±0.27ms 0.00%
Teleport evaluation 90.9±1.56µs 70.8±2.18µs -22.11%

Copy link

Benchmark for 2832700

Click to view benchmark
Test Base PR %
Array append evaluation 790.4±10.30µs 342.7±10.26µs -56.64%
Array literal evaluation 312.6±2.17µs 174.6±9.21µs -44.15%
Array update evaluation 793.0±5.41µs 426.0±2.76µs -46.28%
Deutsch-Jozsa evaluation 6.3±0.04ms 5.1±0.05ms -19.05%
Large file parity evaluation 35.1±0.10ms 33.6±0.89ms -4.27%
Large input file 32.0±1.51ms 32.6±1.55ms +1.88%
Large nested iteration 80.9±1.06ms 33.5±0.22ms -58.59%
Standard library 17.9±1.28ms 17.2±0.69ms -3.91%
Teleport evaluation 91.7±1.91µs 71.0±1.63µs -22.57%

Copy link

Benchmark for a31e341

Click to view benchmark
Test Base PR %
Array append evaluation 794.7±14.42µs 326.4±3.82µs -58.93%
Array literal evaluation 333.2±2.93µs 192.7±1.67µs -42.17%
Array update evaluation 796.3±25.29µs 412.3±2.04µs -48.22%
Deutsch-Jozsa evaluation 6.4±0.06ms 5.1±0.08ms -20.31%
Large file parity evaluation 35.1±0.11ms 33.6±0.35ms -4.27%
Large input file 29.7±0.95ms 29.9±1.02ms +0.67%
Large nested iteration 80.7±1.08ms 32.2±0.63ms -60.10%
Standard library 16.6±0.40ms 16.6±0.46ms 0.00%
Teleport evaluation 93.1±1.54µs 70.1±1.30µs -24.70%

Copy link
Contributor

@cesarzc cesarzc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Thanks for walking me through the changes yesterday, it was really helpful and made reviewing this PR much easier.

Co-authored-by: César Zaragoza Cortés <[email protected]>
Copy link

Benchmark for 12a9bdb

Click to view benchmark
Test Base PR %
Array append evaluation 780.6±5.21µs 342.6±1.76µs -56.11%
Array literal evaluation 295.2±2.57µs 180.6±4.31µs -38.82%
Array update evaluation 784.1±5.76µs 428.4±2.67µs -45.36%
Deutsch-Jozsa evaluation 6.3±0.04ms 5.1±0.22ms -19.05%
Large file parity evaluation 35.1±1.03ms 33.7±0.68ms -3.99%
Large input file 34.6±1.02ms 33.9±0.90ms -2.02%
Large nested iteration 80.4±1.02ms 33.6±0.15ms -58.21%
Standard library 18.5±0.83ms 18.4±0.92ms -0.54%
Teleport evaluation 91.4±1.98µs 70.0±1.56µs -23.41%

@swernli swernli added this pull request to the merge queue Mar 20, 2024
Merged via the queue into main with commit 319d643 Mar 20, 2024
@swernli swernli deleted the swernli/lower-control-flow branch March 20, 2024 00:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants