Optimize evaluator perf via Control Flow Graph #1261

swernli · 2024-03-13T18:59:05Z

This change adds new content to FIR that captures the control flow graph for expressions. With this pre-calculated during lowering, the evaluator is refactored to follow that sequencing rather than calculating (and recalculating) the execution sequence on the fly.

In addition to the performance numbers from the CI benchmarks (which show larger gains in the Linux container), here is some data gathered from runs on my machine. This change tested in WASM shows overall performance benefit but ~~with a trade off of reduced performance on large array literals~~ with the latest change also improving performance on array literals:

WASI (time in µs)
Benchmark	Main	PR	Perf Impact
Teleport eval	162.55	129.86	-20.11%
DJ eval	9835.53	7910.8	-19.57%
Large File eval	37323	35022	-6.17%
Array Append	1835	1265.6	-31.03%
Array Update	2089.2	1556	-25.52%
Array Literal	694.17	377.02	-45.69%
Large Nested	195790	135720	-30.68%

The performance gains in native code are more significant for those benchmarks that do large numbers of iterations through code. Both the "Main" and "PR" below include the mimalloc optimization that has already merged:

Native (time in µs)
Benchmark	Main	PR	Perf Impact
Teleport eval	39.034	32.269	-17.33%
DJ eval	2650.8	2220.6	-16.23%
Large File eval	25327	23411	-7.57%
Array Append	255.74	152.52	-40.36%
Array Update	254.13	170.67	-32.84%
Array Literal	117.93	79.975	-32.18%
Large Nested	25544	14966	-41.41%

Finally, the performance gains for these combined changes on resource estimation workloads show promise in the larger cases, specifically:

Resource Estimation (time in sec)
Workload	v1.2	PR	Perf Impact
dfchem XVIII-cas4-fb-64e-56o	45	30	-33.33%
dfchem nitrogenase-54e-54o	123	80	-34.96%
Windowed multiplication*	548	333	-39.23%

*from https://arxiv.org/abs/2402.01891

github-actions · 2024-03-13T19:07:48Z

Benchmark for `210b048`

Click to view benchmark

Test	Base	PR	%
Array append evaluation	772.8±22.35µs	356.8±7.53µs	-53.83%
Array literal evaluation	517.9±42.46µs	531.1±21.69µs	+2.55%
Array update evaluation	783.4±17.86µs	427.3±4.47µs	-45.46%
Deutsch-Jozsa evaluation	7.1±0.06ms	5.8±0.10ms	-18.31%
Large file parity evaluation	35.2±0.10ms	33.8±1.24ms	-3.98%
Large input file	40.2±0.81ms	41.5±1.07ms	+3.23%
Large nested iteration	74.8±2.05ms	35.4±2.03ms	-52.67%
Standard library	22.9±0.53ms	24.2±0.90ms	+5.68%
Teleport evaluation	96.0±2.30µs	76.3±1.37µs	-20.52%

github-actions · 2024-03-13T22:08:45Z

Benchmark for `482ca5f`

Click to view benchmark

Test	Base	PR	%
Array append evaluation	756.0±7.50µs	359.4±4.36µs	-52.46%
Array literal evaluation	507.2±20.92µs	538.2±21.06µs	+6.11%
Array update evaluation	759.2±36.53µs	438.1±3.06µs	-42.29%
Deutsch-Jozsa evaluation	7.0±0.06ms	5.8±0.06ms	-17.14%
Large file parity evaluation	35.0±0.15ms	33.8±0.08ms	-3.43%
Large input file	41.4±1.25ms	41.3±1.16ms	-0.24%
Large nested iteration	74.5±0.41ms	35.6±1.63ms	-52.21%
Standard library	23.3±0.80ms	23.5±0.78ms	+0.86%
Teleport evaluation	95.4±1.78µs	76.7±1.33µs	-19.60%

swernli · 2024-03-14T05:50:13Z

Benchmark for 482ca5f

Click to view benchmark
Test Base PR %
Array append evaluation 756.0±7.50µs 359.4±4.36µs -52.46%
Array literal evaluation 507.2±20.92µs 538.2±21.06µs +6.11%
Array update evaluation 759.2±36.53µs 438.1±3.06µs -42.29%
Deutsch-Jozsa evaluation 7.0±0.06ms 5.8±0.06ms -17.14%
Large file parity evaluation 35.0±0.15ms 33.8±0.08ms -3.43%
Large input file 41.4±1.25ms 41.3±1.16ms -0.24%
Large nested iteration 74.5±0.41ms 35.6±1.63ms -52.21%
Standard library 23.3±0.80ms 23.5±0.78ms +0.86%
Teleport evaluation 95.4±1.78µs 76.7±1.33µs -19.60%

It's interesting to see that the CI benchmark benefits much more from this change (even without mimalloc yet) than my local runs, which gives me hope that the combined benefit of both changes will be even higher for older or slower machines than my local one.

github-actions · 2024-03-15T17:24:33Z

Benchmark for `65bf8e1`

Click to view benchmark

Test	Base	PR	%
Array append evaluation	868.1±3.99µs	335.0±3.28µs	-61.41%
Array literal evaluation	348.2±4.27µs	309.3±6.28µs	-11.17%
Array update evaluation	856.0±15.45µs	423.7±2.99µs	-50.50%
Deutsch-Jozsa evaluation	6.7±0.04ms	5.1±0.05ms	-23.88%
Large file parity evaluation	35.7±0.18ms	33.8±0.40ms	-5.32%
Large input file	31.2±0.99ms	31.5±1.05ms	+0.96%
Large nested iteration	88.9±1.46ms	33.0±0.30ms	-62.88%
Standard library	16.9±0.32ms	17.0±0.46ms	+0.59%
Teleport evaluation	99.4±1.70µs	70.6±1.04µs	-28.97%

github-actions · 2024-03-17T15:21:50Z

Benchmark for `a524ee6`

Click to view benchmark

Test	Base	PR	%
Array append evaluation	742.0±14.28µs	365.6±3.69µs	-50.73%
Array literal evaluation	297.2±5.78µs	195.7±5.28µs	-34.15%
Array update evaluation	750.4±3.87µs	461.9±5.77µs	-38.45%
Deutsch-Jozsa evaluation	6.2±0.06ms	5.1±0.04ms	-17.74%
Large file parity evaluation	35.1±0.33ms	33.7±0.11ms	-3.99%
Large input file	33.1±1.69ms	34.1±1.19ms	+3.02%
Large nested iteration	74.6±0.14ms	35.7±0.23ms	-52.14%
Standard library	18.0±0.91ms	18.1±0.95ms	+0.56%
Teleport evaluation	91.0±1.71µs	71.0±7.22µs	-21.98%

github-actions · 2024-03-18T16:27:21Z

Benchmark for `8cb4a94`

Click to view benchmark

Test	Base	PR	%
Array append evaluation	755.4±4.97µs	338.6±4.38µs	-55.18%
Array literal evaluation	437.4±5.83µs	174.0±2.63µs	-60.22%
Array update evaluation	759.8±4.36µs	424.7±4.23µs	-44.10%
Deutsch-Jozsa evaluation	6.2±0.05ms	5.0±0.04ms	-19.35%
Large file parity evaluation	35.3±0.12ms	33.6±0.07ms	-4.82%
Large input file	39.8±2.17ms	34.5±0.91ms	-13.32%
Large nested iteration	78.5±1.05ms	33.4±0.90ms	-57.45%
Standard library	22.5±1.43ms	18.8±0.80ms	-16.44%
Teleport evaluation	91.7±1.98µs	70.4±1.70µs	-23.23%

github-actions · 2024-03-18T20:29:35Z

Benchmark for `fe9494d`

Click to view benchmark

Test	Base	PR	%
Array append evaluation	777.7±3.09µs	343.6±2.89µs	-55.82%
Array literal evaluation	327.7±3.29µs	174.5±6.76µs	-46.75%
Array update evaluation	752.2±3.57µs	428.0±1.36µs	-43.10%
Deutsch-Jozsa evaluation	6.2±0.03ms	5.0±0.10ms	-19.35%
Large file parity evaluation	35.2±0.53ms	33.6±0.14ms	-4.55%
Large input file	34.4±1.89ms	33.7±1.50ms	-2.03%
Large nested iteration	77.6±0.94ms	33.5±0.37ms	-56.83%
Standard library	17.3±0.58ms	17.3±0.34ms	0.00%
Teleport evaluation	91.0±2.57µs	70.4±2.61µs	-22.64%

github-actions · 2024-03-18T22:17:04Z

Benchmark for `c271532`

Click to view benchmark

Test	Base	PR	%
Array append evaluation	751.6±8.73µs	345.2±3.34µs	-54.07%
Array literal evaluation	310.3±3.28µs	180.1±3.45µs	-41.96%
Array update evaluation	761.0±11.89µs	431.8±3.49µs	-43.26%
Deutsch-Jozsa evaluation	6.2±0.13ms	5.2±0.11ms	-16.13%
Large file parity evaluation	35.3±0.11ms	33.6±0.07ms	-4.82%
Large input file	34.6±0.85ms	35.1±0.82ms	+1.45%
Large nested iteration	78.5±3.11ms	34.1±1.18ms	-56.56%
Standard library	19.1±0.54ms	19.8±1.13ms	+3.66%
Teleport evaluation	91.1±1.52µs	72.6±1.30µs	-20.31%

allocator/Cargo.toml

compiler/qsc_eval/src/val.rs

compiler/qsc_eval/src/lib.rs

minestarks

Of course I couldn't follow every single line but changes look reasonable and it seems pretty thoroughly covered by tests. Thanks for the perf gains!

…-flow

github-actions · 2024-03-19T18:32:36Z

Benchmark for `c43e15e`

Click to view benchmark

Test	Base	PR	%
Array append evaluation	777.6±8.82µs	338.6±1.93µs	-56.46%
Array literal evaluation	298.9±3.98µs	173.3±2.48µs	-42.02%
Array update evaluation	784.9±8.78µs	423.2±1.84µs	-46.08%
Deutsch-Jozsa evaluation	6.3±0.03ms	5.1±0.08ms	-19.05%
Large file parity evaluation	35.0±0.11ms	33.5±0.10ms	-4.29%
Large input file	29.6±1.04ms	29.5±0.96ms	-0.34%
Large nested iteration	78.8±1.40ms	33.5±0.18ms	-57.49%
Standard library	16.5±0.26ms	16.5±0.27ms	0.00%
Teleport evaluation	90.9±1.56µs	70.8±2.18µs	-22.11%

github-actions · 2024-03-19T18:44:04Z

Benchmark for `2832700`

Click to view benchmark

Test	Base	PR	%
Array append evaluation	790.4±10.30µs	342.7±10.26µs	-56.64%
Array literal evaluation	312.6±2.17µs	174.6±9.21µs	-44.15%
Array update evaluation	793.0±5.41µs	426.0±2.76µs	-46.28%
Deutsch-Jozsa evaluation	6.3±0.04ms	5.1±0.05ms	-19.05%
Large file parity evaluation	35.1±0.10ms	33.6±0.89ms	-4.27%
Large input file	32.0±1.51ms	32.6±1.55ms	+1.88%
Large nested iteration	80.9±1.06ms	33.5±0.22ms	-58.59%
Standard library	17.9±1.28ms	17.2±0.69ms	-3.91%
Teleport evaluation	91.7±1.91µs	71.0±1.63µs	-22.57%

compiler/qsc_eval/src/lib.rs

github-actions · 2024-03-19T21:08:37Z

Benchmark for `a31e341`

Click to view benchmark

Test	Base	PR	%
Array append evaluation	794.7±14.42µs	326.4±3.82µs	-58.93%
Array literal evaluation	333.2±2.93µs	192.7±1.67µs	-42.17%
Array update evaluation	796.3±25.29µs	412.3±2.04µs	-48.22%
Deutsch-Jozsa evaluation	6.4±0.06ms	5.1±0.08ms	-20.31%
Large file parity evaluation	35.1±0.11ms	33.6±0.35ms	-4.27%
Large input file	29.7±0.95ms	29.9±1.02ms	+0.67%
Large nested iteration	80.7±1.08ms	32.2±0.63ms	-60.10%
Standard library	16.6±0.40ms	16.6±0.46ms	0.00%
Teleport evaluation	93.1±1.54µs	70.1±1.30µs	-24.70%

compiler/qsc_eval/src/lib.rs

compiler/qsc_fir/src/fir.rs

cesarzc

Looks good!

Thanks for walking me through the changes yesterday, it was really helpful and made reviewing this PR much easier.

Co-authored-by: César Zaragoza Cortés <[email protected]>

github-actions · 2024-03-19T23:02:14Z

Benchmark for `12a9bdb`

Click to view benchmark

Test	Base	PR	%
Array append evaluation	780.6±5.21µs	342.6±1.76µs	-56.11%
Array literal evaluation	295.2±2.57µs	180.6±4.31µs	-38.82%
Array update evaluation	784.1±5.76µs	428.4±2.67µs	-45.36%
Deutsch-Jozsa evaluation	6.3±0.04ms	5.1±0.22ms	-19.05%
Large file parity evaluation	35.1±1.03ms	33.7±0.68ms	-3.99%
Large input file	34.6±1.02ms	33.9±0.90ms	-2.02%
Large nested iteration	80.4±1.02ms	33.6±0.15ms	-58.21%
Standard library	18.5±0.83ms	18.4±0.92ms	-0.54%
Teleport evaluation	91.4±1.98µs	70.0±1.56µs	-23.41%

minestarks mentioned this pull request Mar 14, 2024

Native libraries use mimalloc as global allocator #1249

Merged

swernli added 7 commits March 17, 2024 08:10

Adjust mimalloc to allow wasi benchmarks

2597e6c

Reduce size of Value enum with boxing

12b5078

Include control flow graph in FIR for evaluator

c8c62a9

Change value stack patterns to avoid pushing unused unit values

a75e5a2

Fix spans in debugger tests

da0a598

Support partial evaluation of sub graphs, add tests

8550af9

Optimize array of literals

c34f677

swernli force-pushed the swernli/lower-control-flow branch from cf9f212 to c34f677 Compare March 17, 2024 15:12

Use utility function

309335b

Update names

e24338d

swernli marked this pull request as ready for review March 18, 2024 16:51

swernli requested review from billti, cesarzc, idavis, minestarks and DmitryVasilevsky as code owners March 18, 2024 16:51

Rename EvalRange to ValueRange

dd169e0

Update frame leave logic

db0a6f9

minestarks reviewed Mar 18, 2024

View reviewed changes

allocator/Cargo.toml Outdated Show resolved Hide resolved

minestarks reviewed Mar 19, 2024

View reviewed changes

compiler/qsc_eval/src/val.rs Show resolved Hide resolved

minestarks reviewed Mar 19, 2024

View reviewed changes

compiler/qsc_eval/src/lib.rs Show resolved Hide resolved

minestarks reviewed Mar 19, 2024

View reviewed changes

compiler/qsc_eval/src/lib.rs Show resolved Hide resolved

minestarks approved these changes Mar 19, 2024

View reviewed changes

PR feedback

245f17d

swernli requested a review from sezna as a code owner March 19, 2024 18:23

Merge remote-tracking branch 'origin/main' into swernli/lower-control…

ed1cae9

…-flow

Fix FIR local idents

151e8a1