Constant stack size #2688


Open · wants to merge 5 commits into master

Conversation

@timcassell (Collaborator) commented Jan 16, 2025

Fixes #1120.

  1. Refactored engine stages to make the stack size constant for each benchmark invocation.
  2. Applied AggressiveOptimization to engine methods to eliminate tiered JIT as a potential variable in the engine when running iterations (see also AndreyAkinshin/perfolizer#19, "Apply AggressiveOptimization to clocks"). A sketch of what this attribute does follows below.
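
For illustration, here is roughly what the attribute buys us. This is a hand-written sketch with hypothetical names (MeasureIteration is not BDN's actual engine method): AggressiveOptimization makes the JIT compile the method fully optimized on first call and exempts it from tiered recompilation, so the measurement loop's machine code cannot change between iterations.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

internal static class EngineSketch
{
    // AggressiveOptimization tells the JIT to skip tier 0 and compile this
    // method fully optimized on first call, and exempts it from tiered
    // recompilation, so the measurement loop's code stays stable across
    // iterations. Method and parameter names are illustrative only.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    internal static TimeSpan MeasureIteration(Action workload, long invokeCount)
    {
        var stopwatch = Stopwatch.StartNew();
        for (long i = 0; i < invokeCount; i++)
            workload();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```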

I tested this on Ryzen 7 9800X3D, and got the same results on both master and this PR. I could not repro the measurement spikes from the issue (the graph was smooth).

I tested on Apple M3 and got these results.

Master:

[graph: measurements per iteration on master]

PR:

[graph: measurements per iteration on this PR]

Observations to note

  • The results with this PR are much less spiky for longer.
    • I can't say why the tail end of the graph is still spiky. It may be throttling, since I ran on a MacBook Air, but I can't prove it (the benchmark took a very long time to run, so I only ran it once). I also did not include the changes from AndreyAkinshin/perfolizer#19 ("Apply AggressiveOptimization to clocks") in this test, which could be another factor.
  • The results with this PR are larger, sitting at the upper end of the spikes from the master results.
    • I suspect the benchmarks are sensitive to the alignment of the stack when they are invoked. I'm not sure whether BDN should take this into account by default, or what it should do about it if so; we already have [MemoryRandomization], which randomizes the stack for each iteration (a usage sketch follows this list).
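
For reference, a minimal usage sketch of that attribute. The benchmark class, field, and method here are made up for illustration; only the [MemoryRandomization] and [Benchmark] attributes are BDN's.

```csharp
using BenchmarkDotNet.Attributes;

// [MemoryRandomization] asks BDN to randomize the memory layout (including
// the stack) before each iteration, surfacing alignment sensitivity as
// run-to-run variance instead of hiding it behind one fixed alignment.
[MemoryRandomization]
public class AlignmentSensitiveBenchmark
{
    private int _field;

    [Benchmark]
    public int Increment() => _field++;
}
```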

@timcassell (Collaborator, author) commented Feb 13, 2025

Another curiosity I found while working on #2336: unrolling the calls affects performance strangely on ARM (Apple M3), while it does what you'd expect on x86-64.

Benchmark of just _field++ (M3):

| Mode       | Overhead | Workload | Diff   |
|------------|----------|----------|--------|
| Unroll x16 | 0.911    | 0.635    | -0.275 |
| NoUnroll   | 0.900    | 1.240    | 0.339  |
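
For context, this is roughly the loop shape the unroll factor controls. A simplified sketch, not BDN's actual generated code:

```csharp
using System;

internal static class UnrollSketch
{
    // With an unroll factor of 16, each loop iteration issues 16 back-to-back
    // calls, amortizing the loop counter/branch overhead across 16 invocations.
    internal static void RunUnrolled16(Action workload, long invokeCount)
    {
        for (long i = 0; i < invokeCount; i += 16)
        {
            workload(); workload(); workload(); workload();
            workload(); workload(); workload(); workload();
            workload(); workload(); workload(); workload();
            workload(); workload(); workload(); workload();
        }
    }

    // With no unrolling, every invocation pays the full loop overhead.
    // That is what makes the M3 numbers above surprising: when unrolled,
    // the workload measured faster than its own overhead loop.
    internal static void RunNoUnroll(Action workload, long invokeCount)
    {
        for (long i = 0; i < invokeCount; i++)
            workload();
    }
}
```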

timcassell and others added 3 commits on February 15, 2025, including "Apply AggressiveOptimization to engine methods".
@AndreyAkinshin (Member) left a comment

@timcassell I really love this approach! Maintaining a constant stack size should help achieve more stable measurements. I also like the idea of a unified method for running iterations in the engine instead of delegating it to the *Stage.Run*() methods.

Now, we need to develop a better design. The IEngineStageEvaluator seems unnecessary since we already have IStoppingCriteria; merging these entities would be beneficial. We can also simplify the existing solution by eliminating EngineWarmupStage, EnginePilotStage, and EngineActualStage, as their purpose is to provide running and stopping strategies. Since we have a sealed hierarchy of stages that should not be extended externally, we can introduce an advanced dispatcher that builds an IStoppingCriteria/IEngineStageEvaluator based on IterationStage, IterationMode, and other state properties without calling overridden Engine*Stage methods.

What do you think?

@timcassell (Collaborator, author) commented May 1, 2025

> Now, we need to develop a better design. The IEngineStageEvaluator seems unnecessary since we already have IStoppingCriteria; merging these entities would be beneficial.

I initially looked at re-using IStoppingCriteria, but it wouldn't quite work for this purpose due to how the pilot stage builds up the count. I implemented it this way to prevent any breaking changes, since the stages and IStoppingCriteria are public. But if we're cool with breaking those public types (or removing them from the public surface entirely), then we can certainly adjust the implementation.

> We can also simplify the existing solution by eliminating EngineWarmupStage, EnginePilotStage, and EngineActualStage, as their purpose is to provide running and stopping strategies. Since we have a sealed hierarchy of stages that should not be extended externally, we can introduce an advanced dispatcher that builds an IStoppingCriteria/IEngineStageEvaluator based on IterationStage, IterationMode, and other state properties without calling overridden Engine*Stage methods.

I'm not sure what you have in mind there. It seems to me that the EnumerateStages() method is that dispatcher. You just want to remove the stages altogether and move all that logic to that iterator?

@AndreyAkinshin (Member) commented

> but it wouldn't quite work for this purpose due to how the pilot stage builds up the count

We can rework how the pilot stage exposes the chosen invocation count; changing the API is not a problem.

> But if we're cool with breaking those public types (or removing them from public surface entirely)

Initially, I made it public to enable experiments with stopping criteria without needing to build a new version of BenchmarkDotNet. However, I don't feel that anyone is actually using it. Therefore, I believe it's safe to completely rework the API and reduce the public surface.

> I'm not sure what you have in mind there.

Currently, the Engine*Stage classes serve primarily as wrappers for two methods (Auto/Specific) that implement the running and stopping strategy. What if we extracted each strategy into a separate class that manages the stopping behavior and includes additional features, such as exposing the chosen invocation count? EnumerateStages could return instances of this class instead of tuples (IterationStage stage, IterationMode mode, IEngineStageEvaluator evaluator). This change would allow us to eliminate the existing Engine*Stage classes, resulting in a design that is more conducive to testing.
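
A purely illustrative sketch of that shape; all names here are hypothetical, and only IterationStage, IterationMode, and Measurement are existing BDN types:

```csharp
using System.Collections.Generic;
using BenchmarkDotNet.Engines;
using BenchmarkDotNet.Reports;

// Hypothetical: each stage becomes a strategy object that owns its
// running/stopping behavior, replacing the
// (IterationStage stage, IterationMode mode, IEngineStageEvaluator evaluator)
// tuples and making the existing Engine*Stage classes unnecessary.
public abstract class EngineStageStrategy
{
    public abstract IterationStage Stage { get; }
    public abstract IterationMode Mode { get; }

    // Decides, after each measurement, whether the stage is complete.
    public abstract bool ShouldStop(IReadOnlyList<Measurement> measurements);

    // Lets the pilot stage expose the invocation count it settles on.
    public virtual long? ResolvedInvokeCount => null;
}
```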

@timcassell (Collaborator, author) commented May 1, 2025

@AndreyAkinshin Since we're ok with breaking changes, what do you think of making the measurement iteration indices 0-based? I noticed they are 1-based while I was working on this, and I couldn't figure out why. Maybe just an off-by-one bug that was never caught? There are no tests for the measurement iteration index.

@AndreyAkinshin (Member) commented

@timcassell The measurement iteration indices are primarily intended for presentation to the user, rather than for internal logic. From a usability perspective, 1-based numbering tends to be more intuitive and user-friendly. For example, we typically refer to the first iteration as “Iteration 1” rather than “Iteration 0”—just as spreadsheet rows in Excel start from 1 to match user expectations.

Using 0-based indices in this context can easily lead to confusion. Consider a case where the output shows:

```
Iteration 0
Iteration 1
Iteration 2
```

If someone refers to “the second iteration,” it’s ambiguous whether they mean Iteration 1 (which is technically the second) or Iteration 2 (which has the number 2). With 1-based numbering, this ambiguity is avoided.

Since the main purpose of these indices is to improve readability and clarity for the end user, we chose 1-based indexing intentionally.

@timcassell (Collaborator, author) commented May 4, 2025

> Initially, I made it public to enable experiments with stopping criteria without needing to build a new version of BenchmarkDotNet. However, I don't feel that anyone is actually using it. Therefore, I believe it's safe to completely rework the API and reduce the public surface.

Well, it looks like there wasn't any way to inject custom stopping criteria. Users would have to implement their own IEngine(Factory) anyway.

@timcassell timcassell requested a review from AndreyAkinshin May 4, 2025 01:06
Merging this pull request may close: Large Spike from WorkloadWarmup to WorkloadActual (#1120)