Taming variance of UnixBench results when comparing systems #23

Open
@meteorfox

Description

UnixBench has been shown to be very sensitive to compiler versions [1], is compiled with outdated [2] performance-oriented compiler optimizations, and includes hacks to avoid dead-code elimination by the compiler [3]. It seems one of the original intentions of UnixBench was the ability to measure compiler performance [4] as part of overall system performance, reduced to a single metric [5].

Today, UnixBench is still being used to compare performance between systems, but most people neglect the warnings and caveats in the README.md and USAGE files included with this benchmark, in which the authors themselves warn about the pitfalls of interpreting results across different systems [6]. Even more worrying is how this benchmark is promoted [7] as 'the' single metric to look at when comparing different systems, even when those systems use different OSes, compilers, virtualization technologies, and even different architectures.

Many of these problems seem to stem from the variability introduced by the compiler and by the differing versions of the libraries linked by the benchmark.

Since UnixBench is today mostly used to compare performance across different infrastructure providers, I propose reducing the variability introduced by the factors above by distributing the benchmarks included with UnixBench as statically linked, binary-reproducible, pre-compiled binaries, verifiable by means of a hash, for each major architecture.

Statically linked, binary-reproducible binaries would minimize compiler effects, since the binaries are distributed pre-compiled. Static linking also means that differing versions of dynamically linked libraries would not introduce variability. Finally, because the build is binary-reproducible, we can cross-verify that identical copies of the benchmark were executed, and compare their results, by hashing the binaries.
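To make the cross-verification step concrete, here is a minimal sketch of hash-based verification (the stand-in source file and checksum workflow are hypothetical; in practice the project would publish the reference hashes alongside the pre-built binaries):

```shell
# Build the same stand-in binary twice from identical inputs and flags;
# a reproducible build must yield bit-identical output.
cat > bench_stub.c <<'EOF'
int main(void) { return 0; }
EOF
gcc -O2 -o build1 bench_stub.c
gcc -O2 -o build2 bench_stub.c

h1=$(sha256sum build1 | awk '{print $1}')
h2=$(sha256sum build2 | awk '{print $1}')

# Two parties comparing benchmark results would publish and check
# these hashes before trusting a cross-system comparison.
if [ "$h1" = "$h2" ]; then
    echo "identical binaries: results are comparable"
else
    echo "hash mismatch: results are not comparable"
fi
```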

What are your thoughts? Do you think this is a bad idea?

Thanks,
Carlos

[1] Compilers Love Messing With Benchmarks
[2] Issue #17
[3] Issue #10
[4] https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/USAGE#L351-L358
[5] https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/USAGE#L174
[6] https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/USAGE#L348-L349
[7] http://serverbear.com/benchmarks/vps
