Taming variance of UnixBench results when comparing systems #23

Open
@meteorfox

Description

UnixBench has been shown to be very sensitive to compiler versions [1], is compiled with outdated [2] performance-oriented compiler optimizations, and includes hacks to avoid dead-code elimination by the compiler [3]. It seems one of the original intentions of UnixBench was the ability to measure compiler performance [4] as part of overall system performance, reduced to a single metric [5].

Today, UnixBench is still being used to compare performance between systems, but most people neglect the warnings and caveats in the README.md and USAGE files included with this benchmark, in which the authors themselves warn about the pitfalls of interpreting results across different systems [6]. Even more worrying is how this benchmark is promoted [7] as 'the' single metric to look at when comparing different systems, even when those systems use different OSes, compilers, virtualization technologies, and even different architectures.

Many of these problems seem to stem from the variability introduced by the compiler and by the differing versions of the libraries linked by the benchmark.

Since UnixBench is today mostly used to compare performance across different infrastructure providers, I propose reducing the variability introduced by the factors above by distributing the benchmarks included with UnixBench as statically linked, binary-reproducible, pre-compiled binaries, verifiable by means of a hash, for each major architecture.

Statically linked, binary-reproducible binaries would minimize compiler effects, since the binaries are distributed pre-compiled. Static linking also means that differing versions of dynamically linked libraries would not introduce variability. Finally, because the build is binary-reproducible, we can cross-verify that identical copies of the benchmark were executed, and compare their results, by hashing the binaries.
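To make the cross-verification step concrete, here is a minimal sketch of hash-based verification (the stand-in source file and checksum workflow are hypothetical; in practice the project would publish the reference hashes alongside the pre-built binaries):

```shell
# Build the same stand-in binary twice from identical inputs and flags;
# a reproducible build must yield bit-identical output.
cat > bench_stub.c <<'EOF'
int main(void) { return 0; }
EOF
gcc -O2 -o build1 bench_stub.c
gcc -O2 -o build2 bench_stub.c

h1=$(sha256sum build1 | awk '{print $1}')
h2=$(sha256sum build2 | awk '{print $1}')

# Two parties comparing benchmark results would publish and check
# these hashes before trusting a cross-system comparison.
if [ "$h1" = "$h2" ]; then
    echo "identical binaries: results are comparable"
else
    echo "hash mismatch: results are not comparable"
fi
```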

What are your thoughts? Do you think this is a bad idea?

Thanks,
Carlos

[1] Compilers Love Messing With Benchmarks
[2] Issue #17
[3] Issue #10
[4] https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/USAGE#L351-L358
[5] https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/USAGE#L174
[6] https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/USAGE#L348-L349
[7] http://serverbear.com/benchmarks/vps
