[C++][Benchmarks] Experiment with CMAKE_INTERPROCEDURAL_OPTIMIZATION #47637

pitrou · 2025-09-24T10:09:40Z

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

pitrou · 2025-09-24T10:09:55Z

@ursabot please benchmark

voltrondatabot · 2025-09-24T10:10:03Z

Benchmark runs are scheduled for commit 41f441a. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

pitrou · 2025-09-24T10:10:35Z

@AntoinePrv This is trying to run the benchmarks with IPO to see if it makes a significant difference.

pitrou · 2025-09-24T12:42:15Z

Looks like we're missing ar and ranlib on macOS: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/7320/steps/canvas?sid=01997b83-cbd5-4f86-8d2c-dfe8315e630a#01997b83-cc32-46ad-b680-38fbc1e59ab1/34-1341

pitrou · 2025-09-24T13:42:10Z

@ursabot please benchmark lang=C++

voltrondatabot · 2025-09-24T13:42:16Z

Benchmark runs are scheduled for commit bf2698a. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

pitrou · 2025-09-24T14:25:24Z

Note: a caveat of the Conbench benchmark results is that they all use the conda-forge toolchain, and therefore the same compiler and binutils versions (gcc 14.3.0).

pitrou · 2025-09-24T14:38:26Z

I'm trying this locally, also using conda-forge and therefore the same toolchain. One interesting aspect is that IPO seems to decrease total library size, perhaps by removing unused private functions and data:

before:

$ size -G /build/build-release/relwithdebinfo/*.so
      text       data        bss      total filename
   1231288     436476       1080    1668844 /build/build-release/relwithdebinfo/libarrow_acero.so
  11719000    2076782      46232   13842014 /build/build-release/relwithdebinfo/libarrow_compute.so
   1029085     613682       3496    1646263 /build/build-release/relwithdebinfo/libarrow_dataset.so
  11368776    3895932    2181849   17446557 /build/build-release/relwithdebinfo/libarrow.so
   1010359     500136       1992    1512487 /build/build-release/relwithdebinfo/libarrow_testing.so
   2675915    1510609       6328    4192852 /build/build-release/relwithdebinfo/libparquet.so

after:

$ size -G /build/build-release/relwithdebinfo/*.so
      text       data        bss      total filename
    971243     380629       1552    1353424 /build/build-release/relwithdebinfo/libarrow_acero.so
  11764472    1924230      46440   13735142 /build/build-release/relwithdebinfo/libarrow_compute.so
    911885     500747       4328    1416960 /build/build-release/relwithdebinfo/libarrow_dataset.so
  10306362    3473709    2183392   15963463 /build/build-release/relwithdebinfo/libarrow.so
    954794     439139       3968    1397901 /build/build-release/relwithdebinfo/libarrow_testing.so
   2446829    1344558       4128    3795515 /build/build-release/relwithdebinfo/libparquet.so
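
To put numbers on the size difference, here is a minimal sketch that computes the per-library reduction from the "total" column of the two `size -G` listings above. The dictionary and function names are made up for illustration; only the byte counts come from the output above.

```python
# Totals (text + data + bss, in bytes) copied from the `size -G` output above.
# BEFORE is without IPO, AFTER is with CMAKE_INTERPROCEDURAL_OPTIMIZATION enabled.
BEFORE = {
    "libarrow_acero.so": 1668844,
    "libarrow_compute.so": 13842014,
    "libarrow_dataset.so": 1646263,
    "libarrow.so": 17446557,
    "libarrow_testing.so": 1512487,
    "libparquet.so": 4192852,
}
AFTER = {
    "libarrow_acero.so": 1353424,
    "libarrow_compute.so": 13735142,
    "libarrow_dataset.so": 1416960,
    "libarrow.so": 15963463,
    "libarrow_testing.so": 1397901,
    "libparquet.so": 3795515,
}


def reduction_pct(before: int, after: int) -> float:
    """Return the size reduction as a percentage of the original size."""
    return 100.0 * (before - after) / before


def summarize(before: dict, after: dict) -> dict:
    """Map each library name to its reduction, plus one aggregate row."""
    rows = {name: reduction_pct(before[name], after[name]) for name in before}
    rows["(all libraries)"] = reduction_pct(
        sum(before.values()), sum(after.values())
    )
    return rows


if __name__ == "__main__":
    for name, pct in summarize(BEFORE, AFTER).items():
        print(f"{name:24s} {pct:5.1f}%")
```

With these figures, libarrow.so shrinks by roughly 8.5% while libarrow_compute.so shrinks by less than 1%, so the savings are real but uneven across libraries.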

pitrou · 2025-09-24T15:10:11Z

I ran two sets of benchmarks locally and the results are roughly similar to those on the Conbench machines:

Parquet benchmarks: https://gist.github.com/pitrou/b20fae69d72933dd82f066f64af54e99
Compute benchmarks: https://gist.github.com/pitrou/132a23457713775697253ca8d54de408

WillAyd

Very cool. Even if the runtime benchmarks are the same, this seems pretty easy to implement and results in good library size savings.

WillAyd · 2025-09-24T15:39:00Z

I see that LLVM also has a "thin" LTO mode that may offer improvements over the default LTO:

https://clang.llvm.org/docs/ThinLTO.html
https://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html

Do we know if this is enabled by CMake through this option, or would it be an additional option to benchmark?

pitrou · 2025-09-24T15:41:36Z

I have no idea, but the benchmarks here use gcc.

pitrou · 2025-09-24T15:43:38Z

Also, I was surprised that, at least with gcc 14.3.0, build times do not seem to increase significantly with CMAKE_INTERPROCEDURAL_OPTIMIZATION enabled. So perhaps gcc is already using some equivalent of clang's "thin LTO".

conbench-apache-arrow · 2025-09-24T16:52:00Z

Thanks for your patience. Conbench analyzed the 3 benchmarking runs that have been run so far on PR commit 41f441a.

There were 31 benchmark results indicating a performance regression:

Pull Request Run on amd64-c6a-4xlarge-linux at 2025-09-24 11:15:34Z
- MaxElementWiseArrayScalarString (C++) with params=32768/2, source=cpp-micro, suite=arrow-compute-scalar-compare-benchmark
- SetBitRunReader (C++) with params=50, source=cpp-micro, suite=arrow-bit-util-benchmark
and 29 more (see the report linked below)

The full Conbench report has more details.

conbench-apache-arrow · 2025-09-24T21:21:04Z

Thanks for your patience. Conbench analyzed the 3 benchmarking runs that have been run so far on PR commit bf2698a.

There were 33 benchmark results indicating a performance regression:

Pull Request Run on arm64-t4g-2xlarge-linux at 2025-09-24 17:58:41Z
- CopyEmptyVector (C++) with params=<STATIC_VECTOR(std::string)>, source=cpp-micro, suite=arrow-small-vector-benchmark
- CopyShortVector (C++) with params=<STD_VECTOR(int)>, source=cpp-micro, suite=arrow-small-vector-benchmark
and 31 more (see the report linked below)

The full Conbench report has more details.

github-actions bot added the awaiting review label Sep 24, 2025

pitrou added 2 commits September 24, 2025 15:15

[C++][Benchmarks] Experiment with CMAKE_INTERPROCEDURAL_OPTIMIZATION

0bb8bb1

Add binutils

bf2698a

pitrou force-pushed the exp_ipo branch from 41f441a to bf2698a Compare September 24, 2025 13:17

This was referenced Sep 24, 2025

[Python] Enable CMAKE_INTERPROCEDURAL_OPTIMIZATION for wheel builds #47643

Open

Enable CMAKE_INTERPROCEDURAL_OPTIMIZATION conda-forge/arrow-cpp-feedstock#1865

Closed

[C++][Packaging] Enable CMAKE_INTERPROCEDURAL_OPTIMIZATION for binary packages #47644

Open

WillAyd approved these changes Sep 24, 2025

github-actions bot added the awaiting committer review label and removed the awaiting review label Sep 24, 2025
