exp: add --pin-threads option to dfbench for CPU affinity; thread-local IO #20912
Dandandan wants to merge 4 commits into apache:main
Conversation
Pin each tokio worker thread to a distinct CPU core for more stable and reproducible benchmark results. Enabled via PIN_THREADS=true in bench.sh or the --pin-threads flag directly on dfbench.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
run benchmarks

🤖: Benchmark completed Details
So it seems to have some effect. But I guess it matters more when we also have control over the IO threads, which perhaps fits well with the morsel work.

run benchmarks
Hi @adriangb, your benchmark configuration could not be parsed (#20912 (comment)). Error: Supported benchmarks: …
Usage: Per-side configuration:
env:
  SHARED_SETTING: enabled
baseline:
  ref: v45.0.0
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 1G
changed:
  ref: v46.0.0
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 2G
run benchmarks
env:
  PIN_THREADS: true
Hi @adriangb, your benchmark configuration could not be parsed (#20912 (comment)). Error: Supported benchmarks: …
Usage: Per-side configuration:
env:
  SHARED_SETTING: enabled
baseline:
  ref: v45.0.0
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 1G
changed:
  ref: v46.0.0
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 2G
Lol 2 bots competing!!
run benchmarks
env:
  PIN_THREADS: true
Benchmark job started for this request (job …)

Benchmark job started for this request (job …)

Benchmark job started for this request (job …)
🤖 Benchmark completed (GKE) | trigger Details
Resource Usage: tpch — base (merge-base), tpch — branch

🤖 Benchmark completed (GKE) | trigger Details
Resource Usage: tpcds — base (merge-base), tpcds — branch

🤖 Benchmark completed (GKE) | trigger Details
Resource Usage: clickbench_partitioned — base (merge-base), clickbench_partitioned — branch

🤖: Benchmark completed Details

🤖: Benchmark completed Details

🤖: Benchmark completed Details
run benchmarks

Benchmark job started for this request (job …)

Benchmark job started for this request (job …)

run benchmarks
env:
  PIN_THREADS: true

Benchmark job started for this request (job …)

Benchmark job started for this request (job …)

Benchmark job started for this request (job …)
…O thread for large reads
Small reads (<1MB) use block_in_place for L1/L2 cache locality with zero coordination overhead. Large reads (>=1MB) dispatch to the per-core IO thread to keep the tokio worker free, since the data won't fit in cache anyway.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
run benchmarks
env:
  PIN_THREADS: true

Benchmark job started for this request (job …)

Benchmark job started for this request (job …)

Benchmark job started for this request (job …)
🤖 Benchmark completed (GKE) | trigger Details
Resource Usage: tpch — base (merge-base), tpch — branch

🤖 Benchmark completed (GKE) | trigger Details
Resource Usage: tpch — base (merge-base), tpch — branch

🤖 Benchmark completed (GKE) | trigger Details
Resource Usage: tpcds — base (merge-base), tpcds — branch

🤖 Benchmark completed (GKE) | trigger Details
Resource Usage: clickbench_partitioned — base (merge-base), clickbench_partitioned — branch

🤖 Benchmark completed (GKE) | trigger Details
Resource Usage: tpch — base (merge-base), tpch — branch

🤖 Benchmark completed (GKE) | trigger Details
Resource Usage: tpcds — base (merge-base), tpcds — branch

🤖 Benchmark completed (GKE) | trigger Details
Resource Usage: clickbench_partitioned — base (merge-base), clickbench_partitioned — branch

🤖 Benchmark completed (GKE) | trigger Details
Resource Usage: clickbench_partitioned — base (merge-base), clickbench_partitioned — branch

🤖 Benchmark completed (GKE) | trigger Details
Resource Usage: tpcds — base (merge-base), tpcds — branch
Pin each tokio worker thread to a distinct CPU core for more stable and reproducible benchmark results. Enabled via PIN_THREADS=true in bench.sh or the --pin-threads flag directly on dfbench.
Ideally we should also pin the IO threads (e.g. those created by spawn_blocking) to make sure more data from IO reads is in the CPU cache once we start reading, but that would be a future step.

Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?