[bench] Remove file prefetching from FileStream #20916
Dandandan wants to merge 4 commits into apache:main
Simplify the FileStream state machine by removing the mechanism that opens the next file in parallel while scanning the current one. Files are now opened sequentially (Scan -> Idle -> Open) instead of being prefetched.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
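To make the sequential ordering concrete, here is a minimal, synchronous sketch of a Scan -> Idle -> Open loop. It is not DataFusion's actual `FileStream`: the `State` enum, `SequentialStream`, the `File` alias, and the `trace` field are all hypothetical names introduced for illustration, and a "file" is modelled as a plain list of integers standing in for record batches.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a file: a list of "batches" (plain integers here).
type File = Vec<u32>;

/// Simplified states named after the PR description; not DataFusion's real enum.
enum State {
    Idle,
    Open(File),
    Scan(std::vec::IntoIter<u32>),
    Done,
}

impl State {
    fn name(&self) -> &'static str {
        match self {
            State::Idle => "Idle",
            State::Open(_) => "Open",
            State::Scan(_) => "Scan",
            State::Done => "Done",
        }
    }
}

struct SequentialStream {
    files: VecDeque<File>,
    state: State,
    /// Records every state that is entered, so the ordering can be inspected.
    trace: Vec<&'static str>,
}

impl SequentialStream {
    fn new(files: Vec<File>) -> Self {
        Self { files: files.into(), state: State::Idle, trace: vec!["Idle"] }
    }

    /// Move to `next` and record the transition.
    fn goto(&mut self, next: State) {
        self.trace.push(next.name());
        self.state = next;
    }
}

impl Iterator for SequentialStream {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        loop {
            // Take ownership of the current state; `Done` is a harmless placeholder.
            match std::mem::replace(&mut self.state, State::Done) {
                // Idle: pick the next file only now, never ahead of time.
                State::Idle => match self.files.pop_front() {
                    Some(file) => self.goto(State::Open(file)),
                    None => self.goto(State::Done),
                },
                // Open: "opening" completes before any scanning starts.
                State::Open(file) => self.goto(State::Scan(file.into_iter())),
                // Scan: emit batches; go back to Idle once the file is exhausted.
                State::Scan(mut batches) => match batches.next() {
                    Some(batch) => {
                        self.state = State::Scan(batches);
                        return Some(batch);
                    }
                    None => self.goto(State::Idle),
                },
                State::Done => return None,
            }
        }
    }
}
```

Collecting `SequentialStream::new(vec![vec![1, 2], vec![3]])` yields the batches in file order, and `trace` shows that the second file is opened only after the first scan has finished. A prefetching variant would instead have to carry the pending open of the next file alongside the active scan, which is the extra state this PR removes.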
run benchmarks
Benchmark job started for this request (job |
This reverts commit 38fe60a.
🤖 Benchmark completed (GKE): tpch, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch
        .metrics
        .clone();
    let _timer = scanning_total_metric.timer();
    self.start_next_file().transpose()
@alamb, this is mostly what I was talking about. This call reads the footer (which is what we want), but AFAIK it also:
- builds the pruning predicate (I think this happens too early, which is suboptimal)
- prunes row groups
- optionally loads the page index
- returns the stream (without driving it forward)
We should be able to do this much better with the IO / CPU separation.
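As a rough illustration of the IO / CPU split mentioned above, the sketch below separates fetching the footer from the CPU-only steps that follow it. Everything here is hypothetical and does not reflect the DataFusion or parquet APIs: `RowGroupStats`, `Footer`, `read_footer`, `build_pruning_predicate`, and `prune_row_groups` are made-up names, the footer is passed in as in-memory data instead of being read from storage, and the filter is assumed to be a simple range check `lo <= col <= hi`.

```rust
/// Hypothetical per-row-group statistics, as they might be decoded from a footer.
#[derive(Debug)]
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Hypothetical footer: the only thing the IO phase has to fetch.
struct Footer {
    row_groups: Vec<RowGroupStats>,
}

/// IO phase (stand-in): a real reader would issue a storage request here.
fn read_footer(raw: &[(i64, i64)]) -> Footer {
    Footer {
        row_groups: raw
            .iter()
            .map(|&(min, max)| RowGroupStats { min, max })
            .collect(),
    }
}

/// CPU phase, step 1: build a pruning predicate for the assumed filter
/// `lo <= col <= hi`. A row group may match only if its [min, max] range
/// overlaps [lo, hi].
fn build_pruning_predicate(lo: i64, hi: i64) -> impl Fn(&RowGroupStats) -> bool {
    move |stats| stats.max >= lo && stats.min <= hi
}

/// CPU phase, step 2: return the indices of row groups that may contain matches.
fn prune_row_groups(footer: &Footer, keep: impl Fn(&RowGroupStats) -> bool) -> Vec<usize> {
    footer
        .row_groups
        .iter()
        .enumerate()
        .filter(|(_, stats)| keep(*stats))
        .map(|(index, _)| index)
        .collect()
}
```

With the phases split like this, only `read_footer` needs to run ahead of time; building the predicate and pruning can be deferred until the file is about to be scanned. For a footer with ranges `(0, 9)`, `(10, 19)`, `(20, 29)` and the filter `12 <= col <= 22`, pruning keeps row groups 1 and 2.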
This makes sense to me 👍🏻