Propagate min/max/string length statistics to duckdb#7416
Propagate min/max/string length statistics to duckdb#7416
Conversation
5848867 to
c4a56e6
Compare
c4a56e6 to
69e130d
Compare
Polar Signals Profiling ResultsLatest Run
Previous Runs (2)
Powered by Polar Signals Cloud |
Benchmarks: TPC-DS SF=1 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.980x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.986x ➖, 0↑ 0↓)
datafusion / parquet (0.974x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (1.046x ➖, 11↑ 32↓)
duckdb / vortex-compact (1.129x ❌, 10↑ 52↓)
duckdb / parquet (0.991x ➖, 1↑ 1↓)
duckdb / duckdb (0.999x ➖, 1↑ 2↓)
Full attributed analysis
|
File Sizes: TPC-DS SF=1 on NVMENo file size changes detected. |
Benchmarks: Clickbench on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.989x ➖, 0↑ 0↓)
datafusion / parquet (0.996x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.250x ❌, 4↑ 25↓)
duckdb / parquet (0.996x ➖, 0↑ 0↓)
duckdb / duckdb (1.025x ➖, 0↑ 3↓)
Full attributed analysis
|
Benchmarks: FineWeb S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (1.058x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.985x ➖, 0↑ 0↓)
datafusion / parquet (1.027x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.062x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.064x ➖, 0↑ 1↓)
duckdb / parquet (1.011x ➖, 0↑ 0↓)
Full attributed analysis
|
🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨Benchmark |
39c09ca to
f79f5b0
Compare
e2e1bbe to
46c5eaf
Compare
File Sizes: Statistical and Population GeneticsNo file size changes detected. |
File Sizes: Clickbench on NVMENo baseline file sizes found for base commit. |
Benchmarks: Random AccessVortex (geomean): 0.816x ✅ unknown / unknown (0.907x ➖, 11↑ 1↓)
|
Benchmarks: CompressionVortex (geomean): 0.999x ➖ unknown / unknown (0.994x ➖, 0↑ 0↓)
|
46c5eaf to
276db60
Compare
File Sizes: TPC-H SF=1 on NVMENo file size changes detected. |
File Sizes: TPC-H SF=10 on NVMENo file size changes detected. |
|
Looks like some real regressions here. Opening all the files AoT is not a good idea, its very expensive. |
gatesn
left a comment
There was a problem hiding this comment.
I think the regression might be eagerly merging all stats for all columns, when we only need min/max/str_len for columns that DuckDB asks us about
| column_index: usize, | ||
| ) -> &'a ColumnStatistics { | ||
| match &bind_data.stats { | ||
| DataSourceStatistics::All(items) => &items[column_index], |
There was a problem hiding this comment.
We should lazily construct stats for each column. Could be way fewer columns that DuckDB asks for
| self.statistics.as_ref() | ||
| } | ||
|
|
||
| pub fn take_statistics(&mut self) -> Option<FileStatistics> { |
| self.footer.statistics() | ||
| } | ||
|
|
||
| pub fn take_file_stats(&mut self) -> Option<FileStatistics> { |
Signed-off-by: Mikhail Kot <to@myrrc.dev>
Uh oh!
There was an error while loading. Please reload this page.