[fix](iterator) Use explicit output schema in new_merge_iterator and new_union_iterator #60772
uchenily wants to merge 7 commits into apache:master
run buildall
TPC-H: Total hot run time: 28891 ms
TPC-DS: Total hot run time: 184014 ms
run beut
// delete_hanlder is always set, but it maybe not init, so that it will return empty conditions
// or predicates when it is not inited.
if (_read_context->delete_handler != nullptr) {
    _read_context->delete_handler->get_delete_conditions_after_version(
Please turn the example from your GitHub PR description into a regression test and add it to the PR.
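Is there a more direct way to create rowsets in the overlapping state? My reproduction steps require editing be.conf to add write_buffer_size = 8, which cannot be put into a regression test.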
You can use a docker case: refer to the suites with the docker tag under the regression-test directory to construct the test case.
For example: suite("test_cloud_calc_sync_version","docker")
Alternatively, you can use DebugPoint injection to make the memtable flush early and produce several small non-overlapping segments.
For example: https://github.com/apache/doris/blob/afce69a6d52d9254bb98fcf8fce1135a4bb998c0/regression-test/suites/schema_change_p0/test_non_overlap_seg_heavy_sc.groovy
11bf214 to ccc8840
run buildall
TPC-H: Total hot run time: 28633 ms
TPC-DS: Total hot run time: 184048 ms
run beut
run nonConcurrent
run p0
run nonConcurrent
5c21ca8 to 3ef93a2
run buildall
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
TPC-H: Total hot run time: 28647 ms |
TPC-DS: Total hot run time: 183103 ms |
LGTM, a regression test would be nice.
This PR ensures merge/union iterators use an explicit output schema projection and copy only the requested columns, preventing column count mismatches when delete-predicate columns are read in addition to return columns.
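As a rough illustration of what such a projection involves, the sketch below maps each requested column onto its position inside a wider scan schema. It assumes a reader that scans column ids {0, 3, 1} while returning only {0, 3}; project_positions is a hypothetical helper written for this example and is not part of the Doris code base.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a Doris API): for each requested (return) column id,
// find its position inside the wider schema that the reader actually scans.
std::vector<std::size_t> project_positions(const std::vector<uint32_t>& read_col_ids,
                                           const std::vector<uint32_t>& return_col_ids) {
    std::vector<std::size_t> positions;
    positions.reserve(return_col_ids.size());
    for (uint32_t cid : return_col_ids) {
        auto it = std::find(read_col_ids.begin(), read_col_ids.end(), cid);
        // Assumed precondition: every returned column is also a scanned column.
        assert(it != read_col_ids.end());
        positions.push_back(static_cast<std::size_t>(it - read_col_ids.begin()));
    }
    return positions;
}
```

The output side then has exactly as many columns as were requested, regardless of how many extra columns the scan carries for predicate evaluation.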
BetaRowsetReader now builds an output_schema from return_columns and passes it to the merge/union iterators.
VMergeIteratorContext copies using the output schema (not the incorrect _iter->schema()).
Consider the following table:
And a delete predicate applied to a non-key column:
When executing ORDER BY k LIMIT n, Doris applies a Top-N optimization. Even though the query is SELECT *, the engine initially avoids scanning all columns. It constructs a minimal intermediate schema containing only the sort key (k) and the internal __DORIS_ROWID_COL__ to perform the merge and sort efficiently (_col_ids = {0, 3}, so _num_columns = 2). However, because a delete predicate exists on column v1, BetaRowsetReader adds v1 to this intermediate schema so that deleted rows can be evaluated and filtered out during the scan (_col_ids = {0, 3, 1}; column v1, index 1, is appended, so _num_columns = 3).
The previous implementation of VMergeIteratorContext::copy_rows used the incorrect _num_columns value, resulting in an out-of-bounds array access that crashed the BE with a core dump.
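The mismatch can be reproduced in miniature. The following is only a toy sketch under stated assumptions: Column, Block, copy_row, and copy_overruns are hypothetical stand-ins, not the actual VMergeIteratorContext code. It uses bounds-checked access so that the overrun surfaces as an exception rather than as undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy stand-ins (assumptions, not Doris types): a "block" is a list of
// columns and each column is a list of int values.
using Column = std::vector<int>;
using Block = std::vector<Column>;

// Copies one row from src into dst, touching `num_columns` columns.
// at() throws std::out_of_range where an unchecked index would read or
// write past the end of the destination.
void copy_row(const Block& src, std::size_t row, Block& dst, std::size_t num_columns) {
    for (std::size_t i = 0; i < num_columns; ++i) {
        dst.at(i).push_back(src.at(i).at(row));
    }
}

// Returns true if copying with the given column count overruns dst.
bool copy_overruns(const Block& src, Block& dst, std::size_t num_columns) {
    try {
        copy_row(src, 0, dst, num_columns);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With a three-column source (k, row id, v1) and a two-column destination, passing the source's column count overruns the destination, while passing the destination's (output schema) column count copies only k and the row id.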
Detailed reproduction steps are as follows:
Co-authored-by: yiguolei <guolei@selectdb.com>
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)