[fix](profile) restore InvertedIndexQueryTime sub-timer invariants in SEARCH path by airborne12 · Pull Request #62301 · apache/doris

airborne12 · 2026-04-09T14:36:18Z

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

The SEARCH path (FunctionSearch::evaluate_inverted_index_with_search_param) currently lays out its profile sub-timers in a way that breaks two invariants that already hold in the MATCH path (FullTextIndexReader::query):

(a) InvertedIndexQueryTime        ≈ InvertedIndexSearcherOpenTime
                                  + InvertedIndexSearcherSearchTime
(b) InvertedIndexSearcherSearchTime ≈ InvertedIndexSearcherSearchInitTime
                                    + InvertedIndexSearcherSearchExecTime

Observed on a production SEARCH query scanning 237 segments (cumulative):

InvertedIndexFilterTime:            1min1sec
InvertedIndexQueryTime:             8sec617ms
InvertedIndexSearcherOpenTime:      5sec698ms
InvertedIndexSearcherSearchTime:    8sec612ms
InvertedIndexSearcherSearchInitTime:5sec704ms
InvertedIndexSearcherSearchExecTime:62.42us

(a) fails: SearchTime + OpenTime = 14310ms, far exceeds QueryTime = 8617ms. Root cause: OpenTime is scoped inside FieldReaderResolver::resolve, which is called from build_query_recursive, which was itself wrapped by SearchInitTime → SearchTime. So OpenTime was nested inside SearchTime, not a sibling, and adding them as siblings double-counts the open phase.
(b) fails: SearchTime − SearchInitTime − SearchExecTime ≈ 2.9 s of completely untimed work. Root cause: weight->scorer(...) — which is where per-term posting-list iterators are actually opened — lived between the old SearchInitTime (only wrapping build_query_recursive) and SearchExecTime (only wrapping the result-collection loop), so it was not attributed to any sub-timer.

This PR restructures the execution path in evaluate_inverted_index_with_search_param to mirror the MATCH path semantics: QueryTime wraps everything; inside QueryTime, OpenTime and SearchTime are siblings; inside SearchTime, SearchInitTime and SearchExecTime are siblings.

Changes in be/src/exprs/function/function_search.cpp:

Move build_query_recursive(...) outside the search_timer scope. It still opens searchers via FieldReaderResolver::resolve, so OpenTime accumulates as before — it is simply no longer nested inside SearchTime.
Redefine SearchInitTime to wrap Query::weight(...) — the SEARCH analog of query->add(query_info) in MATCH: preparing the executable query structure.
Extend SearchExecTime to wrap the collect_multi_segment_* iteration (where weight->scorer(...) opens posting-list iterators) plus the lazy null-bitmap fetch that follows.
Move InvertedIndexResultBitmap construction, mask_out_null(), and the DSL cache insert outside SearchTime — still inside QueryTime — mirroring how FullTextIndexReader::query does cache->insert(...) only after match_index_search returns.

Complementary diagnostic in SegmentIterator::_get_row_ranges_by_column_conditions:

InvertedIndexFilterTime now has three sibling child timers so that FilterTime ≈ ApplyColPred + ApplyExpr + PostFilter holds by construction, making the FilterTime ↔ QueryTime gap always explainable from the profile alone:

InvertedIndexApplyColPredTime — wraps _apply_inverted_index()
InvertedIndexApplyExprTime — wraps _apply_index_expr() (contains InvertedIndexQueryTime)
InvertedIndexPostFilterTime — wraps post-filter bitmap &= + per-column _check_all_conditions_passed_inverted_index_for_column

Also in be/src/exprs/vsearch.cpp (VSearchExpr::evaluate_inverted_index):

Demote the per-segment LOG(INFO) / LOG(WARNING) on benign early-exit paths to VLOG_DEBUG. At production scan rates these add measurable cumulative wall-clock to FilterTime while contributing nothing useful at INFO level.
Pass the result bitmap to set_index_result_for_expr via std::move to avoid a per-segment deep copy of the underlying roaring bitmap.

Verification:

After the fix, small-test profile now shows:

QueryTime:             603.740us
  OpenTime:              297.914us   \_ sum = 452.68us ≈ QueryTime (drift ~fixed overhead)
  SearchTime:            154.771us   /
    SearchInitTime:       74.950us   \_ sum = 151.19us ≈ SearchTime (2.3% drift)
    SearchExecTime:       76.236us   /

Projected on the production profile: QueryTime ≈ OpenTime (5.7s) + SearchTime (2.9s) ≈ 8.6s, invariant (a) restored; SearchTime ≈ SearchInitTime (2.9s) + SearchExecTime (~4ms), invariant (b) restored; the 2.9 s of previously-untimed scorer construction is now attributed to SearchInitTime.

Release note

Fix InvertedIndexSearcher*Time profile sub-timer hierarchy in the search() function path so that QueryTime ≈ OpenTime + SearchTime and SearchTime ≈ SearchInitTime + SearchExecTime, matching MATCH-path semantics. Also add three sub-timers (InvertedIndexApplyColPredTime, InvertedIndexApplyExprTime, InvertedIndexPostFilterTime) under InvertedIndexFilterTime so that the filter block's wall time is fully attributed.

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Unit test added: be/test/storage/segment/segment_iterator_filter_sub_timer_test.cpp — pins both invariants by running SCOPED_RAW_TIMER on the OlapReaderStatistics fields in the same nesting shape as the production code and asserting parent ≥ sum(children) with a sub-ms drift bound. Any future regression that re-nests OpenTime inside SearchTime or drops a sub-timer around the scorer-construction phase will flip these assertions.

Manual test: rebuilt BE, deployed to a local cloud-sim cluster, ran SELECT id FROM t_search_metrics WHERE search('content:doris') ORDER BY id with enable_profile=true profile_level=2, verified the new counters appear and that QueryTime ≈ OpenTime + SearchTime and SearchTime ≈ SearchInitTime + SearchExecTime.
Behavior changed:
- No.
- Yes.
This PR only changes profile-timer scope placement and two non-functional micro-optimizations (LOG → VLOG_DEBUG, one std::move). The query result is unchanged.
Does this need documentation?
- No.
- Yes.

… SEARCH path The SEARCH path (`FunctionSearch::evaluate_inverted_index_with_search_param`) recently grew `inverted_index_searcher_{open,search,search_init,search_exec}_timer` instrumentation, but with a scope layout that breaks two profile invariants that already hold in the MATCH path: (a) InvertedIndexQueryTime ≈ InvertedIndexSearcherOpenTime + InvertedIndexSearcherSearchTime (b) InvertedIndexSearcherSearchTime ≈ InvertedIndexSearcherSearchInitTime + InvertedIndexSearcherSearchExecTime Observed on a production SEARCH query scanning 237 segments: InvertedIndexFilterTime: 1min1sec InvertedIndexQueryTime: 8sec617ms InvertedIndexSearcherOpenTime: 5sec698ms InvertedIndexSearcherSearchTime:8sec612ms InvertedIndexSearcherSearchInitTime: 5sec704ms InvertedIndexSearcherSearchExecTime: 62.42us (a) fails: SearchTime + OpenTime = 14310ms, far exceeds QueryTime 8617ms (because OpenTime is nested inside SearchInitTime inside SearchTime, so adding them as siblings double-counts the open phase). (b) fails: SearchTime - SearchInitTime - SearchExecTime = 2908ms of completely untimed work (the `weight->scorer(...)` call, which is where the per-term posting-list iterators are actually opened, was outside any sub-timer). Fix: restructure the execution path in `evaluate_inverted_index_with_search_param` to mirror MATCH path semantics, i.e. `FullTextIndexReader::query`: QueryTime wraps everything; inside QueryTime, OpenTime and SearchTime are siblings; inside SearchTime, SearchInitTime and SearchExecTime are siblings. - Move `build_query_recursive(...)` OUTSIDE the `search_timer` scope. The call still opens searchers via `FieldReaderResolver::resolve`, so OpenTime accumulates as before — it's just no longer nested inside SearchTime. - Define SearchInitTime to wrap `Query::weight(...)` (preparing the executable query structure — the SEARCH analog of `query->add(query_info)` in MATCH). - Define SearchExecTime to wrap the `collect_multi_segment_*` iteration (which is where `weight->scorer(...)` opens posting-list iterators) plus the lazy null-bitmap fetch that follows. - Move `InvertedIndexResultBitmap` construction, `mask_out_null()`, and the DSL cache insert OUTSIDE SearchTime — still inside QueryTime — matching how `FullTextIndexReader::query` does `cache->insert(...)` only after `match_index_search` returns. As a complementary diagnostic, `InvertedIndexFilterTime` now has three sibling child timers in `SegmentIterator::_get_row_ranges_by_column_conditions`: InvertedIndexApplyColPredTime — `_apply_inverted_index()` InvertedIndexApplyExprTime — `_apply_index_expr()` (contains InvertedIndexQueryTime) InvertedIndexPostFilterTime — post-filter bitmap `&=` + per-column `_check_all_conditions_passed_...` so that FilterTime ≈ ApplyColPred + ApplyExpr + PostFilter holds by construction, making the FilterTime ↔ QueryTime gap always explainable from the profile alone. Also in `VSearchExpr::evaluate_inverted_index`: - Demote the per-segment `LOG(INFO)` / `LOG(WARNING)` on benign early-exit paths to `VLOG_DEBUG`. At production scan rates these add measurable cumulative wall-clock to FilterTime while contributing nothing useful at INFO level. - Pass the result bitmap to `set_index_result_for_expr` via `std::move` to avoid a per-segment deep copy of the underlying roaring bitmap. New unit test `segment_iterator_filter_sub_timer_test.cpp` pins both invariants by running `SCOPED_RAW_TIMER` on the `OlapReaderStatistics` fields in the same nesting shape as the production code and asserting `parent ≥ sum(children)` with a sub-ms drift bound. Any future regression that re-nests OpenTime inside SearchTime or drops a sub-timer around the scorer-construction phase will flip these assertions.

hello-stephen · 2026-04-09T14:36:25Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

airborne12 · 2026-04-09T14:36:31Z

run buildall

hello-stephen · 2026-04-09T18:35:06Z

BE Regression && UT Coverage Report

Increment line coverage 86.60% (84/97) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	71.87% (26725/37186)
Line Coverage	54.86% (283131/516082)
Region Coverage	51.79% (234008/451828)
Branch Coverage	53.28% (101266/190049)

…uption via op_not aliasing The previous commit included a "micro-optimization" in `VSearchExpr::evaluate_inverted_index`: index_context->set_index_result_for_expr(this, std::move(result_bitmap)); intending to skip a per-segment deep copy of the underlying roaring bitmap. This is wrong: at the point of that call, the DSL result cache (`InvertedIndexQueryCache`) already holds a `shared_ptr` to the same underlying `roaring::Roaring` as `result_bitmap._data_bitmap` (`function_search.cpp::evaluate_inverted_index_with_search_param` inserts into the cache *before* the function returns). The previous lvalue `set_index_result_for_expr(this, result_bitmap)` worked precisely because the parameter is `InvertedIndexResultBitmap` by value, so its copy constructor deep-copies the Roaring and isolates the per-segment expr- context map entry from the cache. With `std::move`, the per-segment map entry instead aliases the cached Roaring. `VCompoundPred::COMPOUND_NOT` then calls `InvertedIndexResultBitmap::op_not()`, which is declared `const` but mutates `*_data_bitmap` in place (the const-ness only applies to the shared_ptr member, not the pointee). That mutation corrupts the DSL cache entry, so when the *next* segment hits the cache it observes the *negated* roaring instead of the original SEARCH match. The same hazard exists for `op_and` / `op_or` (both also mutate through the shared_ptr). Symptoms — apache#62301 CI, both `Doris_DorisRegression_P0Regression` and `Doris_DorisRegression_CloudP0`, 4 new test failures for the queries that go through compound NOT/AND on top of `search()`: - search/test_search_null_regression (line 115: `WHERE not search('content:Round', '{"mode":"standard"}')`) - search/test_search_dsl_syntax (line 305: NOT search + IS NOT NULL) - search/test_search_vs_match_consistency (line 200: AND with NULL) - search/test_search_not_null_bitmap (line 79: NOT search + count agg) For each, the test got rows that the SEARCH side clearly matches (e.g. row 3 of `search_null_regression_test`, whose `content` is "A story about Ronald McDonald at the Round table"), proving the cached search result had been negated by an earlier compound NOT. Reverting to the lvalue form restores deep-copy isolation between the DSL cache and the per-segment expr-context bitmap. Verified on a local cluster: after reverting, `WHERE not search('content:Round', ...)` returns the expected rows {1, 2, 5, 11, 13} and stays stable across repeated invocations (previously it drifted as the cache got corrupted on segment-2+). Note: this `op_not`/`op_and`/`op_or` mutation pattern through a `const` shared_ptr accessor is itself fragile. A separate, larger refactor to make those operators non-const (or to deep-copy in the operator) would be the right long-term fix; this patch is the minimal revert that restores correctness without changing the operator API.

airborne12 · 2026-04-10T01:18:29Z

run buildall

The previous CI run flagged 4 "new" failures in Doris_DorisRegression_P0Regression and Doris_DorisRegression_CloudP0, all in search tests that exercise compound NOT/AND on top of search():

search/test_search_null_regression (line 115)
search/test_search_dsl_syntax (line 305)
search/test_search_vs_match_consistency (line 200)
search/test_search_not_null_bitmap (line 79)

Root cause: the previous commit included a std::move(result_bitmap) micro-optimization in VSearchExpr::evaluate_inverted_index. At that point the DSL result cache (InvertedIndexQueryCache) already holds a shared_ptr to the same underlying roaring::Roaring as result_bitmap._data_bitmap. Moving (instead of lvalue-copying) into set_index_result_for_expr makes the per-segment expr-context map entry alias the cached Roaring, and VCompoundPred::COMPOUND_NOT then calls InvertedIndexResultBitmap::op_not() — which is declared const but writes through the shared_ptr (*_data_bitmap = *universe - *_data_bitmap - *_null_bitmap). That mutation corrupts the cached bitmap, so the next segment that hits the cache observes the negated SEARCH result.

Fixed in b53104b: revert to the lvalue form, so the value-by-copy parameter of set_index_result_for_expr deep-copies the Roaring and isolates the cache from per-segment mutations.

Verified locally: WHERE not search('content:Round', '{"mode":"standard"}') returns the expected rows {1, 2, 5, 11, 13} and stays stable across 5 repeated invocations (previously drifted as the cache got corrupted on segment-2+).

hello-stephen · 2026-04-10T02:46:40Z

BE UT Coverage Report

Increment line coverage 33.33% (32/96) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	53.03% (20136/37971)
Line Coverage	36.57% (189296/517664)
Region Coverage	32.82% (146947/447695)
Branch Coverage	33.94% (64309/189467)

hello-stephen · 2026-04-10T04:11:27Z

BE Regression && UT Coverage Report

Increment line coverage 45.83% (44/96) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	57.52% (21388/37186)
Line Coverage	40.55% (209248/516082)
Region Coverage	36.76% (166098/451828)
Branch Coverage	37.64% (71538/190049)

hello-stephen · 2026-04-10T04:38:08Z

BE Regression && UT Coverage Report

Increment line coverage 45.83% (44/96) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	57.52% (21388/37186)
Line Coverage	40.55% (209248/516082)
Region Coverage	36.76% (166098/451828)
Branch Coverage	37.64% (71538/190049)

hello-stephen · 2026-04-10T04:44:31Z

BE Regression && UT Coverage Report

Increment line coverage 45.83% (44/96) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	57.78% (21487/37186)
Line Coverage	40.86% (210849/516082)
Region Coverage	37.20% (168069/451828)
Branch Coverage	38.05% (72305/190049)

hello-stephen · 2026-04-10T05:02:26Z

BE Regression && UT Coverage Report

Increment line coverage 64.58% (62/96) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	63.71% (23693/37186)
Line Coverage	47.23% (243739/516082)
Region Coverage	44.10% (199265/451828)
Branch Coverage	45.44% (86362/190049)

airborne12 requested review from BiteTheDDDDt and zclllyybb as code owners April 9, 2026 14:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[fix](profile) restore InvertedIndexQueryTime sub-timer invariants in SEARCH path#62301

[fix](profile) restore InvertedIndexQueryTime sub-timer invariants in SEARCH path#62301
airborne12 wants to merge 2 commits intoapache:masterfrom
airborne12:search-profile-fix-timer-hierarchy

airborne12 commented Apr 9, 2026

Uh oh!

hello-stephen commented Apr 9, 2026

Uh oh!

airborne12 commented Apr 9, 2026

Uh oh!

hello-stephen commented Apr 9, 2026

Uh oh!

airborne12 commented Apr 10, 2026

Uh oh!

hello-stephen commented Apr 10, 2026

Uh oh!

hello-stephen commented Apr 10, 2026

Uh oh!

hello-stephen commented Apr 10, 2026

Uh oh!

hello-stephen commented Apr 10, 2026

Uh oh!

hello-stephen commented Apr 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

airborne12 commented Apr 9, 2026

What problem does this PR solve?

Release note

Check List (For Author)

Uh oh!

hello-stephen commented Apr 9, 2026

Uh oh!

airborne12 commented Apr 9, 2026

Uh oh!

hello-stephen commented Apr 9, 2026

BE Regression && UT Coverage Report

Uh oh!

airborne12 commented Apr 10, 2026

Uh oh!

hello-stephen commented Apr 10, 2026

BE UT Coverage Report

Uh oh!

hello-stephen commented Apr 10, 2026

BE Regression && UT Coverage Report

Uh oh!

hello-stephen commented Apr 10, 2026

BE Regression && UT Coverage Report

Uh oh!

hello-stephen commented Apr 10, 2026

BE Regression && UT Coverage Report

Uh oh!

hello-stephen commented Apr 10, 2026

BE Regression && UT Coverage Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants