[fix](profile) restore InvertedIndexQueryTime sub-timer invariants in SEARCH path#62301
[fix](profile) restore InvertedIndexQueryTime sub-timer invariants in SEARCH path#62301airborne12 wants to merge 2 commits intoapache:masterfrom
Conversation
… SEARCH path
The SEARCH path (`FunctionSearch::evaluate_inverted_index_with_search_param`)
recently grew `inverted_index_searcher_{open,search,search_init,search_exec}_timer`
instrumentation, but with a scope layout that breaks two profile invariants
that already hold in the MATCH path:
(a) InvertedIndexQueryTime ≈ InvertedIndexSearcherOpenTime
+ InvertedIndexSearcherSearchTime
(b) InvertedIndexSearcherSearchTime ≈ InvertedIndexSearcherSearchInitTime
+ InvertedIndexSearcherSearchExecTime
Observed on a production SEARCH query scanning 237 segments:
InvertedIndexFilterTime: 1min1sec
InvertedIndexQueryTime: 8sec617ms
InvertedIndexSearcherOpenTime: 5sec698ms
InvertedIndexSearcherSearchTime:8sec612ms
InvertedIndexSearcherSearchInitTime: 5sec704ms
InvertedIndexSearcherSearchExecTime: 62.42us
(a) fails: SearchTime + OpenTime = 14310ms, far exceeds QueryTime 8617ms
(because OpenTime is nested inside SearchInitTime inside SearchTime,
so adding them as siblings double-counts the open phase).
(b) fails: SearchTime - SearchInitTime - SearchExecTime = 2908ms of
completely untimed work (the `weight->scorer(...)` call, which is
where the per-term posting-list iterators are actually opened, was
outside any sub-timer).
Fix: restructure the execution path in
`evaluate_inverted_index_with_search_param` to mirror MATCH path semantics,
i.e. `FullTextIndexReader::query`:
QueryTime wraps everything; inside QueryTime, OpenTime and SearchTime are
siblings; inside SearchTime, SearchInitTime and SearchExecTime are
siblings.
- Move `build_query_recursive(...)` OUTSIDE the `search_timer` scope. The
call still opens searchers via `FieldReaderResolver::resolve`, so OpenTime
accumulates as before — it's just no longer nested inside SearchTime.
- Define SearchInitTime to wrap `Query::weight(...)` (preparing the
executable query structure — the SEARCH analog of `query->add(query_info)`
in MATCH).
- Define SearchExecTime to wrap the `collect_multi_segment_*` iteration
(which is where `weight->scorer(...)` opens posting-list iterators) plus
the lazy null-bitmap fetch that follows.
- Move `InvertedIndexResultBitmap` construction, `mask_out_null()`, and the
DSL cache insert OUTSIDE SearchTime — still inside QueryTime — matching
how `FullTextIndexReader::query` does `cache->insert(...)` only after
`match_index_search` returns.
As a complementary diagnostic, `InvertedIndexFilterTime` now has three
sibling child timers in `SegmentIterator::_get_row_ranges_by_column_conditions`:
InvertedIndexApplyColPredTime — `_apply_inverted_index()`
InvertedIndexApplyExprTime — `_apply_index_expr()`
(contains InvertedIndexQueryTime)
InvertedIndexPostFilterTime — post-filter bitmap `&=` + per-column
`_check_all_conditions_passed_...`
so that FilterTime ≈ ApplyColPred + ApplyExpr + PostFilter holds by
construction, making the FilterTime ↔ QueryTime gap always explainable
from the profile alone.
Also in `VSearchExpr::evaluate_inverted_index`:
- Demote the per-segment `LOG(INFO)` / `LOG(WARNING)` on benign early-exit
paths to `VLOG_DEBUG`. At production scan rates these add measurable
cumulative wall-clock to FilterTime while contributing nothing useful
at INFO level.
- Pass the result bitmap to `set_index_result_for_expr` via `std::move`
to avoid a per-segment deep copy of the underlying roaring bitmap.
New unit test `segment_iterator_filter_sub_timer_test.cpp` pins both
invariants by running `SCOPED_RAW_TIMER` on the `OlapReaderStatistics`
fields in the same nesting shape as the production code and asserting
`parent ≥ sum(children)` with a sub-ms drift bound. Any future regression
that re-nests OpenTime inside SearchTime or drops a sub-timer around the
scorer-construction phase will flip these assertions.
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
…uption via op_not aliasing
The previous commit included a "micro-optimization" in
`VSearchExpr::evaluate_inverted_index`:
index_context->set_index_result_for_expr(this, std::move(result_bitmap));
intending to skip a per-segment deep copy of the underlying roaring bitmap.
This is wrong: at the point of that call, the DSL result cache
(`InvertedIndexQueryCache`) already holds a `shared_ptr` to the same
underlying `roaring::Roaring` as `result_bitmap._data_bitmap`
(`function_search.cpp::evaluate_inverted_index_with_search_param` inserts
into the cache *before* the function returns). The previous lvalue
`set_index_result_for_expr(this, result_bitmap)` worked precisely because
the parameter is `InvertedIndexResultBitmap` by value, so its copy
constructor deep-copies the Roaring and isolates the per-segment expr-
context map entry from the cache.
With `std::move`, the per-segment map entry instead aliases the cached
Roaring. `VCompoundPred::COMPOUND_NOT` then calls
`InvertedIndexResultBitmap::op_not()`, which is declared `const` but
mutates `*_data_bitmap` in place (the const-ness only applies to the
shared_ptr member, not the pointee). That mutation corrupts the DSL
cache entry, so when the *next* segment hits the cache it observes the
*negated* roaring instead of the original SEARCH match. The same hazard
exists for `op_and` / `op_or` (both also mutate through the shared_ptr).
Symptoms — apache#62301 CI, both
`Doris_DorisRegression_P0Regression` and `Doris_DorisRegression_CloudP0`,
4 new test failures for the queries that go through compound NOT/AND
on top of `search()`:
- search/test_search_null_regression (line 115:
`WHERE not search('content:Round', '{"mode":"standard"}')`)
- search/test_search_dsl_syntax (line 305: NOT search
+ IS NOT NULL)
- search/test_search_vs_match_consistency (line 200: AND with NULL)
- search/test_search_not_null_bitmap (line 79: NOT search
+ count agg)
For each, the test got rows that the SEARCH side clearly matches (e.g.
row 3 of `search_null_regression_test`, whose `content` is "A story about
Ronald McDonald at the Round table"), proving the cached search result
had been negated by an earlier compound NOT.
Reverting to the lvalue form restores deep-copy isolation between the
DSL cache and the per-segment expr-context bitmap. Verified on a local
cluster: after reverting, `WHERE not search('content:Round', ...)`
returns the expected rows {1, 2, 5, 11, 13} and stays stable across
repeated invocations (previously it drifted as the cache got corrupted
on segment-2+).
Note: this `op_not`/`op_and`/`op_or` mutation pattern through a `const`
shared_ptr accessor is itself fragile. A separate, larger refactor to
make those operators non-const (or to deep-copy in the operator) would
be the right long-term fix; this patch is the minimal revert that
restores correctness without changing the operator API.
|
run buildall The previous CI run flagged 4 "new" failures in
Root cause: the previous commit included a Fixed in b53104b: revert to the lvalue form, so the value-by-copy parameter of Verified locally: |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
1 similar comment
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
The SEARCH path (
FunctionSearch::evaluate_inverted_index_with_search_param) currently lays out its profile sub-timers in a way that breaks two invariants that already hold in the MATCH path (FullTextIndexReader::query):Observed on a production SEARCH query scanning 237 segments (cumulative):
SearchTime + OpenTime = 14310ms, far exceedsQueryTime = 8617ms. Root cause:OpenTimeis scoped insideFieldReaderResolver::resolve, which is called frombuild_query_recursive, which was itself wrapped bySearchInitTime→SearchTime. So OpenTime was nested inside SearchTime, not a sibling, and adding them as siblings double-counts the open phase.SearchTime − SearchInitTime − SearchExecTime ≈ 2.9 sof completely untimed work. Root cause:weight->scorer(...)— which is where per-term posting-list iterators are actually opened — lived between the old SearchInitTime (only wrappingbuild_query_recursive) and SearchExecTime (only wrapping the result-collection loop), so it was not attributed to any sub-timer.This PR restructures the execution path in
evaluate_inverted_index_with_search_paramto mirror the MATCH path semantics:QueryTimewraps everything; insideQueryTime,OpenTimeandSearchTimeare siblings; insideSearchTime,SearchInitTimeandSearchExecTimeare siblings.Changes in
be/src/exprs/function/function_search.cpp:build_query_recursive(...)outside thesearch_timerscope. It still opens searchers viaFieldReaderResolver::resolve, soOpenTimeaccumulates as before — it is simply no longer nested insideSearchTime.SearchInitTimeto wrapQuery::weight(...)— the SEARCH analog ofquery->add(query_info)in MATCH: preparing the executable query structure.SearchExecTimeto wrap thecollect_multi_segment_*iteration (whereweight->scorer(...)opens posting-list iterators) plus the lazy null-bitmap fetch that follows.InvertedIndexResultBitmapconstruction,mask_out_null(), and the DSL cache insert outsideSearchTime— still insideQueryTime— mirroring howFullTextIndexReader::querydoescache->insert(...)only aftermatch_index_searchreturns.Complementary diagnostic in
SegmentIterator::_get_row_ranges_by_column_conditions:InvertedIndexFilterTimenow has three sibling child timers so thatFilterTime ≈ ApplyColPred + ApplyExpr + PostFilterholds by construction, making theFilterTime ↔ QueryTimegap always explainable from the profile alone:InvertedIndexApplyColPredTime— wraps_apply_inverted_index()InvertedIndexApplyExprTime— wraps_apply_index_expr()(containsInvertedIndexQueryTime)InvertedIndexPostFilterTime— wraps post-filter bitmap&=+ per-column_check_all_conditions_passed_inverted_index_for_columnAlso in
be/src/exprs/vsearch.cpp(VSearchExpr::evaluate_inverted_index):LOG(INFO)/LOG(WARNING)on benign early-exit paths toVLOG_DEBUG. At production scan rates these add measurable cumulative wall-clock toFilterTimewhile contributing nothing useful at INFO level.set_index_result_for_exprviastd::moveto avoid a per-segment deep copy of the underlying roaring bitmap.Verification:
After the fix, small-test profile now shows:
Projected on the production profile:
QueryTime ≈ OpenTime (5.7s) + SearchTime (2.9s) ≈ 8.6s, invariant (a) restored;SearchTime ≈ SearchInitTime (2.9s) + SearchExecTime (~4ms), invariant (b) restored; the 2.9 s of previously-untimed scorer construction is now attributed toSearchInitTime.Release note
Fix
InvertedIndexSearcher*Timeprofile sub-timer hierarchy in thesearch()function path so thatQueryTime ≈ OpenTime + SearchTimeandSearchTime ≈ SearchInitTime + SearchExecTime, matching MATCH-path semantics. Also add three sub-timers (InvertedIndexApplyColPredTime,InvertedIndexApplyExprTime,InvertedIndexPostFilterTime) underInvertedIndexFilterTimeso that the filter block's wall time is fully attributed.Check List (For Author)
Test
Unit test added:
be/test/storage/segment/segment_iterator_filter_sub_timer_test.cpp— pins both invariants by runningSCOPED_RAW_TIMERon theOlapReaderStatisticsfields in the same nesting shape as the production code and assertingparent ≥ sum(children)with a sub-ms drift bound. Any future regression that re-nestsOpenTimeinsideSearchTimeor drops a sub-timer around the scorer-construction phase will flip these assertions.Manual test: rebuilt BE, deployed to a local cloud-sim cluster, ran
SELECT id FROM t_search_metrics WHERE search('content:doris') ORDER BY idwithenable_profile=true profile_level=2, verified the new counters appear and thatQueryTime ≈ OpenTime + SearchTimeandSearchTime ≈ SearchInitTime + SearchExecTime.Behavior changed:
This PR only changes profile-timer scope placement and two non-functional micro-optimizations (LOG → VLOG_DEBUG, one
std::move). The query result is unchanged.Does this need documentation?