[refactor](be) Derive get_storage_field_type from primitive type#64341
Open
csun5285 wants to merge 1 commit into
Open
[refactor](be) Derive get_storage_field_type from primitive type#64341csun5285 wants to merge 1 commit into
csun5285 wants to merge 1 commit into
Conversation
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
Contributor
Author
|
run buildall |
Make IDataType::get_storage_field_type() a non-pure virtual with a base implementation that derives the storage field type from the primitive type via TabletColumn::get_field_type_by_type(get_primitive_type()), and drop the ~17 trivial 1:1 overrides that just duplicated that mapping. Only the data types whose storage field type genuinely diverges from the primitive->field mapping keep an override: - DataTypeNullable: delegates to the nested type (e.g. nullable(decimalv2)) - DataTypeDecimal: DECIMALV2 -> OLAP_FIELD_TYPE_DECIMAL - DataTypeNothing / DataTypeFixedLengthObject: -> OLAP_FIELD_TYPE_NONE - DataTypeVarbinary: not implemented (throws) DataTypeString no longer collapses CHAR/VARCHAR into STRING; a char column now honestly reports OLAP_FIELD_TYPE_CHAR (varchar -> VARCHAR). Every call site tolerates this: inverted index uses is_string_type() (covers all three), get_predicate_column_ptr maps VARCHAR and STRING to the same predicate column, _can_evaluated_by_vectorized treats CHAR/VARCHAR/STRING together, and zone map dispatches via the data type serde. Because get_storage_field_type() now returns OLAP_FIELD_TYPE_CHAR for char columns directly, the SegmentIterator::_is_char_type machinery (and _vec_init_char_column_id) is dead and removed: its only consumer forced CHAR for char columns, which the derived field type already does. Variant subcolumns can never be CHAR (Type.variantSubTypes only allows STRING), so no case relies on the old guard. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Contributor
TPC-H: Total hot run time: 28807 ms |
7b73b6d to
a9c39c0
Compare
Contributor
Author
|
run buildall |
Contributor
TPC-DS: Total hot run time: 170016 ms |
Contributor
TPC-H: Total hot run time: 30301 ms |
Contributor
TPC-DS: Total hot run time: 169761 ms |
Contributor
Author
|
/review |
Contributor
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
Contributor
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
Contributor
There was a problem hiding this comment.
Review result: no blocking issues found in the live PR diff. From this review, the change looks approvable.
Critical checkpoint conclusions:
- Goal and scope: The PR is narrowly scoped to deriving default
IDataType::get_storage_field_type()fromPrimitiveType, removing equivalent overrides, and simplifying the CHAR predicate-column path inSegmentIterator. - Correctness and compatibility: I checked the removed overrides against
TabletColumn::get_field_type_by_type(). The kept overrides cover the non-1:1 cases such as nullable, decimalv2, nothing/fixed-length object, and varbinary. The CHAR/VARCHAR behavior change aligns with the existing tablet field-type mapping. - Parallel call paths: I reviewed the relevant consumers around segment predicate columns, vectorized predicate eligibility, inverted index string handling, zone-map dispatch, function-search support, and variant schema construction; I did not find a path that still depends on
DataTypeStringcollapsing CHAR/VARCHAR to STRING. - Concurrency, lifecycle, configuration, transactions, persistence, protocol, and observability: No new risk found; this refactor does not alter those behaviors directly.
- Performance: No hot-path regression found. The change removes redundant virtual overrides and a per-iterator CHAR bookkeeping vector.
- Tests: I did not run a full BE build or regression suite in this runner. The main residual non-blocking test gap is explicit coverage for CHAR/VARCHAR
get_storage_field_type()and predicate-column initialization after this refactor.
User focus: No additional user-provided review focus was present.
Contributor
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)