Skip to content

[refactor](be) Derive get_storage_field_type from primitive type#64341

Open
csun5285 wants to merge 1 commit into
apache:masterfrom
csun5285:refactor/get-storage-field-type
Open

[refactor](be) Derive get_storage_field_type from primitive type#64341
csun5285 wants to merge 1 commit into
apache:masterfrom
csun5285:refactor/get-storage-field-type

Conversation

@csun5285

@csun5285 csun5285 commented Jun 10, 2026

Copy link
Copy Markdown
Contributor
  1. Make IDataType::get_storage_field_type() derive the field type from the primitive type
  2. SegmentIterator::_is_char_type (and _vec_init_char_column_id) is deleted

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@csun5285

Copy link
Copy Markdown
Contributor Author

run buildall

Make IDataType::get_storage_field_type() a non-pure virtual with a base
implementation that derives the storage field type from the primitive
type via TabletColumn::get_field_type_by_type(get_primitive_type()), and
drop the ~17 trivial 1:1 overrides that just duplicated that mapping.

Only the data types whose storage field type genuinely diverges from the
primitive->field mapping keep an override:
- DataTypeNullable: delegates to the nested type (e.g. nullable(decimalv2))
- DataTypeDecimal: DECIMALV2 -> OLAP_FIELD_TYPE_DECIMAL
- DataTypeNothing / DataTypeFixedLengthObject: -> OLAP_FIELD_TYPE_NONE
- DataTypeVarbinary: not implemented (throws)

DataTypeString no longer collapses CHAR/VARCHAR into STRING; a char column
now honestly reports OLAP_FIELD_TYPE_CHAR (varchar -> VARCHAR). Every call
site tolerates this: inverted index uses is_string_type() (covers all three),
get_predicate_column_ptr maps VARCHAR and STRING to the same predicate column,
_can_evaluated_by_vectorized treats CHAR/VARCHAR/STRING together, and zone map
dispatches via the data type serde.

Because get_storage_field_type() now returns OLAP_FIELD_TYPE_CHAR for char
columns directly, the SegmentIterator::_is_char_type machinery (and
_vec_init_char_column_id) is dead and removed: its only consumer forced CHAR
for char columns, which the derived field type already does. Variant
subcolumns can never be CHAR (Type.variantSubTypes only allows STRING), so no
case relies on the old guard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 28807 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7b73b6de3c030c035781816abe298fd90a434c80, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17725	4019	3950	3950
q2	q3	10760	1394	829	829
q4	4717	478	346	346
q5	7810	851	595	595
q6	220	178	135	135
q7	845	840	637	637
q8	10580	1660	1481	1481
q9	6982	4494	4508	4494
q10	6813	1863	1542	1542
q11	435	274	247	247
q12	655	429	287	287
q13	18160	3357	2771	2771
q14	265	260	240	240
q15	q16	825	802	714	714
q17	971	873	869	869
q18	7186	5904	5549	5549
q19	1172	1229	1123	1123
q20	509	414	281	281
q21	5781	2667	2416	2416
q22	445	363	301	301
Total cold run time: 102856 ms
Total hot run time: 28807 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4332	4231	4239	4231
q2	q3	4509	5018	4345	4345
q4	2071	2206	1415	1415
q5	4414	4301	4493	4301
q6	232	172	128	128
q7	1827	2018	1658	1658
q8	2547	2144	2053	2053
q9	7999	7943	7979	7943
q10	4811	4794	4328	4328
q11	568	428	392	392
q12	855	860	559	559
q13	3285	3664	3019	3019
q14	314	304	316	304
q15	q16	712	757	627	627
q17	1355	1322	1313	1313
q18	8231	7511	6893	6893
q19	1080	1085	1094	1085
q20	2222	2220	1974	1974
q21	5281	4586	4405	4405
q22	500	456	390	390
Total cold run time: 57145 ms
Total hot run time: 51363 ms

@csun5285 csun5285 force-pushed the refactor/get-storage-field-type branch from 7b73b6d to a9c39c0 Compare June 10, 2026 04:54
@csun5285

Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 170016 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7b73b6de3c030c035781816abe298fd90a434c80, data reload: false

query5	4328	625	486	486
query6	426	195	181	181
query7	4816	576	313	313
query8	365	213	200	200
query9	8770	4017	4025	4017
query10	450	315	262	262
query11	5906	2369	2135	2135
query12	168	107	101	101
query13	1291	634	421	421
query14	6421	5396	5033	5033
query14_1	4363	4405	4371	4371
query15	208	200	183	183
query16	1042	481	441	441
query17	1132	715	592	592
query18	2560	488	360	360
query19	208	190	153	153
query20	115	110	113	110
query21	214	141	119	119
query22	13600	13570	13345	13345
query23	17519	16472	16207	16207
query23_1	16399	16330	16295	16295
query24	7574	1782	1328	1328
query24_1	1337	1324	1327	1324
query25	597	498	422	422
query26	1300	325	167	167
query27	2690	588	347	347
query28	4468	2036	2004	2004
query29	1099	619	514	514
query30	311	235	199	199
query31	1123	1078	955	955
query32	120	66	65	65
query33	530	331	254	254
query34	1201	1128	673	673
query35	788	781	684	684
query36	1434	1412	1224	1224
query37	150	103	89	89
query38	3212	3134	3064	3064
query39	935	933	894	894
query39_1	902	887	885	885
query40	213	125	99	99
query41	70	63	63	63
query42	96	94	95	94
query43	327	330	275	275
query44	
query45	195	188	181	181
query46	1094	1198	748	748
query47	2413	2380	2269	2269
query48	413	409	293	293
query49	621	469	362	362
query50	1048	360	253	253
query51	4417	4307	4268	4268
query52	88	90	76	76
query53	244	266	191	191
query54	272	217	210	210
query55	78	74	72	72
query56	232	218	215	215
query57	1417	1415	1344	1344
query58	245	212	215	212
query59	1557	1643	1484	1484
query60	297	255	227	227
query61	169	160	159	159
query62	707	646	595	595
query63	241	192	181	181
query64	2565	814	662	662
query65	
query66	1755	462	339	339
query67	29745	29648	29612	29612
query68	
query69	418	313	291	291
query70	985	910	961	910
query71	303	216	210	210
query72	3055	2744	2421	2421
query73	911	749	400	400
query74	5117	4968	4793	4793
query75	2666	2614	2258	2258
query76	2308	1206	761	761
query77	356	387	292	292
query78	12507	12648	11922	11922
query79	1458	1035	793	793
query80	1289	484	387	387
query81	531	280	238	238
query82	626	154	120	120
query83	337	273	250	250
query84	
query85	953	547	440	440
query86	422	308	273	273
query87	3457	3352	3219	3219
query88	3692	2750	2736	2736
query89	417	389	328	328
query90	1915	174	181	174
query91	181	165	138	138
query92	60	61	57	57
query93	1627	1498	882	882
query94	701	350	321	321
query95	716	400	346	346
query96	1092	785	326	326
query97	2753	2722	2562	2562
query98	213	211	204	204
query99	1179	1220	1084	1084
Total cold run time: 253177 ms
Total hot run time: 170016 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 30301 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a9c39c0d31c927b7b424897f45f28fadef9f8493, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17658	4339	4323	4323
q2	q3	10741	1496	864	864
q4	4693	509	360	360
q5	7561	889	594	594
q6	188	187	149	149
q7	815	870	658	658
q8	9382	1566	1638	1566
q9	5880	4551	4538	4538
q10	6829	1886	1550	1550
q11	453	286	266	266
q12	639	474	323	323
q13	18133	3610	2774	2774
q14	286	268	253	253
q15	q16	832	814	721	721
q17	1014	961	1019	961
q18	7097	5814	5727	5727
q19	2155	1412	1113	1113
q20	554	416	366	366
q21	6615	2882	2858	2858
q22	488	405	337	337
Total cold run time: 102013 ms
Total hot run time: 30301 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5599	5186	5180	5180
q2	q3	5042	5566	4742	4742
q4	2393	2439	1567	1567
q5	5179	4995	4840	4840
q6	261	196	145	145
q7	2148	1925	1675	1675
q8	2724	2342	2402	2342
q9	8214	7737	7640	7640
q10	4972	4865	4386	4386
q11	592	428	419	419
q12	786	790	558	558
q13	3023	3525	2799	2799
q14	284	284	253	253
q15	q16	708	719	629	629
q17	1326	1296	1291	1291
q18	7545	6783	6970	6783
q19	1109	1086	1104	1086
q20	2273	2246	1973	1973
q21	5650	4954	4822	4822
q22	537	487	423	423
Total cold run time: 60365 ms
Total hot run time: 53553 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 169761 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a9c39c0d31c927b7b424897f45f28fadef9f8493, data reload: false

query5	4310	649	464	464
query6	472	204	176	176
query7	4880	577	311	311
query8	366	223	215	215
query9	8787	4089	4099	4089
query10	453	316	260	260
query11	5928	2383	2205	2205
query12	154	104	99	99
query13	1327	626	437	437
query14	6419	5404	5120	5120
query14_1	4453	4447	4463	4447
query15	211	203	175	175
query16	1013	461	444	444
query17	968	724	593	593
query18	2460	485	349	349
query19	203	193	148	148
query20	113	114	111	111
query21	217	146	120	120
query22	13642	13562	13395	13395
query23	17393	16392	16228	16228
query23_1	16307	16307	16404	16307
query24	7527	1769	1326	1326
query24_1	1326	1325	1318	1318
query25	570	469	408	408
query26	1295	316	168	168
query27	2704	539	357	357
query28	4529	2020	2052	2020
query29	1113	674	514	514
query30	315	233	204	204
query31	1172	1076	968	968
query32	108	65	63	63
query33	533	328	260	260
query34	1179	1149	691	691
query35	753	797	692	692
query36	1396	1387	1235	1235
query37	154	105	89	89
query38	3223	3141	3082	3082
query39	930	913	921	913
query39_1	876	870	876	870
query40	225	123	101	101
query41	67	64	63	63
query42	95	94	96	94
query43	321	325	282	282
query44	
query45	202	194	181	181
query46	1109	1232	719	719
query47	2330	2375	2249	2249
query48	389	406	281	281
query49	617	474	356	356
query50	1037	352	272	272
query51	4425	4355	4296	4296
query52	90	87	76	76
query53	258	264	193	193
query54	270	219	222	219
query55	86	75	69	69
query56	232	232	229	229
query57	1430	1402	1332	1332
query58	240	222	215	215
query59	1634	1675	1415	1415
query60	292	251	233	233
query61	165	167	162	162
query62	693	647	599	599
query63	236	183	192	183
query64	2581	763	622	622
query65	
query66	1801	470	343	343
query67	29262	29724	29530	29530
query68	
query69	417	298	257	257
query70	925	991	986	986
query71	295	226	219	219
query72	3076	2679	2415	2415
query73	878	761	429	429
query74	5155	4990	4764	4764
query75	2691	2612	2236	2236
query76	2336	1127	787	787
query77	351	367	284	284
query78	12565	12400	11860	11860
query79	1441	1018	744	744
query80	597	462	396	396
query81	459	286	245	245
query82	602	159	123	123
query83	357	282	252	252
query84	
query85	901	529	444	444
query86	374	300	276	276
query87	3394	3383	3193	3193
query88	3670	2752	2723	2723
query89	432	390	328	328
query90	1939	181	179	179
query91	179	163	136	136
query92	63	59	57	57
query93	1575	1448	868	868
query94	562	346	331	331
query95	675	473	346	346
query96	1075	834	335	335
query97	2697	2685	2557	2557
query98	215	210	206	206
query99	1149	1186	1034	1034
Total cold run time: 251557 ms
Total hot run time: 169761 ms

@csun5285

Copy link
Copy Markdown
Contributor Author

/review

@hello-stephen

Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (4/4) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.87% (21124/39210)
Line Coverage 37.60% (201293/535327)
Region Coverage 33.64% (157886/469360)
Branch Coverage 34.68% (69130/199330)

@hello-stephen

Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (4/4) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.87% (21124/39210)
Line Coverage 37.60% (201273/535327)
Region Coverage 33.64% (157874/469360)
Branch Coverage 34.68% (69120/199330)

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review result: no blocking issues found in the live PR diff. From this review, the change looks approvable.

Critical checkpoint conclusions:

  • Goal and scope: The PR is narrowly scoped to deriving default IDataType::get_storage_field_type() from PrimitiveType, removing equivalent overrides, and simplifying the CHAR predicate-column path in SegmentIterator.
  • Correctness and compatibility: I checked the removed overrides against TabletColumn::get_field_type_by_type(). The kept overrides cover the non-1:1 cases such as nullable, decimalv2, nothing/fixed-length object, and varbinary. The CHAR/VARCHAR behavior change aligns with the existing tablet field-type mapping.
  • Parallel call paths: I reviewed the relevant consumers around segment predicate columns, vectorized predicate eligibility, inverted index string handling, zone-map dispatch, function-search support, and variant schema construction; I did not find a path that still depends on DataTypeString collapsing CHAR/VARCHAR to STRING.
  • Concurrency, lifecycle, configuration, transactions, persistence, protocol, and observability: No new risk found; this refactor does not alter those behaviors directly.
  • Performance: No hot-path regression found. The change removes redundant virtual overrides and a per-iterator CHAR bookkeeping vector.
  • Tests: I did not run a full BE build or regression suite in this runner. The main residual non-blocking test gap is explicit coverage for CHAR/VARCHAR get_storage_field_type() and predicate-column initialization after this refactor.

User focus: No additional user-provided review focus was present.

@hello-stephen

Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (4/4) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.91% (28305/38298)
Line Coverage 57.97% (308518/532226)
Region Coverage 54.84% (258607/471548)
Branch Coverage 56.20% (112176/199605)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants