Skip to content

[Arrow] Extract ArrowToPinotTypeConverter shared utility#18632

Open
real-mj-song wants to merge 1 commit into
apache:masterfrom
real-mj-song:extract-arrow-type-converter
Open

[Arrow] Extract ArrowToPinotTypeConverter shared utility#18632
real-mj-song wants to merge 1 commit into
apache:masterfrom
real-mj-song:extract-arrow-type-converter

Conversation

@real-mj-song
Copy link
Copy Markdown
Contributor

@real-mj-song real-mj-song commented May 29, 2026

TL;DR: Precursor refactor extracting Arrow → Pinot type-conversion logic from ArrowRecordExtractor into a new public utility ArrowToPinotTypeConverter. Behavior-preserving on the row-major path; enables a column-major ColumnReader consumer to reuse the same conversion in a follow-on PR.

Tracks #18629. This is the precursor PR; a follow-on PR will add ArrowColumnReaderFactory / ArrowColumnReader on top of this branch and consume the extracted utility.

What changed

  • New: ArrowToPinotTypeConverter — public static utility holding the schema-driven dispatch on ArrowType.ArrowTypeID previously inlined in ArrowRecordExtractor. Returns canonical JDK types (String from Utf8 / LargeUtf8 unwrapped from Arrow's Text, Object[] for List variants, LocalDate / LocalTime / Timestamp for temporal types, BigDecimal for Decimal, etc.).
  • ArrowRecordExtractor.extract now delegates to ArrowToPinotTypeConverter.toPinotValue(field, value, extractRawTimeValues). The inline convert() and its helpers (convertTimestamp, convertDate, convertTime, convertList, convertMap, convertStruct, convertByRuntimeType, toEpochInUnit) are removed from the extractor.
  • No behavior change on the row-major path.

Testing

All 63 existing row-major tests pass unchanged:

  • ArrowRecordReaderTest (2 / 2)
  • ArrowRecordExtractorTest (54 / 54)
  • ArrowMessageDecoderTest (7 / 7)

ArrowRecordExtractorTest exhaustively exercises every converter branch (every Arrow logical type, all temporal units, the extractRawTimeValues bypass, list / struct / map recursion, dictionary encoding, nulls). The refactor green-bar on those tests proves behavior preservation.

Compatibility

Purely additive on the SPI side. Public API surface change: ArrowToPinotTypeConverter is a new public class; nothing existing is renamed or removed.

References

…TypeConverter

Pure refactor — no behavior change. Move the schema-driven Arrow → Pinot value
conversion (originally introduced in apache#18434) from ArrowRecordExtractor into a
new public static utility ArrowToPinotTypeConverter so it can be reused by
column-major ColumnReader implementations that wrap Arrow vectors.

ArrowToPinotTypeConverter exposes a single entry point:

    public static Object toPinotValue(Field field, Object value, boolean extractRawTimeValues)

with all per-type helpers (convertTimestamp / Date / Time / List / Map /
Struct / byRuntimeType) reduced to private static methods. The
extractRawTimeValues flag — previously an ArrowRecordExtractor instance
field — is threaded through as a method parameter so the converter remains
stateless.

ArrowRecordExtractor.convert(Field, Object) is removed; the per-row dispatch
in extract() now calls ArrowToPinotTypeConverter.toPinotValue() directly.
All existing tests in pinot-arrow continue to pass unchanged:

- ArrowRecordReaderTest (2 / 2)
- ArrowRecordExtractorTest (54 / 54)
- ArrowMessageDecoderTest (7 / 7)

This precursor lets a follow-up ColumnReader implementation reuse the
conversion without duplicating ~200 lines of schema-driven dispatch.
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.40%. Comparing base (a762254) to head (fb38d72).

Additional details and impacted files
@@            Coverage Diff            @@
##             master   #18632   +/-   ##
=========================================
  Coverage     64.40%   64.40%           
+ Complexity     1137     1131    -6     
=========================================
  Files          3337     3337           
  Lines        206069   206069           
  Branches      32128    32128           
=========================================
+ Hits         132710   132716    +6     
+ Misses        62726    62723    -3     
+ Partials      10633    10630    -3     
Flag Coverage Δ
custom-integration1 ?
integration 0.00% <ø> (-100.00%) ⬇️
integration1 ?
integration2 0.00% <ø> (ø)
java-21 64.40% <ø> (+<0.01%) ⬆️
temurin 64.40% <ø> (+<0.01%) ⬆️
unittests 64.40% <ø> (+<0.01%) ⬆️
unittests1 56.83% <ø> (+0.01%) ⬆️
unittests2 36.92% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@real-mj-song
Copy link
Copy Markdown
Contributor Author

The single CI failure (Pinot Integration Test Set 1) is unrelated to this PR's diff.

Failing case: MultiNodesOfflineClusterIntegrationTest.testExplainPlanQueryV2 — asserts on EXPLAIN PLAN output for a multi-stage query. This PR touches only two files in pinot-plugins/pinot-input-format/pinot-arrow; nothing in query planning, multi-stage engine, or integration tests.

Assertion diff:

  • Expected: Rule: AggregateProjectPullUpConstants -> Time:*
  • Actual: Rule: AggregateProjectPullUpConstants -> Time:*E-4

The * is a literal placeholder that the test substitutes from real timings. The actual rule execution time rendered in scientific notation (sub-millisecond, e.g. 1.5E-4ms), and the substitution regex doesn't cover scientific notation — so it consumed only the mantissa and left E-4 behind. Timing-dependent.

Recent Pinot Integration Tests runs on master are passing, consistent with a timing-conditioned flake rather than a deterministic regression.

@real-mj-song real-mj-song marked this pull request as ready for review May 29, 2026 19:28
@real-mj-song real-mj-song changed the title [WIP] [Arrow] Extract ArrowToPinotTypeConverter shared utility [Arrow] Extract ArrowToPinotTypeConverter shared utility May 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants