Skip to content

feat(arrow/compute/exprs): support expr.IntervalYearToMonthLiteral in literalToDatum#825

Open
zeroshade wants to merge 1 commit into
apache:mainfrom
zeroshade:fix/815-interval-year-to-month-literal
Open

feat(arrow/compute/exprs): support expr.IntervalYearToMonthLiteral in literalToDatum#825
zeroshade wants to merge 1 commit into
apache:mainfrom
zeroshade:fix/815-interval-year-to-month-literal

Conversation

@zeroshade
Copy link
Copy Markdown
Member

Rationale for this change

Closes #815.

arrow/compute/exprs.literalToDatum handled interval year/month literals only through the *expr.ProtoLiteral path. The substrait-go library also surfaces these as the dedicated expr.IntervalYearToMonthLiteral type — that is the value intervalYearToMonthLiteralFromProto returns when deserializing protobuf — and any caller reaching literalToDatum with this type previously fell through to arrow.ErrNotImplemented.

What changes are included in this PR?

  1. New type-switch cases for expr.IntervalYearToMonthLiteral (value form, the canonical form returned by substrait-go's deserializer) and *expr.IntervalYearToMonthLiteral (pointer form, used in parts of substrait-go itself, e.g. expr/string_test.go:86). Both produce the same FixedSizeList<int32>[2] extension scalar that the existing *expr.ProtoLiteral path already produces, so callers reaching either form get a Datum that is compute.Equals to the one they would have gotten via the protobuf path.
  2. New helper intervalYearToMonthDatum(mem, years, months). The existing *types.IntervalYearToMonthType branch inside the *expr.ProtoLiteral case now calls this helper instead of inlining the builder logic, so the two paths cannot drift.
  3. Allocator threading. The helper takes the mem memory.Allocator parameter that literalToDatum already receives, instead of hardcoding memory.DefaultAllocator (which the original inlined branch did). All three call sites forward mem.

Are these changes tested?

Yes. New TestLiteralToDatumIntervalYearToMonth in arrow/compute/exprs/exec_internal_test.go constructs a *expr.ProtoLiteral baseline and a parallel expr.IntervalYearToMonthLiteral (value) and *expr.IntervalYearToMonthLiteral (pointer), then asserts the resulting Datums are compute.Equals. Existing parquet/file/..., parquet/pqarrow/..., and arrow/compute/exprs/... test suites all continue to pass; full repo go build ./... is clean.

The test deliberately uses memory.DefaultAllocator rather than memory.NewCheckedAllocator with AssertSize(t, 0). *scalar.Extension does not implement Release() (see arrow/scalar/scalar.go:397-466), so an extension scalar's underlying storage is never released even when the wrapping Datum is — adding a checked-allocator assertion would chase a pre-existing leak that is unrelated to this issue. (Filing that as a separate follow-up issue is straightforward and out of scope here.)

Are there any user-facing changes?

Yes, but purely additive:

  • Callers passing an expr.IntervalYearToMonthLiteral (or pointer to one) to literalToDatum, ExecuteScalarExpression, or related entry points used to receive arrow.ErrNotImplemented; they now receive a properly-formed extension Datum identical to what the *expr.ProtoLiteral path produces.
  • No public API signatures change.
  • No existing call site behavior changes (the *expr.ProtoLiteral path produces a byte-identical Datum to before).

Out of scope (separate follow-up)

*scalar.Extension does not implement Release(), so extension scalars (including the ones produced by this code path, the existing *types.IntervalYearToMonthType path, the *types.IntervalDayType path, and intervalDay() / intervalYear() extension types in general) leak their storage through compute.ScalarDatum.Release(). Worth a separate issue.

… literalToDatum (apacheGH-815)

literalToDatum handled interval year/month literals only through the
*expr.ProtoLiteral path. The substrait-go library also surfaces these
as the dedicated expr.IntervalYearToMonthLiteral type (returned by its
intervalYearToMonthLiteralFromProto deserializer), which previously
fell through to arrow.ErrNotImplemented.

Add type-switch cases for both the value form (the canonical form
returned by substrait-go's deserializer) and the pointer form (used in
parts of substrait-go itself) so callers reaching either form get the
same FixedSizeList<int32>[2] extension scalar produced by the existing
ProtoLiteral path.

Refactor the ProtoLiteral *types.IntervalYearToMonthType branch to
share the new intervalYearToMonthDatum helper so the two paths cannot
drift.

Add an internal test that asserts the new value and pointer paths
produce a Datum equal to the one produced by the existing ProtoLiteral
baseline.

Closes apache#815.
@zeroshade zeroshade force-pushed the fix/815-interval-year-to-month-literal branch from ee46f86 to 8b9590a Compare May 22, 2026 20:18
@zeroshade zeroshade requested review from amoeba, kou and lidavidm May 22, 2026 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support expr.IntervalYearToMonthLiteral in literalToDatum

1 participant