[SPARK-57386][SQL] Render nanosecond timestamp types in HiveResult through the Types Framework by MaxGekk · Pull Request #56513 · apache/spark

MaxGekk · 2026-06-15T09:29:17Z

What changes were proposed in this pull request?

Unify nanosecond timestamp (TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p)) rendering in HiveResult onto the Types Framework, removing the inline duplicate renderer.

TimestampNanosTypeApiOps: remove the formatExternal(value, nested) = None override so the Hive path shares each subclass's single-arg formatExternal renderer (the same one Row JSON uses). nested does not affect timestamp formatting.
HiveResult.toHiveStringDefault: remove the inline TimestampLTZNanosType / TimestampNTZNanosType cases. The legacy path keeps no nanos handling, so a nanos value that somehow reaches it (only possible with the framework off, which the gating forbids) is unsupported rather than silently rendered.
TypeApiOps: update the two-arg formatExternal scaladoc to reflect that Hive now shares the single-arg renderer.

Why are the changes needed?

The nanosecond timestamp types are a Types Framework feature, implemented solely through it. External-value rendering for the framework is centralized in TypeApiOps.formatExternal, which already backs Row JSON (Row.json / Row.prettyJson).

The nanos ops previously overrode the two-arg formatExternal(value, nested) to return None, so HiveResult rendered nanos through inline pattern-matching in toHiveStringDefault. That duplicated the formatter logic and was documented in code as a temporary split "until nanos external rendering is unified across the zone-less (Row JSON) and zone-aware (Hive) paths".

The types are gated by timestampNanosTypesEnabled = timestampNanosTypes.enabled && types.framework.enabled, so a nanos column cannot exist while the framework is off; the inline cases are therefore dead code in that mode and redundant when the framework is on.

Does this PR introduce any user-facing change?

No. Nanos Hive output is identical: zone-aware LTZ, zone-independent NTZ, precision flooring, and trailing-zero trimming are all unchanged. Both the old inline path and the framework path ultimately call the same TimestampFormatter methods; the pre-flooring in the inline path is a no-op because values reaching Hive come from executeCollectPublic(), whose internal value is already stored floored to the column precision.

How was this patch tested?

Existing HiveResultSuite SPARK-57257 tests cover precision 7/8/9, pre-1970 epochs, nested arrays/maps/structs, NULLs, and session-zone vs zone-independent rendering, and now exercise the framework path. Also ran TimestampNanosTypeOpsSuite and RowJsonSuite. All pass.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

MaxGekk · 2026-06-15T09:31:43Z

@davidm-db @stevomitric Could you review this PR, please.

davidm-db · 2026-06-15T13:45:30Z

+  // Both external-value consumers share the single-arg formatExternal each subclass overrides:
+  //   - Row JSON (Row.json / Row.prettyJson) holds the external Row value (java.time.Instant for
+  //     LTZ, java.time.LocalDateTime for NTZ).
+  //   - The Hive result path (HiveResult.toHiveString) calls the two-arg overload, whose default
+  //     delegates to the single-arg renderer; `nested` does not affect timestamp formatting.
+  // Each subclass renders the external value through the same formatter as its zone-aware
+  // cast-to-string, so both paths show the nanosecond value rather than silently truncating to
+  // microseconds via the legacy path.


do we even need this comment now - it's free floating and to a random person reading the code it wouldn't refer to anything concrete, especially considering that we removed the actual override?

Good point - it was dangling after the override was removed. I dropped the free-floating comment and folded the Row JSON + Hive shared-renderer explanation into the class scaladoc, where it's anchored to a declaration. Done in 564364b.

…rough the Types Framework ### What changes were proposed in this pull request? Unify nanosecond timestamp (`TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)`) rendering in `HiveResult` onto the Types Framework, removing the inline duplicate renderer. - `TimestampNanosTypeApiOps`: remove the `formatExternal(value, nested) = None` override so the Hive path shares each subclass's single-arg `formatExternal` renderer (the same one Row JSON uses). `nested` does not affect timestamp formatting. - `HiveResult.toHiveStringDefault`: remove the inline `TimestampLTZNanosType` / `TimestampNTZNanosType` cases. - `TypeApiOps`: update the two-arg `formatExternal` scaladoc to reflect that Hive now shares the single-arg renderer. ### Why are the changes needed? The nanos types are a Types Framework feature. The inline cases duplicated the framework formatter logic and were documented as a temporary split until nanos external rendering was unified across the zone-less (Row JSON) and zone-aware (Hive) paths. They are gated by `timestampNanosTypesEnabled = timestampNanosTypes.enabled && types.framework.enabled`, so a nanos column cannot exist while the framework is off, making the inline cases dead code in that mode and redundant when the framework is on. ### Does this PR introduce _any_ user-facing change? No. Nanos Hive output is identical (zone-aware LTZ, zone-independent NTZ, precision flooring, trailing-zero trimming). ### How was this patch tested? Existing `HiveResultSuite` SPARK-57257 tests (precision 7/8/9, pre-1970 epochs, nested arrays/maps/structs, NULLs, session-zone vs zone-independent) now exercise the framework path and pass unchanged. Also ran `TimestampNanosTypeOpsSuite` and `RowJsonSuite`. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor (Claude Opus 4.8)

Address review: the comment above toSQLValue was free-floating after the formatExternal(value, nested) override was removed. Drop it and fold the Row JSON + Hive shared-renderer explanation into the class scaladoc, where it is anchored to a declaration.

MaxGekk · 2026-06-16T08:25:04Z

All tests passed: https://github.com/MaxGekk/spark/runs/81612414862
Merging to master/4.x. Thank you, @HyukjinKwon for review.

…rough the Types Framework ### What changes were proposed in this pull request? Unify nanosecond timestamp (`TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)`) rendering in `HiveResult` onto the Types Framework, removing the inline duplicate renderer. - `TimestampNanosTypeApiOps`: remove the `formatExternal(value, nested) = None` override so the Hive path shares each subclass's single-arg `formatExternal` renderer (the same one Row JSON uses). `nested` does not affect timestamp formatting. - `HiveResult.toHiveStringDefault`: remove the inline `TimestampLTZNanosType` / `TimestampNTZNanosType` cases. The legacy path keeps no nanos handling, so a nanos value that somehow reaches it (only possible with the framework off, which the gating forbids) is unsupported rather than silently rendered. - `TypeApiOps`: update the two-arg `formatExternal` scaladoc to reflect that Hive now shares the single-arg renderer. ### Why are the changes needed? The nanosecond timestamp types are a Types Framework feature, implemented solely through it. External-value rendering for the framework is centralized in `TypeApiOps.formatExternal`, which already backs Row JSON (`Row.json` / `Row.prettyJson`). The nanos ops previously overrode the two-arg `formatExternal(value, nested)` to return `None`, so `HiveResult` rendered nanos through inline pattern-matching in `toHiveStringDefault`. That duplicated the formatter logic and was documented in code as a temporary split "until nanos external rendering is unified across the zone-less (Row JSON) and zone-aware (Hive) paths". The types are gated by `timestampNanosTypesEnabled = timestampNanosTypes.enabled && types.framework.enabled`, so a nanos column cannot exist while the framework is off; the inline cases are therefore dead code in that mode and redundant when the framework is on. ### Does this PR introduce _any_ user-facing change? No. Nanos Hive output is identical: zone-aware LTZ, zone-independent NTZ, precision flooring, and trailing-zero trimming are all unchanged. Both the old inline path and the framework path ultimately call the same `TimestampFormatter` methods; the pre-flooring in the inline path is a no-op because values reaching Hive come from `executeCollectPublic()`, whose internal value is already stored floored to the column precision. ### How was this patch tested? Existing `HiveResultSuite` SPARK-57257 tests cover precision 7/8/9, pre-1970 epochs, nested arrays/maps/structs, NULLs, and session-zone vs zone-independent rendering, and now exercise the framework path. Also ran `TimestampNanosTypeOpsSuite` and `RowJsonSuite`. All pass. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor (Claude Opus 4.8) Closes #56513 from MaxGekk/nanos-update-HiveResult. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com> (cherry picked from commit f49ca14) Signed-off-by: Max Gekk <max.gekk@gmail.com>

davidm-db approved these changes Jun 15, 2026

View reviewed changes

HyukjinKwon approved these changes Jun 15, 2026

View reviewed changes

MaxGekk added 3 commits June 16, 2026 01:34

fix coding style

513ecc9

MaxGekk force-pushed the nanos-update-HiveResult branch from 564364b to d1ff196 Compare June 15, 2026 23:39

MaxGekk closed this in f49ca14 Jun 16, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-57386][SQL] Render nanosecond timestamp types in HiveResult through the Types Framework#56513

[SPARK-57386][SQL] Render nanosecond timestamp types in HiveResult through the Types Framework#56513
MaxGekk wants to merge 3 commits into
apache:masterfrom
MaxGekk:nanos-update-HiveResult

MaxGekk commented Jun 15, 2026

Uh oh!

MaxGekk commented Jun 15, 2026

Uh oh!

davidm-db Jun 15, 2026

Uh oh!

MaxGekk Jun 15, 2026

Uh oh!

MaxGekk commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

MaxGekk commented Jun 15, 2026

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

MaxGekk commented Jun 15, 2026

Uh oh!

davidm-db Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

MaxGekk Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

MaxGekk commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants