[SPARK-57386][SQL] Render nanosecond timestamp types in HiveResult through the Types Framework#56513
Closed
MaxGekk wants to merge 3 commits into
Closed
[SPARK-57386][SQL] Render nanosecond timestamp types in HiveResult through the Types Framework#56513MaxGekk wants to merge 3 commits into
MaxGekk wants to merge 3 commits into
Conversation
Member
Author
|
@davidm-db @stevomitric Could you review this PR, please. |
davidm-db
approved these changes
Jun 15, 2026
Comment on lines
+61
to
+68
| // Both external-value consumers share the single-arg formatExternal each subclass overrides: | ||
| // - Row JSON (Row.json / Row.prettyJson) holds the external Row value (java.time.Instant for | ||
| // LTZ, java.time.LocalDateTime for NTZ). | ||
| // - The Hive result path (HiveResult.toHiveString) calls the two-arg overload, whose default | ||
| // delegates to the single-arg renderer; `nested` does not affect timestamp formatting. | ||
| // Each subclass renders the external value through the same formatter as its zone-aware | ||
| // cast-to-string, so both paths show the nanosecond value rather than silently truncating to | ||
| // microseconds via the legacy path. |
Contributor
There was a problem hiding this comment.
do we even need this comment now - it's free floating and to a random person reading the code it wouldn't refer to anything concrete, especially considering that we removed the actual override?
Member
Author
There was a problem hiding this comment.
Good point - it was dangling after the override was removed. I dropped the free-floating comment and folded the Row JSON + Hive shared-renderer explanation into the class scaladoc, where it's anchored to a declaration. Done in 564364b.
HyukjinKwon
approved these changes
Jun 15, 2026
…rough the Types Framework ### What changes were proposed in this pull request? Unify nanosecond timestamp (`TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)`) rendering in `HiveResult` onto the Types Framework, removing the inline duplicate renderer. - `TimestampNanosTypeApiOps`: remove the `formatExternal(value, nested) = None` override so the Hive path shares each subclass's single-arg `formatExternal` renderer (the same one Row JSON uses). `nested` does not affect timestamp formatting. - `HiveResult.toHiveStringDefault`: remove the inline `TimestampLTZNanosType` / `TimestampNTZNanosType` cases. - `TypeApiOps`: update the two-arg `formatExternal` scaladoc to reflect that Hive now shares the single-arg renderer. ### Why are the changes needed? The nanos types are a Types Framework feature. The inline cases duplicated the framework formatter logic and were documented as a temporary split until nanos external rendering was unified across the zone-less (Row JSON) and zone-aware (Hive) paths. They are gated by `timestampNanosTypesEnabled = timestampNanosTypes.enabled && types.framework.enabled`, so a nanos column cannot exist while the framework is off, making the inline cases dead code in that mode and redundant when the framework is on. ### Does this PR introduce _any_ user-facing change? No. Nanos Hive output is identical (zone-aware LTZ, zone-independent NTZ, precision flooring, trailing-zero trimming). ### How was this patch tested? Existing `HiveResultSuite` SPARK-57257 tests (precision 7/8/9, pre-1970 epochs, nested arrays/maps/structs, NULLs, session-zone vs zone-independent) now exercise the framework path and pass unchanged. Also ran `TimestampNanosTypeOpsSuite` and `RowJsonSuite`. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor (Claude Opus 4.8)
Address review: the comment above toSQLValue was free-floating after the formatExternal(value, nested) override was removed. Drop it and fold the Row JSON + Hive shared-renderer explanation into the class scaladoc, where it is anchored to a declaration.
564364b to
d1ff196
Compare
Member
Author
|
All tests passed: https://github.com/MaxGekk/spark/runs/81612414862 |
MaxGekk
added a commit
that referenced
this pull request
Jun 16, 2026
…rough the Types Framework ### What changes were proposed in this pull request? Unify nanosecond timestamp (`TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)`) rendering in `HiveResult` onto the Types Framework, removing the inline duplicate renderer. - `TimestampNanosTypeApiOps`: remove the `formatExternal(value, nested) = None` override so the Hive path shares each subclass's single-arg `formatExternal` renderer (the same one Row JSON uses). `nested` does not affect timestamp formatting. - `HiveResult.toHiveStringDefault`: remove the inline `TimestampLTZNanosType` / `TimestampNTZNanosType` cases. The legacy path keeps no nanos handling, so a nanos value that somehow reaches it (only possible with the framework off, which the gating forbids) is unsupported rather than silently rendered. - `TypeApiOps`: update the two-arg `formatExternal` scaladoc to reflect that Hive now shares the single-arg renderer. ### Why are the changes needed? The nanosecond timestamp types are a Types Framework feature, implemented solely through it. External-value rendering for the framework is centralized in `TypeApiOps.formatExternal`, which already backs Row JSON (`Row.json` / `Row.prettyJson`). The nanos ops previously overrode the two-arg `formatExternal(value, nested)` to return `None`, so `HiveResult` rendered nanos through inline pattern-matching in `toHiveStringDefault`. That duplicated the formatter logic and was documented in code as a temporary split "until nanos external rendering is unified across the zone-less (Row JSON) and zone-aware (Hive) paths". The types are gated by `timestampNanosTypesEnabled = timestampNanosTypes.enabled && types.framework.enabled`, so a nanos column cannot exist while the framework is off; the inline cases are therefore dead code in that mode and redundant when the framework is on. ### Does this PR introduce _any_ user-facing change? No. Nanos Hive output is identical: zone-aware LTZ, zone-independent NTZ, precision flooring, and trailing-zero trimming are all unchanged. Both the old inline path and the framework path ultimately call the same `TimestampFormatter` methods; the pre-flooring in the inline path is a no-op because values reaching Hive come from `executeCollectPublic()`, whose internal value is already stored floored to the column precision. ### How was this patch tested? Existing `HiveResultSuite` SPARK-57257 tests cover precision 7/8/9, pre-1970 epochs, nested arrays/maps/structs, NULLs, and session-zone vs zone-independent rendering, and now exercise the framework path. Also ran `TimestampNanosTypeOpsSuite` and `RowJsonSuite`. All pass. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor (Claude Opus 4.8) Closes #56513 from MaxGekk/nanos-update-HiveResult. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com> (cherry picked from commit f49ca14) Signed-off-by: Max Gekk <max.gekk@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Unify nanosecond timestamp (
TIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p)) rendering inHiveResultonto the Types Framework, removing the inline duplicate renderer.TimestampNanosTypeApiOps: remove theformatExternal(value, nested) = Noneoverride so the Hive path shares each subclass's single-argformatExternalrenderer (the same one Row JSON uses).nesteddoes not affect timestamp formatting.HiveResult.toHiveStringDefault: remove the inlineTimestampLTZNanosType/TimestampNTZNanosTypecases. The legacy path keeps no nanos handling, so a nanos value that somehow reaches it (only possible with the framework off, which the gating forbids) is unsupported rather than silently rendered.TypeApiOps: update the two-argformatExternalscaladoc to reflect that Hive now shares the single-arg renderer.Why are the changes needed?
The nanosecond timestamp types are a Types Framework feature, implemented solely through it. External-value rendering for the framework is centralized in
TypeApiOps.formatExternal, which already backs Row JSON (Row.json/Row.prettyJson).The nanos ops previously overrode the two-arg
formatExternal(value, nested)to returnNone, soHiveResultrendered nanos through inline pattern-matching intoHiveStringDefault. That duplicated the formatter logic and was documented in code as a temporary split "until nanos external rendering is unified across the zone-less (Row JSON) and zone-aware (Hive) paths".The types are gated by
timestampNanosTypesEnabled = timestampNanosTypes.enabled && types.framework.enabled, so a nanos column cannot exist while the framework is off; the inline cases are therefore dead code in that mode and redundant when the framework is on.Does this PR introduce any user-facing change?
No. Nanos Hive output is identical: zone-aware LTZ, zone-independent NTZ, precision flooring, and trailing-zero trimming are all unchanged. Both the old inline path and the framework path ultimately call the same
TimestampFormattermethods; the pre-flooring in the inline path is a no-op because values reaching Hive come fromexecuteCollectPublic(), whose internal value is already stored floored to the column precision.How was this patch tested?
Existing
HiveResultSuiteSPARK-57257 tests cover precision 7/8/9, pre-1970 epochs, nested arrays/maps/structs, NULLs, and session-zone vs zone-independent rendering, and now exercise the framework path. Also ranTimestampNanosTypeOpsSuiteandRowJsonSuite. All pass.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Opus 4.8)