Skip to content

[SPARK-57386][SQL] Render nanosecond timestamp types in HiveResult through the Types Framework#56513

Closed
MaxGekk wants to merge 3 commits into
apache:masterfrom
MaxGekk:nanos-update-HiveResult
Closed

[SPARK-57386][SQL] Render nanosecond timestamp types in HiveResult through the Types Framework#56513
MaxGekk wants to merge 3 commits into
apache:masterfrom
MaxGekk:nanos-update-HiveResult

Conversation

@MaxGekk

@MaxGekk MaxGekk commented Jun 15, 2026

Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

Unify nanosecond timestamp (TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p)) rendering in HiveResult onto the Types Framework, removing the inline duplicate renderer.

  • TimestampNanosTypeApiOps: remove the formatExternal(value, nested) = None override so the Hive path shares each subclass's single-arg formatExternal renderer (the same one Row JSON uses). nested does not affect timestamp formatting.
  • HiveResult.toHiveStringDefault: remove the inline TimestampLTZNanosType / TimestampNTZNanosType cases. The legacy path keeps no nanos handling, so a nanos value that somehow reaches it (only possible with the framework off, which the gating forbids) is unsupported rather than silently rendered.
  • TypeApiOps: update the two-arg formatExternal scaladoc to reflect that Hive now shares the single-arg renderer.

Why are the changes needed?

The nanosecond timestamp types are a Types Framework feature, implemented solely through it. External-value rendering for the framework is centralized in TypeApiOps.formatExternal, which already backs Row JSON (Row.json / Row.prettyJson).

The nanos ops previously overrode the two-arg formatExternal(value, nested) to return None, so HiveResult rendered nanos through inline pattern-matching in toHiveStringDefault. That duplicated the formatter logic and was documented in code as a temporary split "until nanos external rendering is unified across the zone-less (Row JSON) and zone-aware (Hive) paths".

The types are gated by timestampNanosTypesEnabled = timestampNanosTypes.enabled && types.framework.enabled, so a nanos column cannot exist while the framework is off; the inline cases are therefore dead code in that mode and redundant when the framework is on.

Does this PR introduce any user-facing change?

No. Nanos Hive output is identical: zone-aware LTZ, zone-independent NTZ, precision flooring, and trailing-zero trimming are all unchanged. Both the old inline path and the framework path ultimately call the same TimestampFormatter methods; the pre-flooring in the inline path is a no-op because values reaching Hive come from executeCollectPublic(), whose internal value is already stored floored to the column precision.

How was this patch tested?

Existing HiveResultSuite SPARK-57257 tests cover precision 7/8/9, pre-1970 epochs, nested arrays/maps/structs, NULLs, and session-zone vs zone-independent rendering, and now exercise the framework path. Also ran TimestampNanosTypeOpsSuite and RowJsonSuite. All pass.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

@MaxGekk

MaxGekk commented Jun 15, 2026

Copy link
Copy Markdown
Member Author

@davidm-db @stevomitric Could you review this PR, please.

Comment on lines +61 to +68
// Both external-value consumers share the single-arg formatExternal each subclass overrides:
// - Row JSON (Row.json / Row.prettyJson) holds the external Row value (java.time.Instant for
// LTZ, java.time.LocalDateTime for NTZ).
// - The Hive result path (HiveResult.toHiveString) calls the two-arg overload, whose default
// delegates to the single-arg renderer; `nested` does not affect timestamp formatting.
// Each subclass renders the external value through the same formatter as its zone-aware
// cast-to-string, so both paths show the nanosecond value rather than silently truncating to
// microseconds via the legacy path.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we even need this comment now - it's free floating and to a random person reading the code it wouldn't refer to anything concrete, especially considering that we removed the actual override?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - it was dangling after the override was removed. I dropped the free-floating comment and folded the Row JSON + Hive shared-renderer explanation into the class scaladoc, where it's anchored to a declaration. Done in 564364b.

MaxGekk added 3 commits June 16, 2026 01:34
…rough the Types Framework

### What changes were proposed in this pull request?
Unify nanosecond timestamp (`TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)`) rendering in
`HiveResult` onto the Types Framework, removing the inline duplicate renderer.

- `TimestampNanosTypeApiOps`: remove the `formatExternal(value, nested) = None` override so
  the Hive path shares each subclass's single-arg `formatExternal` renderer (the same one
  Row JSON uses). `nested` does not affect timestamp formatting.
- `HiveResult.toHiveStringDefault`: remove the inline `TimestampLTZNanosType` /
  `TimestampNTZNanosType` cases.
- `TypeApiOps`: update the two-arg `formatExternal` scaladoc to reflect that Hive now shares
  the single-arg renderer.

### Why are the changes needed?
The nanos types are a Types Framework feature. The inline cases duplicated the framework
formatter logic and were documented as a temporary split until nanos external rendering was
unified across the zone-less (Row JSON) and zone-aware (Hive) paths. They are gated by
`timestampNanosTypesEnabled = timestampNanosTypes.enabled && types.framework.enabled`, so a
nanos column cannot exist while the framework is off, making the inline cases dead code in
that mode and redundant when the framework is on.

### Does this PR introduce _any_ user-facing change?
No. Nanos Hive output is identical (zone-aware LTZ, zone-independent NTZ, precision flooring,
trailing-zero trimming).

### How was this patch tested?
Existing `HiveResultSuite` SPARK-57257 tests (precision 7/8/9, pre-1970 epochs, nested
arrays/maps/structs, NULLs, session-zone vs zone-independent) now exercise the framework path
and pass unchanged. Also ran `TimestampNanosTypeOpsSuite` and `RowJsonSuite`.

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Opus 4.8)
Address review: the comment above toSQLValue was free-floating after the
formatExternal(value, nested) override was removed. Drop it and fold the
Row JSON + Hive shared-renderer explanation into the class scaladoc, where
it is anchored to a declaration.
@MaxGekk MaxGekk force-pushed the nanos-update-HiveResult branch from 564364b to d1ff196 Compare June 15, 2026 23:39
@MaxGekk

MaxGekk commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

All tests passed: https://github.com/MaxGekk/spark/runs/81612414862
Merging to master/4.x. Thank you, @HyukjinKwon for review.

@MaxGekk MaxGekk closed this in f49ca14 Jun 16, 2026
MaxGekk added a commit that referenced this pull request Jun 16, 2026
…rough the Types Framework

### What changes were proposed in this pull request?
Unify nanosecond timestamp (`TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)`) rendering in `HiveResult` onto the Types Framework, removing the inline duplicate renderer.

- `TimestampNanosTypeApiOps`: remove the `formatExternal(value, nested) = None` override so the Hive path shares each subclass's single-arg `formatExternal` renderer (the same one Row JSON uses). `nested` does not affect timestamp formatting.
- `HiveResult.toHiveStringDefault`: remove the inline `TimestampLTZNanosType` / `TimestampNTZNanosType` cases. The legacy path keeps no nanos handling, so a nanos value that somehow reaches it (only possible with the framework off, which the gating forbids) is unsupported rather than silently rendered.
- `TypeApiOps`: update the two-arg `formatExternal` scaladoc to reflect that Hive now shares the single-arg renderer.

### Why are the changes needed?
The nanosecond timestamp types are a Types Framework feature, implemented solely through it. External-value rendering for the framework is centralized in `TypeApiOps.formatExternal`, which already backs Row JSON (`Row.json` / `Row.prettyJson`).

The nanos ops previously overrode the two-arg `formatExternal(value, nested)` to return `None`, so `HiveResult` rendered nanos through inline pattern-matching in `toHiveStringDefault`. That duplicated the formatter logic and was documented in code as a temporary split "until nanos external rendering is unified across the zone-less (Row JSON) and zone-aware (Hive) paths".

The types are gated by `timestampNanosTypesEnabled = timestampNanosTypes.enabled && types.framework.enabled`, so a nanos column cannot exist while the framework is off; the inline cases are therefore dead code in that mode and redundant when the framework is on.

### Does this PR introduce _any_ user-facing change?
No. Nanos Hive output is identical: zone-aware LTZ, zone-independent NTZ, precision flooring, and trailing-zero trimming are all unchanged. Both the old inline path and the framework path ultimately call the same `TimestampFormatter` methods; the pre-flooring in the inline path is a no-op because values reaching Hive come from `executeCollectPublic()`, whose internal value is already stored floored to the column precision.

### How was this patch tested?
Existing `HiveResultSuite` SPARK-57257 tests cover precision 7/8/9, pre-1970 epochs, nested arrays/maps/structs, NULLs, and session-zone vs zone-independent rendering, and now exercise the framework path. Also ran `TimestampNanosTypeOpsSuite` and `RowJsonSuite`. All pass.

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Opus 4.8)

Closes #56513 from MaxGekk/nanos-update-HiveResult.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit f49ca14)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants