[SPARK-57469][SQL] Support date field functions on nanosecond-precision timestamps in ANSI mode#56518
Closed
MaxGekk wants to merge 1 commit into
HyukjinKwon approved these changes on Jun 15, 2026.

MaxGekk (Member, Author): Merging to master/4.x. Thank you, @HyukjinKwon, for the review.
MaxGekk added a commit that referenced this pull request on Jun 15, 2026:
…on timestamps in ANSI mode
### What changes were proposed in this pull request?
This PR makes the date field extraction functions (`year`, `quarter`, `month`, `day`/`dayofmonth`, `dayofyear`, `dayofweek`, `weekday`, `weekofyear`, `monthname`, `dayname`) and the transitive `EXTRACT` / `date_part` date components (including `yearofweek`, which has no standalone function) work on the nanosecond-precision timestamp types `TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)` (`p` in `[7, 9]`) in ANSI mode.
- Add `AnyTimestampNanoType` (`AbstractDataType`) and `AnyTimestampNanoTypeExpression` (expression extractor) matching `TimestampLTZNanosType` / `TimestampNTZNanosType`, mirroring the existing micro abstractions `AnyTimestampType` / `AnyTimestampTypeExpression`.
- Extend `AnsiGetDateFieldOperationsTypeCoercion` to also match nanos-timestamp children of `GetDateField` and cast them to `DATE`, exactly as it already does for micro timestamps. This deliberately does not widen `AnyTimestampType` (also used as an "accept-as-is, no cast" `inputTypes` gate on micro-only expressions), and keeps `Cast.canANSIStoreAssign` / `Cast.canUpCast` strict for `DATE` <-> nanos (the cast is inserted explicitly by the field-extraction rule).
- Add nanosecond examples to the date field functions' `ExpressionDescription`.
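To make the coercion change concrete, here is a minimal sketch of its user-visible effect. It assumes a Spark build that contains this PR and SPARK-57323, and it reuses the `TIMESTAMP_NTZ '...'::timestamp_ntz(9)` literal-plus-cast spelling from this description's ANSI-mode example; the inline table and the column aliases are illustrative only. Because the rule inserts the `DATE` cast itself, the implicit and explicit forms should agree.

```sql
-- Sketch: assumes a Spark build with this PR and SPARK-57323 (DATE <-> nanos casts).
SET spark.sql.timestampNanosTypes.enabled=true;
SET spark.sql.ansi.enabled=true;

-- AnsiGetDateFieldOperationsTypeCoercion now wraps a nanos child of a
-- GetDateField expression in a cast to DATE, so the implicit form should
-- return the same value as the explicitly cast form (2020 in both columns).
SELECT
  year(ts)               AS implicit_cast,
  year(CAST(ts AS DATE)) AS explicit_cast
FROM VALUES (TIMESTAMP_NTZ '2020-01-01 12:30:15.123456789'::timestamp_ntz(9)) AS t(ts);
```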
No `EXTRACT`-specific change is needed: `EXTRACT(field FROM source)` is a `RuntimeReplaceable` that rewrites via `DatePart.parseExtractField` to the same `GetDateField` expressions, so once the `GetDateField` coercion is fixed, `extract(year from nanos_ts)` and `date_part('year', nanos_ts)` work transitively in both ANSI and non-ANSI modes. Time-of-day fields (`HOUR`/`MINUTE`/`SECOND`) were already handled by SPARK-57340.
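As an illustration of the transitive `EXTRACT` / `date_part` path, the sketch below (same assumptions: a build with this PR and the nanos types enabled; the input value is chosen only for illustration) compares the three spellings of the year field and adds `YEAROFWEEK`, which has no standalone function. Under ISO 8601 week numbering, 2021-01-01 (a Friday) belongs to week 53 of 2020.

```sql
-- Sketch: assumes a Spark build that includes this PR.
SET spark.sql.timestampNanosTypes.enabled=true;
SET spark.sql.ansi.enabled=true;

-- EXTRACT and date_part are rewritten to the same GetDateField expressions
-- as the standalone functions, so the three year columns should agree.
SELECT
  extract(YEAR FROM ts)       AS extract_year,    -- 2021
  date_part('year', ts)       AS date_part_year,  -- 2021
  year(ts)                    AS year_fn,         -- 2021
  -- ISO week-numbering year: 2021-01-01 falls in week 53 of 2020.
  extract(YEAROFWEEK FROM ts) AS year_of_week     -- 2020
FROM VALUES (TIMESTAMP_NTZ '2021-01-01 10:15:30.123456789'::timestamp_ntz(9)) AS t(ts);
```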
This is a sub-task of [SPARK-56822](https://issues.apache.org/jira/browse/SPARK-56822) (SPIP: Timestamps with nanosecond precision) and builds on [SPARK-57323](https://issues.apache.org/jira/browse/SPARK-57323) (DATE <-> nanos casts).
### Why are the changes needed?
In ANSI mode (the default since Spark 4.0) the date field functions fail analysis with `DATATYPE_MISMATCH` on nanosecond timestamps. For example:
```sql
SET spark.sql.timestampNanosTypes.enabled=true;
SELECT year(TIMESTAMP_NTZ '2020-01-01 12:30:15.123456789'::timestamp_ntz(9));
```
fails, because the generic ANSI implicit-cast rule defers to `Cast.canANSIStoreAssign` (which returns false for nanos -> DATE by design) and the dedicated `AnsiGetDateFieldOperationsTypeCoercion` rule matched only the microsecond timestamp types. In non-ANSI mode the functions already work via the blanket `(_: DatetimeType, _: DatetimeType)` implicit-cast arm.
### Does this PR introduce _any_ user-facing change?
Yes. With `spark.sql.timestampNanosTypes.enabled=true`, date field functions and the `EXTRACT` / `date_part` date components now work on `TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)` in ANSI mode. Previously they failed analysis with `DATATYPE_MISMATCH`; they already worked in non-ANSI mode. The nanosecond timestamp types are a preview feature (disabled by default), so released behavior is unchanged.
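A hedged sketch of the `TIMESTAMP_LTZ(p)` case, assuming a build with this PR and assuming the type is spelled `timestamp_ltz(9)` by analogy with the `timestamp_ntz(9)` cast in this description's ANSI-mode example (the description itself only writes `TIMESTAMP_LTZ(p)`). Because an LTZ value is converted to `DATE` in the session time zone, the same instant can fall on different calendar dates; the instant used here is illustrative.

```sql
-- Sketch: assumes a Spark build that includes this PR; the timestamp_ltz(9)
-- spelling is an assumption by analogy with timestamp_ntz(9).
SET spark.sql.timestampNanosTypes.enabled=true;
SET spark.sql.ansi.enabled=true;

-- One fixed instant: 2020-12-31 23:30 UTC.
SET TIME ZONE 'UTC';
SELECT year(ts), month(ts), day(ts)  -- 2020, 12, 31
FROM VALUES (TIMESTAMP_LTZ '2020-12-31 23:30:00.123456789 UTC'::timestamp_ltz(9)) AS t(ts);

-- The same instant is 2021-01-01 08:30 in Asia/Tokyo (UTC+9), so the
-- date fields move to the next day, month, and year.
SET TIME ZONE 'Asia/Tokyo';
SELECT year(ts), month(ts), day(ts)  -- 2021, 1, 1
FROM VALUES (TIMESTAMP_LTZ '2020-12-31 23:30:00.123456789 UTC'::timestamp_ltz(9)) AS t(ts);
```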
### How was this patch tested?
- New unit tests in `TimestampNanosFunctionsSuiteBase` (run with ANSI mode on and off), covering the function form, the `EXTRACT` / `date_part` form, and the `functions.*` Column API, over a spread of values: leap day, ISO-week and quarter boundaries, pre-epoch dates, varied precisions (7/8/9) and fractions, LTZ time-zone date shifts, and NULLs.
- New golden-file queries appended to `timestamp-ltz-nanos.sql` / `timestamp-ntz-nanos.sql` with regenerated `.sql.out` files.
- `ExpressionInfoSuite` validates the new `ExpressionDescription` examples.
- `./dev/scalastyle` and scalafmt pass.
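The kinds of boundary values listed above can be spot-checked with a query like the following. It is a sketch assuming a build with this PR; the timestamps are illustrative and are not copied from `TimestampNanosFunctionsSuiteBase` or the golden files. The expected values follow Spark's documented conventions for `dayofweek` (1 = Sunday) and `weekday` (0 = Monday), and ISO 8601 numbering for `weekofyear`; note that 1969-12-31 falls in ISO week 1 of 1970.

```sql
-- Sketch: assumes a Spark build that includes this PR.
-- The timestamps are illustrative, not taken from the PR's test suites.
SET spark.sql.timestampNanosTypes.enabled=true;
SET spark.sql.ansi.enabled=true;

SELECT
  CAST(ts AS DATE) AS d,
  quarter(ts)      AS q,
  dayofyear(ts)    AS doy,
  dayofweek(ts)    AS dow,  -- 1 = Sunday ... 7 = Saturday
  weekday(ts)      AS wd,   -- 0 = Monday ... 6 = Sunday
  weekofyear(ts)   AS woy   -- ISO 8601 week number
FROM VALUES
  -- leap day, a Saturday:           q=1, doy=60,  dow=7, wd=5, woy=9
  (TIMESTAMP_NTZ '2020-02-29 12:00:00.123456789'::timestamp_ntz(9)),
  -- last day of Q1, a Tuesday:      q=1, doy=91,  dow=3, wd=1, woy=14
  (TIMESTAMP_NTZ '2020-03-31 12:00:00.123456789'::timestamp_ntz(9)),
  -- first day of Q2, a Wednesday:   q=2, doy=92,  dow=4, wd=2, woy=14
  (TIMESTAMP_NTZ '2020-04-01 12:00:00.123456789'::timestamp_ntz(9)),
  -- ISO week 53 of 2020, a Friday:  q=1, doy=1,   dow=6, wd=4, woy=53
  (TIMESTAMP_NTZ '2021-01-01 12:00:00.123456789'::timestamp_ntz(9)),
  -- pre-epoch, ISO week 1 of 1970:  q=4, doy=365, dow=4, wd=2, woy=1
  (TIMESTAMP_NTZ '1969-12-31 12:00:00.123456789'::timestamp_ntz(9))
AS t(ts);
```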
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor
Closes #56518 from MaxGekk/SPARK-57469.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit a25cd89)
Signed-off-by: Max Gekk <max.gekk@gmail.com>