[SPARK-57469][SQL] Support date field functions on nanosecond-precision timestamps in ANSI mode by MaxGekk · Pull Request #56518 · apache/spark

MaxGekk · 2026-06-15T14:04:33Z

What changes were proposed in this pull request?

This PR makes the date field extraction functions (year, quarter, month, day/dayofmonth, dayofyear, dayofweek, weekday, weekofyear, monthname, dayname) and the transitive EXTRACT / date_part date components (including yearofweek, which has no standalone function) work on the nanosecond-precision timestamp types TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) (p in [7, 9]) in ANSI mode.

- Add AnyTimestampNanoType (AbstractDataType) and AnyTimestampNanoTypeExpression (expression extractor) matching TimestampLTZNanosType / TimestampNTZNanosType, mirroring the existing micro abstractions AnyTimestampType / AnyTimestampTypeExpression.
- Extend AnsiGetDateFieldOperationsTypeCoercion to also match nanos-timestamp children of GetDateField and cast them to DATE, exactly as it already does for micro timestamps. This deliberately does not widen AnyTimestampType (also used as an "accept-as-is, no cast" inputTypes gate on micro-only expressions), and keeps Cast.canANSIStoreAssign / Cast.canUpCast strict for DATE <-> nanos (the cast is inserted explicitly by the field-extraction rule).
- Add nanosecond examples to the date field functions' @ExpressionDescription.
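The rewrite described in the second bullet can be pictured with a small toy model. This is a sketch in Python, not Spark's Scala code: the classes `DataType`, `Column`, `Cast` and `GetDateField`, the helper predicates, and the `match_nanos` flag are all hypothetical stand-ins, and giving micro timestamps a precision of 6 is a simplification made only for this example.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for Catalyst types and expressions (not Spark's API).
@dataclass(frozen=True)
class DataType:
    name: str
    precision: int = 6  # fractional-second digits; 0 for DATE in this toy

DATE = DataType("DATE", 0)
TIMESTAMP_NAMES = ("TIMESTAMP_NTZ", "TIMESTAMP_LTZ")

def is_micro_timestamp(t: DataType) -> bool:
    return t.name in TIMESTAMP_NAMES and t.precision <= 6

def is_nano_timestamp(t: DataType) -> bool:
    return t.name in TIMESTAMP_NAMES and 7 <= t.precision <= 9

@dataclass(frozen=True)
class Column:
    name: str
    data_type: DataType

@dataclass(frozen=True)
class Cast:
    child: object
    data_type: DataType

@dataclass(frozen=True)
class GetDateField:
    field: str
    child: object

def coerce_date_field(expr, match_nanos: bool = True):
    """Wrap a timestamp child of GetDateField in an explicit cast to DATE.

    match_nanos=False imitates the rule before this PR (micro types only).
    """
    if isinstance(expr, GetDateField):
        t = expr.child.data_type
        if is_micro_timestamp(t) or (match_nanos and is_nano_timestamp(t)):
            return GetDateField(expr.field, Cast(expr.child, DATE))
    return expr  # DATE children and unrelated expressions are left alone

ts9 = Column("ts", DataType("TIMESTAMP_NTZ", 9))
before = coerce_date_field(GetDateField("year", ts9), match_nanos=False)
after = coerce_date_field(GetDateField("year", ts9))
```

In this toy, `before` still has the raw nanosecond column as its child, while `after` has a `Cast` to `DATE`. The predicates for micro and nano types stay separate, which mirrors the choice not to widen the micro-only gate.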

No EXTRACT-specific change is needed: EXTRACT(field FROM source) is a RuntimeReplaceable that rewrites via DatePart.parseExtractField to the same GetDateField expressions, so once the GetDateField coercion is fixed, extract(year from nanos_ts) and date_part('year', nanos_ts) work transitively in both ANSI and non-ANSI modes. Time-of-day fields (HOUR/MINUTE/SECOND) were already handled by SPARK-57340.
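To illustrate why fixing one code path is enough, here is a minimal Python sketch in which `extract` and `date_part` both resolve a field name to the same field function. The tables and the `parse_extract_field` helper are hypothetical; the keys are the function names used in this description, not the field spellings Spark's parser actually accepts, and the day-numbering conventions in the comments reflect my understanding of the Spark functions.

```python
from datetime import date

# Fields that also exist as standalone functions in this toy.
STANDALONE = {
    "year": lambda d: d.year,
    "quarter": lambda d: (d.month - 1) // 3 + 1,
    "month": lambda d: d.month,
    "day": lambda d: d.day,
    "dayofyear": lambda d: d.timetuple().tm_yday,
    "dayofweek": lambda d: d.isoweekday() % 7 + 1,  # 1 = Sunday ... 7 = Saturday
    "weekday": lambda d: d.weekday(),               # 0 = Monday ... 6 = Sunday
    "weekofyear": lambda d: d.isocalendar()[1],     # ISO 8601 week number
}

# Reachable only through extract / date_part, as with yearofweek.
EXTRACT_ONLY = {
    "yearofweek": lambda d: d.isocalendar()[0],     # ISO week-numbering year
}

def parse_extract_field(field: str):
    """Resolve a field name (case-insensitive) to its field function."""
    key = field.lower()
    for table in (STANDALONE, EXTRACT_ONLY):
        if key in table:
            return table[key]
    raise ValueError(f"unsupported field: {field}")

def extract(field: str, d: date) -> int:
    return parse_extract_field(field)(d)

# date_part is the same lookup with a different surface syntax.
date_part = extract

new_year = date(2021, 1, 1)  # a Friday that still belongs to ISO week 53 of 2020
print(extract("YEAR", new_year), extract("yearofweek", new_year))
```

Because both entry points share one lookup, any input type the field functions accept is accepted by `extract` and `date_part` as well.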

This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond precision) and builds on SPARK-57323 (DATE <-> nanos casts).

Why are the changes needed?

In ANSI mode (the default since Spark 4.0) the date field functions fail analysis with DATATYPE_MISMATCH on nanosecond timestamps. For example:

```sql
SET spark.sql.timestampNanosTypes.enabled=true;
SELECT year(TIMESTAMP_NTZ '2020-01-01 12:30:15.123456789'::timestamp_ntz(9));
```

fails, because the generic ANSI implicit-cast rule defers to Cast.canANSIStoreAssign (which returns false for nanos -> DATE by design) and the dedicated AnsiGetDateFieldOperationsTypeCoercion rule matched only the microsecond timestamp types. In non-ANSI mode the functions already work via the blanket (_: DatetimeType, _: DatetimeType) implicit-cast arm.
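The decision flow in the paragraph above can be sketched as follows. This is a toy in Python under stated assumptions: the type names are plain strings, `can_ansi_store_assign` models only the single case the text mentions, the order in which the dedicated and generic rules are tried is my assumption, and `dedicated_rule_matches_nanos` is a hypothetical switch between the behavior before and after this PR.

```python
NANOS_TYPES = {"TIMESTAMP_NTZ(9)", "TIMESTAMP_LTZ(9)"}  # stand-ins for p in [7, 9]
MICRO_TYPES = {"TIMESTAMP_NTZ", "TIMESTAMP_LTZ"}

def can_ansi_store_assign(src: str, dst: str) -> bool:
    """Toy predicate: only the case stated in the text is modelled."""
    if src in NANOS_TYPES and dst == "DATE":
        return False  # strict by design
    raise NotImplementedError("outside the scope of this sketch")

def analyze_date_field_arg(arg_type: str, ansi: bool,
                           dedicated_rule_matches_nanos: bool) -> str:
    """Return the coerced argument, or the error class the analyzer reports."""
    cast = f"CAST({arg_type} AS DATE)"
    if arg_type == "DATE":
        return "DATE"
    if not ansi:
        # Blanket datetime -> datetime implicit-cast arm.
        return cast
    # Dedicated date-field rule: micro always, nanos only after this PR.
    if arg_type in MICRO_TYPES:
        return cast
    if dedicated_rule_matches_nanos and arg_type in NANOS_TYPES:
        return cast
    # Generic ANSI implicit cast defers to the store-assignment check.
    if can_ansi_store_assign(arg_type, "DATE"):
        return cast
    return "DATATYPE_MISMATCH"

print(analyze_date_field_arg("TIMESTAMP_NTZ(9)", ansi=True,
                             dedicated_rule_matches_nanos=False))
print(analyze_date_field_arg("TIMESTAMP_NTZ(9)", ansi=True,
                             dedicated_rule_matches_nanos=True))
```

The first call models the failure described above and the second models the fixed behavior; with `ansi=False` the switch makes no difference.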

Does this PR introduce any user-facing change?

Yes. With spark.sql.timestampNanosTypes.enabled=true, date field functions and the EXTRACT / date_part date components now work on TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) in ANSI mode. Previously they failed analysis with DATATYPE_MISMATCH; they already worked in non-ANSI mode. The nanosecond timestamp types are a preview feature (disabled by default), so there is no change for released behavior.

How was this patch tested?

- New unit tests in TimestampNanosFunctionsSuiteBase (run with ANSI mode on and off), covering the function form, the EXTRACT / date_part form, and the functions.* Column API, over a spread of values: leap day, ISO-week and quarter boundaries, pre-epoch dates, varied precisions (7/8/9) and fractions, LTZ time-zone date shifts, and NULLs.
- New golden-file queries appended to timestamp-ltz-nanos.sql / timestamp-ntz-nanos.sql with regenerated .sql.out files.
- ExpressionInfoSuite validates the new @ExpressionDescription examples.
- ./dev/scalastyle and scalafmt pass.
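Some of the edge cases listed above are easy to get wrong, so here is a small Python sketch of why they matter. It is not taken from the test suite: the helper names are hypothetical, timestamps are modelled as signed nanoseconds since 1970-01-01, and a fixed UTC offset stands in for a session time zone.

```python
from datetime import date, timedelta

NANOS_PER_SECOND = 1_000_000_000
NANOS_PER_DAY = 86_400 * NANOS_PER_SECOND
EPOCH = date(1970, 1, 1)

def to_epoch_nanos(d: date, hour=0, minute=0, second=0, nano=0) -> int:
    """Build a nanosecond epoch value from wall-clock parts (UTC)."""
    days = (d - EPOCH).days
    seconds = days * 86_400 + hour * 3600 + minute * 60 + second
    return seconds * NANOS_PER_SECOND + nano

def ntz_to_date(epoch_nanos: int) -> date:
    # Floor division, so pre-epoch instants land on the earlier day.
    return EPOCH + timedelta(days=epoch_nanos // NANOS_PER_DAY)

def ltz_to_date(epoch_nanos: int, utc_offset_hours: int) -> date:
    # Shift the instant to local wall-clock time before taking the date.
    local = epoch_nanos + utc_offset_hours * 3600 * NANOS_PER_SECOND
    return ntz_to_date(local)

# Leap day, last representable nanosecond of the day.
leap = to_epoch_nanos(date(2020, 2, 29), 23, 59, 59, 999_999_999)
# One nanosecond before the epoch.
pre_epoch = -1
# 2020-01-01 02:00 UTC is still 2019-12-31 at UTC-8.
instant = to_epoch_nanos(date(2020, 1, 1), 2)

print(ntz_to_date(leap), ntz_to_date(pre_epoch))
print(ltz_to_date(instant, 0).year, ltz_to_date(instant, -8).year)
```

Truncating toward zero instead of flooring would place `pre_epoch` on 1970-01-01, and ignoring the offset would report the wrong year for `instant` in a UTC-8 session. Those are the kinds of mistakes that pre-epoch and time-zone-shift test values are meant to catch.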

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor

…on timestamps in ANSI mode

### What changes were proposed in this pull request?

Make the date field extraction functions (`year`, `quarter`, `month`, `day`/`dayofmonth`, `dayofyear`, `dayofweek`, `weekday`, `weekofyear`, `yearofweek`) and the transitive `EXTRACT` / `date_part` date components work on the nanosecond-precision timestamp types `TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)` (`p` in `[7, 9]`) in ANSI mode.

- Add `AnyTimestampNanoType` (`AbstractDataType`) and `AnyTimestampNanoTypeExpression` (expression extractor) matching `TimestampLTZNanosType` / `TimestampNTZNanosType`.
- Extend `AnsiGetDateFieldOperationsTypeCoercion` to also match nanos-timestamp children of `GetDateField` and cast them to `DATE`, identical to the existing micro path. This keeps `Cast.canANSIStoreAssign` / `Cast.canUpCast` strict for `DATE` <-> nanos.
- Add nanosecond examples to the date field functions' `@ExpressionDescription`.

### Why are the changes needed?

In ANSI mode (the default since Spark 4.0) the date field functions fail analysis with `DATATYPE_MISMATCH` on nanosecond timestamps, because the generic implicit-cast rule defers to `Cast.canANSIStoreAssign` (false for nanos -> DATE) and the dedicated `AnsiGetDateFieldOperationsTypeCoercion` rule matched only the microsecond timestamp types.

### Does this PR introduce any user-facing change?

Yes. With `spark.sql.timestampNanosTypes.enabled=true`, date field functions and `EXTRACT` / `date_part` date components now work on `TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)` in ANSI mode (they already worked in non-ANSI mode).

### How was this patch tested?

- New unit tests in `TimestampNanosFunctionsSuiteBase` (ANSI on/off), covering leap years, ISO-week and quarter boundaries, pre-epoch dates, varied precisions, and LTZ time-zone date shifts.
- New golden-file queries in `timestamp-ltz-nanos.sql` / `timestamp-ntz-nanos.sql`.
- `ExpressionInfoSuite` validates the new examples.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor

MaxGekk · 2026-06-15T23:17:27Z

Merging to master/4.x. Thank you, @HyukjinKwon for review.

…on timestamps in ANSI mode

Closes #56518 from MaxGekk/SPARK-57469.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit a25cd89)
Signed-off-by: Max Gekk <max.gekk@gmail.com>

HyukjinKwon approved these changes Jun 15, 2026

MaxGekk closed this in a25cd89 Jun 15, 2026

MaxGekk wants to merge 1 commit into apache:master from MaxGekk:SPARK-57469
