[SPARK-57469][SQL] Support date field functions on nanosecond-precision timestamps in ANSI mode#56518

Closed
MaxGekk wants to merge 1 commit into apache:master from MaxGekk:SPARK-57469

MaxGekk (Member) commented Jun 15, 2026

What changes were proposed in this pull request?

This PR makes the date field extraction functions (year, quarter, month, day/dayofmonth, dayofyear, dayofweek, weekday, weekofyear, monthname, dayname) and the transitive EXTRACT / date_part date components (including yearofweek, which has no standalone function) work on the nanosecond-precision timestamp types TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) (p in [7, 9]) in ANSI mode.

  • Add AnyTimestampNanoType (AbstractDataType) and AnyTimestampNanoTypeExpression (expression extractor) matching TimestampLTZNanosType / TimestampNTZNanosType, mirroring the existing micro abstractions AnyTimestampType / AnyTimestampTypeExpression.
  • Extend AnsiGetDateFieldOperationsTypeCoercion to also match nanos-timestamp children of GetDateField and cast them to DATE, exactly as it already does for micro timestamps. This deliberately does not widen AnyTimestampType (also used as an "accept-as-is, no cast" inputTypes gate on micro-only expressions), and keeps Cast.canANSIStoreAssign / Cast.canUpCast strict for DATE <-> nanos (the cast is inserted explicitly by the field-extraction rule).
  • Add nanosecond examples to the date field functions' @ExpressionDescription.

No EXTRACT-specific change is needed: EXTRACT(field FROM source) is a RuntimeReplaceable that rewrites via DatePart.parseExtractField to the same GetDateField expressions, so once the GetDateField coercion is fixed, extract(year from nanos_ts) and date_part('year', nanos_ts) work transitively in both ANSI and non-ANSI modes. Time-of-day fields (HOUR/MINUTE/SECOND) were already handled by SPARK-57340.
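
As a rough illustration of why no EXTRACT-specific change is needed, the sketch below models the shared path in plain Python rather than Spark code. A nanosecond timestamp is first truncated to a date (standing in for the inserted cast to DATE), and a single lookup table serves both the function form and the extract form. Every name here (`nanos_to_date`, `DATE_FIELDS`, `extract`, `year`) is an assumption made for the example and not part of Spark; `DATE_FIELDS` is only a loose analogue of `DatePart.parseExtractField`. The field numbering follows Spark's documented conventions (`dayofweek`: 1 = Sunday, `weekday`: 0 = Monday, ISO 8601 weeks).

```python
from datetime import date, timedelta

NANOS_PER_DAY = 86_400 * 1_000_000_000
EPOCH = date(1970, 1, 1)


def nanos_to_date(epoch_nanos: int) -> date:
    """Truncate nanoseconds since the epoch to a calendar date (models the cast to DATE)."""
    # Floor division keeps pre-epoch (negative) values on the correct day.
    return EPOCH + timedelta(days=epoch_nanos // NANOS_PER_DAY)


# Hypothetical lookup table: one entry per date field, shared by every call form.
DATE_FIELDS = {
    "year": lambda d: d.year,
    "quarter": lambda d: (d.month - 1) // 3 + 1,
    "month": lambda d: d.month,
    "day": lambda d: d.day,
    "dayofyear": lambda d: d.timetuple().tm_yday,
    "dayofweek": lambda d: d.isoweekday() % 7 + 1,  # 1 = Sunday ... 7 = Saturday
    "weekday": lambda d: d.weekday(),               # 0 = Monday ... 6 = Sunday
    "weekofyear": lambda d: d.isocalendar()[1],     # ISO 8601 week number
    "yearofweek": lambda d: d.isocalendar()[0],     # ISO 8601 week-numbering year
}


def extract(field: str, epoch_nanos: int) -> int:
    """Model of extract(field FROM ts) / date_part(field, ts): look up the field, then apply it."""
    return DATE_FIELDS[field.lower()](nanos_to_date(epoch_nanos))


def year(epoch_nanos: int) -> int:
    """Model of the function form; it goes through the same table as extract()."""
    return extract("year", epoch_nanos)
```

Because both call forms resolve to the same entry, a fix applied at the shared step (here `nanos_to_date`, in the PR the `GetDateField` coercion) covers all of them at once.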

This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond precision) and builds on SPARK-57323 (DATE <-> nanos casts).

Why are the changes needed?

In ANSI mode (the default since Spark 4.0) the date field functions fail analysis with DATATYPE_MISMATCH on nanosecond timestamps. For example:

SET spark.sql.timestampNanosTypes.enabled=true;
SELECT year(TIMESTAMP_NTZ '2020-01-01 12:30:15.123456789'::timestamp_ntz(9));

fails, because the generic ANSI implicit-cast rule defers to Cast.canANSIStoreAssign (which returns false for nanos -> DATE by design) and the dedicated AnsiGetDateFieldOperationsTypeCoercion rule matched only the microsecond timestamp types. In non-ANSI mode the functions already work via the blanket (_: DatetimeType, _: DatetimeType) implicit-cast arm.
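
The failure mode described above can be sketched with a toy analyzer in plain Python. This is not Spark's implementation: the type tags, the classes, and the `match_nanos` switch are hypothetical, and `can_ansi_store_assign` is reduced to an identity check purely to model the "nanos -> DATE is rejected" behavior. The dedicated rule either inserts an explicit cast to DATE or leaves the expression alone, and the strict check then rejects anything that is still not a DATE.

```python
from dataclasses import dataclass

# Toy type tags; they only echo the type names mentioned in the description.
DATE = "DATE"
MICRO_TYPES = {"TIMESTAMP_LTZ", "TIMESTAMP_NTZ"}
NANO_TYPES = {"TIMESTAMP_LTZ_NANOS", "TIMESTAMP_NTZ_NANOS"}


@dataclass(frozen=True)
class Col:
    data_type: str


@dataclass(frozen=True)
class Cast:
    child: object
    data_type: str


@dataclass(frozen=True)
class GetDateField:
    name: str
    child: object


def can_ansi_store_assign(src: str, dst: str) -> bool:
    """Toy stand-in for the strict check: only identical types pass here."""
    return src == dst


def coerce_date_field(expr: GetDateField, match_nanos: bool) -> GetDateField:
    """Hypothetical model of the dedicated rule: wrap timestamp children in a cast to DATE."""
    accepted = MICRO_TYPES | (NANO_TYPES if match_nanos else set())
    if expr.child.data_type in accepted:
        return GetDateField(expr.name, Cast(expr.child, DATE))
    return expr


def analyze(expr: GetDateField, match_nanos: bool) -> GetDateField:
    """Apply the dedicated rule first, then the strict generic check."""
    coerced = coerce_date_field(expr, match_nanos)
    if not can_ansi_store_assign(coerced.child.data_type, DATE):
        raise TypeError(
            f"DATATYPE_MISMATCH: {expr.name} expects DATE, got {coerced.child.data_type}"
        )
    return coerced
```

With `match_nanos=False` (the behavior before this PR in the model), `analyze(GetDateField("year", Col("TIMESTAMP_NTZ_NANOS")), match_nanos=False)` raises the mismatch error; with `match_nanos=True` the child is wrapped in `Cast(..., "DATE")` and analysis succeeds, while the strict check itself stays unchanged.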

Does this PR introduce any user-facing change?

Yes. With spark.sql.timestampNanosTypes.enabled=true, date field functions and the EXTRACT / date_part date components now work on TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) in ANSI mode. Previously they failed analysis with DATATYPE_MISMATCH; they already worked in non-ANSI mode. The nanosecond timestamp types are a preview feature (disabled by default), so there is no change to released behavior.

How was this patch tested?

  • New unit tests in TimestampNanosFunctionsSuiteBase (run with ANSI mode on and off), covering the function form, the EXTRACT / date_part form, and the functions.* Column API, over a spread of values: leap day, ISO-week and quarter boundaries, pre-epoch dates, varied precisions (7/8/9) and fractions, LTZ time-zone date shifts, and NULLs.
  • New golden-file queries appended to timestamp-ltz-nanos.sql / timestamp-ntz-nanos.sql with regenerated .sql.out files.
  • ExpressionInfoSuite validates the new @ExpressionDescription examples.
  • ./dev/scalastyle and scalafmt pass.
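
To make the "LTZ time-zone date shifts" and "pre-epoch dates" cases concrete, here is a small plain-Python sketch, not taken from the test suite. It assumes a fixed UTC offset as a simplified stand-in for the session time zone, and the helper name `ltz_local_date` is hypothetical. The same instant falls on different calendar dates, and therefore yields different date fields, depending on the zone in which it is rendered.

```python
from datetime import date, datetime, timedelta, timezone

NANOS_PER_SECOND = 1_000_000_000
UTC_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ltz_local_date(epoch_nanos: int, utc_offset_hours: int) -> date:
    """Date of an instant as seen at a fixed UTC offset (a simplified session time zone)."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    # Floor division drops the sub-second part, which never changes the date,
    # and keeps pre-epoch (negative) instants on the correct side of midnight.
    seconds = epoch_nanos // NANOS_PER_SECOND
    return (UTC_EPOCH + timedelta(seconds=seconds)).astimezone(tz).date()


# 2019-12-31 23:30:00.123456789 UTC, expressed as nanoseconds since the epoch.
instant = (
    int(datetime(2019, 12, 31, 23, 30, tzinfo=timezone.utc).timestamp()) * NANOS_PER_SECOND
    + 123_456_789
)

print(ltz_local_date(instant, 0).year)   # 2019: still New Year's Eve in UTC
print(ltz_local_date(instant, 1).year)   # 2020: already past midnight at UTC+01:00
print(ltz_local_date(instant, -8).year)  # 2019: earlier in the day at UTC-08:00
```

An NTZ value has no such dependency, which is one reason to cover the two types with separate cases.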

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor

…on timestamps in ANSI mode

### What changes were proposed in this pull request?
Make the date field extraction functions (`year`, `quarter`, `month`,
`day`/`dayofmonth`, `dayofyear`, `dayofweek`, `weekday`, `weekofyear`,
`yearofweek`) and the transitive `EXTRACT` / `date_part` date components work on
the nanosecond-precision timestamp types `TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)`
(`p` in `[7, 9]`) in ANSI mode.

- Add `AnyTimestampNanoType` (`AbstractDataType`) and
  `AnyTimestampNanoTypeExpression` (expression extractor) matching
  `TimestampLTZNanosType` / `TimestampNTZNanosType`.
- Extend `AnsiGetDateFieldOperationsTypeCoercion` to also match nanos-timestamp
  children of `GetDateField` and cast them to `DATE`, identical to the existing
  micro path. This keeps `Cast.canANSIStoreAssign` / `Cast.canUpCast` strict for
  `DATE` <-> nanos.
- Add nanosecond examples to the date field functions' `@ExpressionDescription`.

### Why are the changes needed?
In ANSI mode (the default since Spark 4.0) the date field functions fail
analysis with `DATATYPE_MISMATCH` on nanosecond timestamps, because the generic
implicit-cast rule defers to `Cast.canANSIStoreAssign` (false for nanos -> DATE)
and the dedicated `AnsiGetDateFieldOperationsTypeCoercion` rule matched only the
microsecond timestamp types.

### Does this PR introduce any user-facing change?
Yes. With `spark.sql.timestampNanosTypes.enabled=true`, date field functions and
`EXTRACT` / `date_part` date components now work on `TIMESTAMP_NTZ(p)` /
`TIMESTAMP_LTZ(p)` in ANSI mode (they already worked in non-ANSI mode).

### How was this patch tested?
- New unit tests in `TimestampNanosFunctionsSuiteBase` (ANSI on/off), covering
  leap years, ISO-week and quarter boundaries, pre-epoch dates, varied
  precisions, and LTZ time-zone date shifts.
- New golden-file queries in `timestamp-ltz-nanos.sql` / `timestamp-ntz-nanos.sql`.
- `ExpressionInfoSuite` validates the new examples.

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor

MaxGekk (Member, Author) commented Jun 15, 2026

Merging to master/4.x. Thank you, @HyukjinKwon, for the review.

MaxGekk closed this in a25cd89 on Jun 15, 2026
MaxGekk added a commit that referenced this pull request Jun 15, 2026
…on timestamps in ANSI mode

Closes #56518 from MaxGekk/SPARK-57469.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit a25cd89)
Signed-off-by: Max Gekk <max.gekk@gmail.com>