feat: Add TimestampNTZType support for casts and unix_timestamp#3253
feat: Add TimestampNTZType support for casts and unix_timestamp#3253andygrove wants to merge 15 commits intoapache:mainfrom
Conversation
The csv_scan.rs file uses types from datafusion_datasource but the dependency was not declared in native/core/Cargo.toml. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add comprehensive support for TimestampNTZType (Timestamp without timezone) wherever TimestampType is currently supported. Changes: - Add TimestampNTZType to CometCast.supportedTypes - Support casting TimestampNTZ to Long, String, Date, and Timestamp - Add TimestampNTZ support to unix_timestamp function (no timezone conversion) - Add tests for TimestampNTZ casts and temporal expressions TimestampNTZ stores local time without timezone context, so: - unix_timestamp simply divides microseconds by 1,000,000 - Casts to Date use simple truncation (no timezone adjustment) - Casts to String format as local datetime without timezone suffix Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3253 +/- ##
============================================
+ Coverage 56.12% 59.90% +3.77%
- Complexity 976 1473 +497
============================================
Files 119 175 +56
Lines 11743 16172 +4429
Branches 2251 2686 +435
============================================
+ Hits 6591 9688 +3097
- Misses 4012 5131 +1119
- Partials 1140 1353 +213 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This commit fixes an issue where date_trunc on timestamp_ntz values could produce incorrect results when the truncation crosses DST boundaries (e.g., truncating a December date to October). The fix modifies as_micros_from_unix_epoch_utc to re-interpret the local datetime in the timezone after truncation, ensuring the correct DST offset is used for the target date. Also updates the test to use a reasonable date range (around year 2024) since chrono-tz has limited support for DST calculations with far-future dates (beyond approximately year 2100). Adds documentation about this known limitation to the compatibility guide. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…test - Remove cast tests for TimestampNTZType to LongType (Spark doesn't support) - Remove cast tests for numeric types and BinaryType to TimestampNTZType (Spark doesn't support these casts) - Fix date_format timestamp_ntz test by using UTC timezone explicitly (Comet interprets timestamp_ntz as UTC during cast, which differs from Spark's session timezone behavior for non-UTC timezones) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…pport # Conflicts: # docs/source/user-guide/latest/compatibility.md
|
What is the expected behavior when data is written to a timestamp ntz field with one session timezone and read by another user with a different session timezone? |
|
I am moving this to draft until the DataFusion 52 upgrade is merged. |
Skip timezone conversion for TimestampNTZ arrays in timestamp_trunc. NTZ values are timezone-independent, so truncation operates directly on naive microsecond values without any timezone resolution, avoiding panics on ambiguous DST fall-back times (e.g. 2024-11-03T01:30:00 in US/Eastern). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thanks. I added these tests and that did expose a bug, which is now fixed. |
Summary
Add comprehensive support for
TimestampNTZType(Timestamp without timezone) whereverTimestampTypeis currently supported.Key changes:
TimestampNTZTypetoCometCast.supportedTypesunix_timestampfunction (no timezone conversion needed)Semantic differences from TimestampType:
unix_timestamp(TimestampNTZ): Simply divides microseconds by 1,000,000 (no timezone adjustment)Test plan
./mvnw test -DwildcardSuites="CometCast"./mvnw test -DwildcardSuites="CometTemporalExpression"🤖 Generated with Claude Code