
Conversation

@andygrove (Member) commented Oct 22, 2025

Which issue does this PR close?

Closes #2621

There is a follow-up issue #2649 for another bug discovered while working on this PR.

Rationale for this change

Both the trunc and date_trunc functions accept a format string parameter, such as 'year' or 'month'.

My understanding is that most users specify a literal string for this parameter, but it is also possible to pass a column or any other valid Spark expression. Unfortunately, Comet is not compatible with Spark for this use case: if the expression evaluates to a format string that is not supported, the query fails with an error such as:

Unsupported format: "foo" for function 'date_trunc'

The correct behavior, matching Spark, is to return NULL instead:

scala> spark.sql("select now(), a, date_trunc(a, now()) from t1").show
+--------------------+-----+--------------------+
|               now()|    a|date_trunc(a, now())|
+--------------------+-----+--------------------+
|2025-10-23 15:57:...| year| 2025-01-01 00:00:00|
|2025-10-23 15:57:...|month| 2025-10-01 00:00:00|
|2025-10-23 15:57:...|  foo|                NULL|
+--------------------+-----+--------------------+

The same is true for literal format strings:

scala> spark.sql("select now(), date_trunc('foo', now())").show
+--------------------+----------------------+
|               now()|date_trunc(foo, now())|
+--------------------+----------------------+
|2025-10-23 15:59:...|                  NULL|
+--------------------+----------------------+

What changes are included in this PR?

  • Fall back to Spark for an invalid literal format string
  • Fall back to Spark for a non-literal format string (see the sketch after this list)
  • Implement an improved test
  • Remove tests specific to the format being an array, which is no longer supported
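
As a rough illustration of the fallback decision for trunc (a minimal sketch assuming Spark's Catalyst Literal class; the canRunNatively helper and its wiring into Comet's serde layer are hypothetical, and date_trunc / TruncTimestamp would use a longer list of supported units):

```scala
import java.util.Locale

import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
import org.apache.spark.sql.types.StringType

// Formats the native trunc implementation handles (taken from the CometTruncDate
// serde code shown later in this conversation); date_trunc supports more units.
val supportedFormats: Seq[String] =
  Seq("year", "yyyy", "yy", "quarter", "mon", "month", "mm", "week")

// Hypothetical helper: only a literal, supported format string is handed to the
// native engine. Anything else (an unsupported literal, a null, or a non-literal
// such as a column) falls back to Spark, which returns NULL for unsupported formats.
def canRunNatively(format: Expression): Boolean = format match {
  case Literal(fmt, StringType) if fmt != null =>
    supportedFormats.contains(fmt.toString.toLowerCase(Locale.ROOT))
  case _ =>
    false
}
```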

How are these changes tested?

@andygrove changed the title from "fix: Fix various bugs in trunc / date_trunc functions" to "[WIP] fix: Fix various bugs in trunc / date_trunc functions" on Oct 23, 2025
@codecov-commenter commented Oct 23, 2025

Codecov Report

❌ Patch coverage is 91.17647% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.29%. Comparing base (f09f8af) to head (8d15e51).
⚠️ Report is 644 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...c/main/scala/org/apache/comet/serde/datetime.scala | 93.93% | 0 Missing and 2 partials ⚠️ |
| ...on/src/main/scala/org/apache/comet/CometConf.scala | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2634      +/-   ##
============================================
+ Coverage     56.12%   59.29%   +3.16%     
- Complexity      976     1449     +473     
============================================
  Files           119      147      +28     
  Lines         11743    13789    +2046     
  Branches       2251     2369     +118     
============================================
+ Hits           6591     8176    +1585     
- Misses         4012     4388     +376     
- Partials       1140     1225      +85     


@andygrove changed the title from "fix: Fix various bugs in trunc / date_trunc functions" to "fix: Fall back to Spark for trunc / date_trunc functions when format string is not a literal value" on Oct 23, 2025
@andygrove marked this pull request as ready for review on October 25, 2025 15:40
@andygrove changed the title from "fix: Fall back to Spark for trunc / date_trunc functions when format string is not a literal value" to "fix: Fall back to Spark for trunc / date_trunc functions when format string is unsupported, or is not a literal value" on Oct 27, 2025
}
}

test("trunc with format array") {
@andygrove (Member, Author) commented Oct 27, 2025:

We no longer support the format being an array.

A Contributor commented:

I feel it is better to keep this functionality rather than lose it for a case that no user has complained about yet.

@andygrove (Member, Author) replied:

OK, I made it Incompatible rather than Unsupported and reinstated the tests.
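
For context, a rough sketch of what that difference means (the getSupportLevel hook and the Compatible / Incompatible names here are assumptions about Comet's serde API, not verified signatures): Unsupported always falls back to Spark, while Incompatible runs natively only when the user explicitly opts in to allowing incompatible expressions.

```scala
// Illustrative only: method and type names are assumptions, not verified Comet API.
// Relies on supportedFormats and the Catalyst imports from the earlier sketch.
override def getSupportLevel(expr: TruncTimestamp): SupportLevel =
  expr.format match {
    case Literal(fmt, StringType) if fmt != null &&
        supportedFormats.contains(fmt.toString.toLowerCase(Locale.ROOT)) =>
      Compatible()
    case _ =>
      // Non-literal formats (e.g. a column, or an array of formats) may behave
      // differently from Spark for unsupported values, so they only run natively
      // when the user opts in to incompatible expressions.
      Incompatible(Some("format should be a supported literal string for full compatibility"))
  }
```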

"minute",
"hour",
"week",
"quarter")
A Member commented:

week and quarter are listed twice

@hsiang-c (Contributor) left a comment:

LGTM

object CometTruncDate extends CometExpressionSerde[TruncDate] {

  val supportedFormats: Seq[String] =
    Seq("year", "yyyy", "yy", "quarter", "mon", "month", "mm", "week")
A Contributor commented:

The list of possible formats is:

fmt - the format representing the unit to be truncated to
"YEAR", "YYYY", "YY" - truncate to the first date of the year that the ts falls in, the time part will be zero out
"QUARTER" - truncate to the first date of the quarter that the ts falls in, the time part will be zero out
"MONTH", "MM", "MON" - truncate to the first date of the month that the ts falls in, the time part will be zero out
"WEEK" - truncate to the Monday of the week that the ts falls in, the time part will be zero out
"DAY", "DD" - zero out the time part
"HOUR" - zero out the minute and second with fraction part
"MINUTE"- zero out the second with fraction part
"SECOND" - zero out the second fraction part
"MILLISECOND" - zero out the microseconds
"MICROSECOND" - everything remains

Am I missing a reason why Comet doesn't support formats smaller than week?

@andygrove (Member, Author) replied:

This is very confusing in Spark, but date_trunc is actually TruncTimestamp and not TruncDate.
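
To make that distinction concrete (a small spark-shell sketch; the dates and times are illustrative): trunc(date, fmt) maps to TruncDate and only accepts units of WEEK and larger, while date_trunc(fmt, ts) maps to TruncTimestamp and also accepts DAY, HOUR, MINUTE, and smaller units.

```scala
// trunc operates on a DATE, so sub-week units would not make sense for it:
spark.sql("SELECT trunc(DATE'2025-10-23', 'MONTH')").show()
// -> 2025-10-01

// date_trunc operates on a TIMESTAMP and supports finer-grained units:
spark.sql("SELECT date_trunc('HOUR', TIMESTAMP'2025-10-23 15:57:42')").show()
// -> 2025-10-23 15:00:00
```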

@comphead (Contributor) left a comment:

Thanks @andygrove, lgtm.
Btw, when reviewing the code I found a custom DateTrunc impl in Comet which is probably not in use anymore: spark-expr/src/datetime_funcs/date_trunc.rs

If so, I'll create a clean-up PR.

@andygrove (Member, Author) replied:

> Thanks @andygrove, lgtm.
> Btw, when reviewing the code I found a custom DateTrunc impl in Comet which is probably not in use anymore: spark-expr/src/datetime_funcs/date_trunc.rs
>
> If so, I'll create a clean-up PR.

I assumed that those were the implementations that we were using, but I could be mistaken.

@andygrove (Member, Author) commented:

Thanks for the reviews @hsiang-c and @comphead

@andygrove merged commit 977a189 into apache:main on Oct 30, 2025
103 checks passed
@andygrove deleted the date-trunc branch on October 30, 2025 20:58

Development

Successfully merging this pull request may close these issues:

  • InternalError: Unsupported format: "" for function 'date_trunc'.