
[FEA] support spark.sql.legacy.timeParserPolicy #50

Closed
revans2 opened this issue May 29, 2020 · 5 comments
Labels: feature request (New feature or request), P0 (Must have for release), SQL (part of the SQL/Dataframe plugin)

revans2 (Collaborator) commented May 29, 2020

Is your feature request related to a problem? Please describe.
When parsing dates and times, it would be good if we could also follow the spark.sql.legacy.timeParserPolicy config.

sameerz (Collaborator) commented Sep 22, 2020

Trace through where this config is used in Spark, and if the plugin cannot match the same functionality, fall back to the CPU.
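
For illustration, a hypothetical sketch (not actual spark-rapids code) of the kind of guard this suggests; the helper name and the string-based config lookup are assumptions:

// Hypothetical sketch, not actual spark-rapids code: read
// spark.sql.legacy.timeParserPolicy and refuse to replace a date/time
// parsing expression on the GPU when the policy is LEGACY.
import org.apache.spark.sql.internal.SQLConf

def timeParserPolicyAllowsGpu(conf: SQLConf): Boolean = {
  val policy = conf
    .getConfString("spark.sql.legacy.timeParserPolicy", "EXCEPTION")
    .toUpperCase
  policy != "LEGACY" // EXCEPTION and CORRECTED share the new parser
}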

andygrove (Contributor) commented Nov 11, 2020

The default value for spark.sql.legacy.timeParserPolicy is EXCEPTION, in which case Spark throws an exception if any of the following functions are unable to parse data using the specified pattern, and suggests that the conversion may work with LEGACY. If the config is set to CORRECTED then the conversion returns null instead of throwing an exception.

  • unix_timestamp
  • from_unixtime
  • from_utc_timestamp
  • to_unix_timestamp
  • to_utc_timestamp
  • to_date
  • to_timestamp
  • date_format

I propose that we follow the same behavior but fall back to the CPU for LEGACY for these functions, until we have a reason to add support for specific legacy formats that are no longer supported in Spark 3.0 and later. If we do end up doing that, we can then fall back to the CPU only for the legacy formats that we do not support.
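
For illustration, a minimal spark-shell sketch of the three policies (the input 2020-1-1 is an assumed example: the lenient legacy SimpleDateFormat accepts a single-digit month and day with yyyy-MM-dd, while the strict new parser does not):

// Minimal sketch, assuming a Spark 3.x spark-shell session
import spark.implicits._
val df = Seq("2020-1-1").toDF("s") // single-digit month and day

spark.conf.set("spark.sql.legacy.timeParserPolicy", "EXCEPTION")
// df.selectExpr("to_date(s, 'yyyy-MM-dd') as d").show()
// => throws SparkUpgradeException suggesting LEGACY or CORRECTED

spark.conf.set("spark.sql.legacy.timeParserPolicy", "CORRECTED")
df.selectExpr("to_date(s, 'yyyy-MM-dd') as d").show()
// d is null: the new parser rejects "2020-1-1"

spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
df.selectExpr("to_date(s, 'yyyy-MM-dd') as d").show()
// d is 2020-01-01: the legacy parser is lenient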

andygrove (Contributor) commented
Resolved by #1113 for functions, and I filed a follow-on, #1111, for handling this for CSV reads.

nsinghal20 commented

> [quoting @andygrove's comment from Nov 11, 2020 above]

This answer seems to address my problem, but can anyone suggest an example timestamp/date for which it might throw an error? My date is 2024-12-10 13:22:22, but it still throws an exception.

revans2 (Collaborator, Author) commented Dec 20, 2024

@nsinghal20 can you give a concrete example of where you are getting the error/exception? If you want, file it as a separate issue, or even as a question; that might help with visibility. I did a quick test and it is working as expected for me, but my test is far from exhaustive.

spark.conf.set("spark.sql.legacy.timeParserPolicy","CORRECTED")
Seq("2024-12-10 13:22:22").toDF("sts").repartition(1).selectExpr("*","to_timestamp(sts, 'yyyy-MM-dd HH:mm:ss') as ts").show()
// No Fallbacks for GetTimestamp...
+-------------------+-------------------+
|                sts|                 ts|
+-------------------+-------------------+
|2024-12-10 13:22:22|2024-12-10 13:22:22|
+-------------------+-------------------+

spark.conf.set("spark.sql.legacy.timeParserPolicy","LEGACY")
Seq("2024-12-10 13:22:22").toDF("sts").repartition(1).selectExpr("*","to_timestamp(sts, 'yyyy-MM-dd HH:mm:ss') as ts").show()
//!Expression <GetTimestamp> gettimestamp(sts#184, yyyy-MM-dd HH:mm:ss, TimestampType, Some(UTC), false) cannot run on GPU because LEGACY format 'yyyy-MM-dd HH:mm:ss' on the GPU is not guaranteed to produce the same results as Spark on CPU. Set spark.rapids.sql.incompatibleDateFormats.enabled=true to force onto GPU.
+-------------------+-------------------+
|                sts|                 ts|
+-------------------+-------------------+
|2024-12-10 13:22:22|2024-12-10 13:22:22|
+-------------------+-------------------+

spark.conf.set("spark.rapids.sql.incompatibleDateFormats.enabled",true)
Seq("2024-12-10 13:22:22").toDF("sts").repartition(1).selectExpr("*","to_timestamp(sts, 'yyyy-MM-dd HH:mm:ss') as ts").show()
// No Fallbacks for GetTimestamp...
+-------------------+-------------------+
|                sts|                 ts|
+-------------------+-------------------+
|2024-12-10 13:22:22|2024-12-10 13:22:22|
+-------------------+-------------------+

Now granted, I am running on Spark 3.4.2 and a SNAPSHOT version of the plugin. There are subtle differences between different versions of Spark.
