Conversation

Contributor

@coderfender coderfender commented Aug 13, 2025

Which issue does this PR close?

Support ANSI mode for arithmetic operations in Spark #2137

Closes #2137.

Rationale for this change

Introduce ANSI support for arithmetic operations on integer and float types.
Now that TRY eval mode is supported in Comet, this PR takes it a step further and supports ANSI mode.

To achieve that, the following changes are made:

  1. QueryPlanSerde no longer falls back to Spark when the eval mode is ANSI (for arithmetic operations only)
  2. planner.rs now respects the eval mode when creating a binary arithmetic function
  3. checked_arithmetic.rs now raises a Spark-equivalent exception on overflow / division by zero when ANSI mode is enabled
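
The eval-mode behavior described above can be sketched as follows. This is a minimal illustration, not Comet's actual checked_arithmetic.rs; the `EvalMode` enum shape and the error message format are assumptions made for this sketch.

```rust
// Sketch of eval-mode-aware checked addition; not Comet's actual kernel.
// The EvalMode enum and message format are assumptions for illustration.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum EvalMode {
    Legacy,
    Try,
    Ansi,
}

fn checked_add_i32(l: i32, r: i32, mode: &EvalMode) -> Result<Option<i32>, String> {
    match l.checked_add(r) {
        Some(v) => Ok(Some(v)),
        None => match mode {
            // TRY mode: overflow produces NULL instead of an error
            EvalMode::Try => Ok(None),
            // ANSI mode: raise a Spark-like arithmetic exception
            EvalMode::Ansi => Err(format!("[ARITHMETIC_OVERFLOW] {l} + {r} caused overflow")),
            // Legacy mode: wrap around, matching Java integer semantics
            EvalMode::Legacy => Ok(Some(l.wrapping_add(r))),
        },
    }
}

fn main() {
    assert_eq!(checked_add_i32(1, 2, &EvalMode::Ansi), Ok(Some(3)));
    assert_eq!(checked_add_i32(i32::MAX, 1, &EvalMode::Try), Ok(None));
    assert!(checked_add_i32(i32::MAX, 1, &EvalMode::Ansi).is_err());
}
```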

What changes are included in this PR?

See the change list under the rationale above.

How are these changes tested?

  1. Unit tests in CometExpressionSuite.scala
  2. Spark tests

Note that these changes do not introduce support for the Integral Divide and Remainder expressions. I will file separate issues to track them and work on them once these changes are merged.


@coderfender coderfender marked this pull request as draft August 13, 2025 01:08
Contributor Author

coderfender commented Aug 14, 2025

Added test cases for addition, subtraction, and multiplication. Given that Spark inherently converts decimal operands to doubles (and to decimals in the case of integral division in Comet), I have yet to make the code changes needed to test those cases.

@coderfender coderfender marked this pull request as ready for review August 14, 2025 22:36

codecov-commenter commented Aug 14, 2025

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 58.48%. Comparing base (f09f8af) to head (2398fd8).
⚠️ Report is 544 commits behind head on main.

Files with missing lines Patch % Lines
...main/scala/org/apache/comet/serde/arithmetic.scala 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2136      +/-   ##
============================================
+ Coverage     56.12%   58.48%   +2.36%     
- Complexity      976     1440     +464     
============================================
  Files           119      146      +27     
  Lines         11743    13505    +1762     
  Branches       2251     2347      +96     
============================================
+ Hits           6591     7899    +1308     
- Misses         4012     4375     +363     
- Partials       1140     1231      +91     

@coderfender coderfender changed the title feat: init_ansi_mode_enabled feat: implement_ansi_mode_arithmetic Aug 15, 2025
@coderfender coderfender changed the title feat: implement_ansi_mode_arithmetic feat: implement_ansi_eval_mode_arithmetic Aug 15, 2025
Contributor Author

coderfender commented Aug 15, 2025

@andygrove , @kazuyukitanimura
There is a test failure in Spark 4.0 where a division operation fails (as expected in ANSI mode), but the Spark tests expect the error message "cannot divide by zero"; now that the division op is sent to Comet, the error message is different. We have two options going forward:

  1. Change the error message to match Spark's exactly.
  2. Update the test in Spark and create a new diff file to pass the test.

I am inclined towards option 1 given the effort-to-reward ratio.

Contributor

kazuyukitanimura commented Aug 15, 2025

Change the error message to match exactly with Spark

@coderfender I agree. Do you plan to change
native/spark-expr/src/error.rs?
Hopefully we can come up with a message that satisfies multiple Spark versions.
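
One way to keep a single message that multiple Spark versions accept is to center it on the substring the tests match on ("Division by zero"). The sketch below is hypothetical, not the actual error.rs; the exact wording only approximates Spark's DIVIDE_BY_ZERO error class.

```rust
// Hypothetical error type whose Display output contains the substring
// ("Division by zero") that Spark tests across versions match on.
// Not the actual native/spark-expr/src/error.rs.
use std::fmt;

#[allow(dead_code)]
#[derive(Debug)]
enum SparkError {
    DivideByZero,
    ArithmeticOverflow(String),
}

impl fmt::Display for SparkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Wording approximates Spark's [DIVIDE_BY_ZERO] error class
            SparkError::DivideByZero => write!(
                f,
                "[DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead."
            ),
            SparkError::ArithmeticOverflow(op) => {
                write!(f, "[ARITHMETIC_OVERFLOW] {op} caused overflow.")
            }
        }
    }
}

fn main() {
    let msg = SparkError::DivideByZero.to_string();
    // Both Spark 3.x and 4.x style assertions look for this substring
    assert!(msg.contains("Division by zero"));
    println!("{msg}");
}
```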

@coderfender
Contributor Author

After some debugging, early exception raising (from the DataFusion custom ANSI kernel) seems to be the issue. Below is the exact test which causes a failure in Spark 4.0:

  test("fix ANSI support for divide") {
    val data = Seq((Integer.MIN_VALUE, 0))
    withSQLConf(
      SQLConf.ANSI_ENABLED.key -> "true",
      CometConf.COMET_ANSI_MODE_ENABLED.key -> "true",
      "spark.comet.explainFallback.enabled" -> "true") {
      withParquetTable(data, "tbl") {
        spark.sql("CREATE TABLE conditional_t USING PARQUET AS SELECT c1, c2 FROM VALUES(1d, 0),(2d, 1),(null, 1),(CAST('NaN' AS DOUBLE), 0) AS t(c1, c2);")
        val res = spark.sql("""
                              |SELECT coalesce(c2, 1/0) from conditional_t;
                              |  """.stripMargin)

        res.explain()
        checkSparkAnswerAndOperator(res)
      }
    }
  }

Contributor Author

coderfender commented Aug 17, 2025

@kazuyukitanimura , @andygrove

I updated the exception message to exactly match Spark's. However, it seems the issue is with coalesce, which in Spark ignores the divide-by-zero exception, while Comet cannot do the same (the exception is raised on the native side). I believe the options we have are:

  1. Update the diff file to expect an exception (Spark 4+).
  2. Update the code to handle native exceptions on the Scala side (I am not sure we have an existing mechanism for that, and I am less inclined to take this route).

Would love to know your thoughts, and also to catch up in case I am missing something.

Code to reproduce above error :

test("fix ANSI support coalesce") {
    val data = Seq((Integer.MIN_VALUE, 0))
    withSQLConf(
      SQLConf.ANSI_ENABLED.key -> "true",
      CometConf.COMET_ANSI_MODE_ENABLED.key -> "true") {
      withParquetTable(data, "tbl") {
        spark.sql(
          "CREATE TABLE conditional_t USING PARQUET AS SELECT c1, c2 FROM VALUES(1d, 0),(2d, 1),(null, 1),(CAST('NaN' AS DOUBLE), 0) AS t(c1, c2);")
        val res = spark.sql(s"""
                              |SELECT coalesce(c2, 1/0) from conditional_t;
                              |  """.stripMargin)
        
        checkSparkMaybeThrows(res) match {
          case (Some(sparkExc), Some(cometExc)) =>
            val cometErrorPattern =
              """org.apache.comet.CometNativeException"""
            assert(cometExc.getMessage.contains(cometErrorPattern))
            assert(sparkExc.getMessage.contains("Division by zero"))
          case _ => fail("Exception should be thrown")
        }

      }
    }
  }
  

Contributor Author

coderfender commented Aug 20, 2025

Investigating new test failures.

@parthchandra
Contributor

There is a test failure in Spark 4.0 where a division operation fails (as expected in ANSI mode), but the Spark tests expect the error message "cannot divide by zero"; now that the division op is sent to Comet, the error message is different. We have two options going forward:

  1. Change the error message to match Spark's exactly.
  2. Update the test in Spark and create a new diff file to pass the test.

I am inclined towards option 1 given the effort-to-reward ratio.

@mbutrovich I seem to recall that you had a solution for this kind of test failure

Contributor Author

coderfender commented Aug 23, 2025

@parthchandra, thank you for the review. coalesce in Spark behaves differently than in native Comet (Spark performs lazy evaluation of the operands, naturally suppressing exceptions). With help from @kazuyukitanimura I fixed this by disabling Comet coalesce when ANSI mode is enabled. However, there is a new Spark test failure (apart from the PR build failure, which I fixed by generating a new set of golden files), and I am having trouble finding the exact test that failed. Once that is fixed, this PR should be ready for further review.

Failing test logs : https://github.com/apache/datafusion-comet/actions/runs/17112443091/job/48536768203?pr=2136

Thank you

@coderfender coderfender force-pushed the support_ansi_mode_arithmetic_functions branch from 922d7fc to 1022595 Compare August 23, 2025 21:21
@coderfender
Contributor Author

I was able to comb through the failed tests and fixed the failing int-check case match statement.

@coderfender
Contributor Author

Summary of changes

Native side:

  1. Implemented ANSI mode to throw exceptions for arithmetic operations
  2. Changed planner.rs to pass the eval mode all the way to the scalar function UDF
  3. Updated checked_arithmetic.rs to return NULL or throw a Spark-like exception on overflow, depending on the eval mode
  4. Added support for float data types (in addition to int data types)

Scala side:

  1. Disabled replacing 0s with NULLs for arithmetic operations when ANSI mode is enabled
  2. Disabled Comet coalesce when ANSI mode is enabled, because coalesce in Comet throws an exception while Spark suppresses it
    (this required generating new golden files on the Comet side, AFAIK because the plans changed)

cc : @andygrove , @kazuyukitanimura

@andygrove
Member

2. Disable comet coalesce when ANSI mode is enabled. This is because coalesce in comet throws an exception while spark suppresses it

To clarify this a bit more, Spark performs lazy evaluation on the inputs to coalesce and will stop evaluating as soon as it finds the first non-null input. Comet (and DataFusion) currently evaluate all of the inputs. This is why we see an error in Comet but not in Spark.

@andygrove
Member

  1. Disable comet coalesce when ANSI mode is enabled. This is because coalesce in comet throws an exception while spark suppresses it

To clarify this a bit more, Spark performs lazy evaluation on the inputs to coalesce and will stop evaluating as soon as it finds the first non-null input. Comet (and DataFusion) currently evaluate all of the inputs. This is why we see an error in Comet but not in Spark.

I filed #2231 to track this on Comet side
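
The difference between the two evaluation strategies can be sketched as follows. This is an illustration only, not Comet's or DataFusion's actual code; the function and closure shapes are assumptions.

```rust
// Sketch contrasting lazy (Spark-like) and eager (DataFusion-like)
// evaluation of coalesce inputs. Hypothetical, for illustration only.
fn lazy_coalesce<T>(
    inputs: &[Box<dyn Fn() -> Result<Option<T>, String>>],
) -> Result<Option<T>, String> {
    for input in inputs {
        // Stop at the first non-null value; later inputs are never evaluated,
        // so an expression like 1/0 in a later slot cannot raise an error.
        if let Some(v) = input()? {
            return Ok(Some(v));
        }
    }
    Ok(None)
}

fn main() {
    let inputs: Vec<Box<dyn Fn() -> Result<Option<i32>, String>>> = vec![
        Box::new(|| Ok(Some(7))),                         // first non-null input
        Box::new(|| Err("Division by zero".to_string())), // never reached lazily
    ];
    // Lazy evaluation short-circuits before the failing input
    assert_eq!(lazy_coalesce(&inputs), Ok(Some(7)));

    // An eager engine evaluates every input first, so the error surfaces
    let eager: Result<Vec<Option<i32>>, String> = inputs.iter().map(|f| f()).collect();
    assert!(eager.is_err());
}
```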

@coderfender coderfender force-pushed the support_ansi_mode_arithmetic_functions branch 4 times, most recently from d3ee468 to a740ee3 Compare September 4, 2025 18:39
@coderfender
Contributor Author

Rebased with the main branch after the lazy evaluation changes (coalesce) were merged.

@coderfender coderfender force-pushed the support_ansi_mode_arithmetic_functions branch from 4d825d8 to 58a1bee Compare September 6, 2025 06:57
@coderfender
Contributor Author

Resolved issues with failing tests caused by incorrect diff file generation.

@coderfender coderfender force-pushed the support_ansi_mode_arithmetic_functions branch from fa016e3 to 1eca257 Compare September 19, 2025 20:55
@coderfender
Contributor Author

@andygrove, @parthchandra I rebased the feature branch onto main and CI seems to be passing. Please review the changes when you get a chance.

Contributor Author

coderfender commented Sep 20, 2025

Tagging all ANSI PRs : #536 , #535 , #534 , #533

}

test("ANSI support for divide (division by zero)") {
// TODO : Support ANSI mode in Integral divide
Member


Does this TODO still need to be addressed? Perhaps we need a link to another issue?

Contributor Author


Thank you for the review. I already have changes ready for integral divide in #2421, which should be ready for review once this PR is merged to main. Given that this PR has been up for quite some time now, I didn't want to add more changes, to keep the focus scoped to simple arithmetic functions only.

if eval_mode == EvalMode::Try && data_type.is_integer() {
if [EvalMode::Try, EvalMode::Ansi].contains(&eval_mode)
&& (data_type.is_integer()
|| (data_type.is_floating() && op == DataFusionOperator::Divide))
Member


Is the float/divide case covered by existing tests, or should a new test be added for this?

Contributor Author


Thank you for the comment. All division in Comet is float or decimal division. I had to add this check to ensure that float operands are supported only when the operation is division, and the existing tests should cover this. I also have a draft PR ready for supporting integral division (#2421), which should be ready for review once this is merged :)
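
The quoted condition can be restated as a small standalone predicate. The enum shapes and the function name here are hypothetical, for illustration only:

```rust
// Hypothetical predicate mirroring the quoted condition: checked kernels
// apply in Try/Ansi mode for integer types, or for floating types only
// when the operation is division. Not Comet's actual planner.rs code.
#[allow(dead_code)]
enum EvalMode {
    Legacy,
    Try,
    Ansi,
}

#[allow(dead_code)]
#[derive(PartialEq)]
enum Op {
    Add,
    Subtract,
    Multiply,
    Divide,
}

fn needs_checked_kernel(mode: &EvalMode, is_integer: bool, is_floating: bool, op: &Op) -> bool {
    matches!(mode, EvalMode::Try | EvalMode::Ansi)
        && (is_integer || (is_floating && *op == Op::Divide))
}

fn main() {
    assert!(needs_checked_kernel(&EvalMode::Ansi, true, false, &Op::Add));
    assert!(needs_checked_kernel(&EvalMode::Ansi, false, true, &Op::Divide));
    // Float multiply does not use the checked path
    assert!(!needs_checked_kernel(&EvalMode::Ansi, false, true, &Op::Multiply));
    // Legacy mode never uses the checked path
    assert!(!needs_checked_kernel(&EvalMode::Legacy, true, false, &Op::Add));
}
```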

@coderfender coderfender force-pushed the support_ansi_mode_arithmetic_functions branch from 1d072c0 to 9603223 Compare September 25, 2025 18:16
Member

@andygrove andygrove left a comment


LGTM pending CI. Thanks for your patience @coderfender

@coderfender
Contributor Author

Thank you @andygrove. CI passed, and I will raise a PR for the Integral Divide ANSI implementation to build off of this.

@andygrove andygrove merged commit 03c0626 into apache:main Sep 25, 2025
102 checks passed
@coderfender coderfender deleted the support_ansi_mode_arithmetic_functions branch September 25, 2025 21:39