feat: support Spark-compatible `abs` math function #18205

hsiang-c · 2025-10-21T17:05:13Z

Which issue does this PR close?

Part of [EPIC] Complete datafusion-spark Spark Compatible Functions #15914

Rationale for this change

Apache Spark's abs() behaves differently from DataFusion's.
Apache Spark's ANSI-compliant dialect is toggled by the SparkConf setting spark.sql.ansi.enabled. When it is off, arithmetic overflow does not throw an exception, whereas DataFusion's abs does.
Apache Spark's abs also supports ANSI interval types: YearMonthIntervalType and DayTimeIntervalType
DataFusion Comet can make use of this in "fix: re-enable Comet abs" (datafusion-comet#2595).

What changes are included in this PR?

Mimics Apache Spark v4.0.1 abs expression
DataFusion Spark's abs() API takes an additional fail_on_error flag, which the caller passes when spark.sql.ansi.enabled=true on its side.
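To make the two modes concrete, here is a minimal Rust sketch of the overflow behaviour for a single Int8 value. The helper name `abs_i8` and its error string are hypothetical and are not taken from the PR; the sketch also assumes that the non-ANSI mode returns the wrapped value, so that abs of the minimum value stays negative.

```rust
/// Hypothetical helper, not the PR's implementation: absolute value of an
/// i8 with a Spark-style `fail_on_error` switch.
fn abs_i8(value: i8, fail_on_error: bool) -> Result<i8, String> {
    if fail_on_error {
        // ANSI mode: report the overflow instead of returning a wrong value.
        value
            .checked_abs()
            .ok_or_else(|| format!("arithmetic overflow in abs({})", value))
    } else {
        // Non-ANSI mode (assumed): wrap around, so abs(i8::MIN) is i8::MIN.
        Ok(value.wrapping_abs())
    }
}
```

With this sketch, `abs_i8(i8::MIN, false)` yields `Ok(-128)`, while `abs_i8(i8::MIN, true)` yields an error.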

Are these changes tested?

unit tests
sqllogictest: test_files/spark/math/abs.slt

Are there any user-facing changes?

Yes, the abs function can be specified in the SQL.

An arithmetic overflow error is raised when spark.sql.ansi.enabled=true
Support ANSI interval types: YearMonthIntervalType and DayTimeIntervalType

hsiang-c · 2025-10-22T00:08:19Z

cc @comphead for code review, thank you.

comphead · 2025-10-22T00:13:42Z

datafusion/sqllogictest/test_files/spark/math/abs.slt

+
+# abs: signed int minimal values
+query IIII
+select abs(c1), abs(c2), abs(c3), abs(c4) from test_nullable_integer where dataset = 'mins'


I'm wondering whether it would be easier to test like

query II
select abs(1), abs(-1)
----
1 1

instead of creating/dropping tables?

Doing abs(-128), abs(-32768) and abs(-2147483648) doesn't work because of type widening.
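The effect of type widening can be sketched in Rust, assuming (as the Int64(-128) column header in the CLI output in this thread suggests) that an untyped integer literal is treated as a 64-bit value. The function names are hypothetical and exist only for illustration.

```rust
/// abs on the narrow type: `None` when the result does not fit in i8.
fn abs_narrow(value: i8) -> Option<i8> {
    value.checked_abs()
}

/// abs after widening to i64, which is what happens to an untyped literal
/// under the assumption above; the overflow case is never exercised.
fn abs_widened(value: i8) -> Option<i64> {
    i64::from(value).checked_abs()
}
```

Here `abs_narrow(-128)` is `None` but `abs_widened(-128)` is `Some(128)`, which is why a plain literal cannot exercise the minimum-value edge case.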

Doing abs(-128::SMALLINT), abs(-32768::SMALLINT), abs(-2147483648::INT), abs(-9223372036854775808::BIGINT) throws casting error. For example, DataFusion error: Arrow error: Cast error: Can't cast value 128 to type Int8

I think this is a bug in SQL parsing:

> select -128::tinyint;
Arrow error: Cast error: Can't cast value 128 to type Int8
> select (-128)::tinyint;
+-------------+
| Int64(-128) |
+-------------+
| -128        |
+-------------+
1 row(s) fetched.
Elapsed 0.003 seconds.

It casts the value 128 before applying the negation. We might need to raise an issue for this; I'm not sure whether this is intended behaviour.

So you can wrap it in parentheses to ensure the correct precedence, or alternatively use arrow_cast:

> select arrow_cast(-128, 'Int8');
+--------------------------------------+
| arrow_cast(Int64(-128),Utf8("Int8")) |
+--------------------------------------+
| -128                                 |
+--------------------------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.
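The difference between the two orders of evaluation can be mimicked in Rust with checked conversions. This is only an illustration of the precedence issue described above, not of how the SQL parser is implemented; both function names are hypothetical.

```rust
use std::convert::TryFrom;

/// Mimics `-128::tinyint`: cast the magnitude to i8 first, then negate.
fn cast_then_negate(magnitude: i64) -> Option<i8> {
    i8::try_from(magnitude).ok().and_then(|v| v.checked_neg())
}

/// Mimics `(-128)::tinyint`: negate first, then cast the result to i8.
fn negate_then_cast(magnitude: i64) -> Option<i8> {
    magnitude.checked_neg().and_then(|v| i8::try_from(v).ok())
}
```

For a magnitude of 128, `cast_then_negate` returns `None` because 128 does not fit in an i8, while `negate_then_cast` returns `Some(-128)`.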

comphead · 2025-10-22T00:15:33Z

datafusion/sqllogictest/test_files/spark/math/abs.slt

+0 0
+1 1
+1 1
+NULL NULL


It's better to use an inline query; in this example the answers and input data are out of order, which makes it harder to read.

comphead · 2025-10-22T00:18:05Z

datafusion/sqllogictest/test_files/spark/math/abs.slt

 ## PySpark 3.5.5 Result: {"abs(INTERVAL '-1-1' YEAR TO MONTH)": 13, "typeof(abs(INTERVAL '-1-1' YEAR TO MONTH))": 'interval year to month', "typeof(INTERVAL '-1-1' YEAR TO MONTH)": 'interval year to month'}
-#query
-#SELECT abs(INTERVAL '-1-1' YEAR TO MONTH::interval year to month);
+query error DataFusion error: This feature is not implemented: Unsupported SQL type INTERVAL YEAR TO MONTH


Let's create a GitHub ticket to fix this and refer to it in the comments in addition to the error.

Looks like abs works with intervals for Spark only