
Commit d88298a

bersprockets authored and MaxGekk committed
[SPARK-52738][SQL] Support aggregating the TIME type with a UDAF when the underlying buffer is an UnsafeRow
### What changes were proposed in this pull request?

- Change `BufferSetterGetterUtils` to use `InternalRow.setLong` for setting TIME values rather than `InternalRow.update`.
- Change `BufferSetterGetterUtils` to use `InternalRow.getLong` for getting TIME values.
- Update the test "udaf with all data types" in `AggregationQuerySuite` so that it checks aggregation with both an unsafe and a safe aggregation buffer. Since SPARK-41359, that test has exercised only a safe aggregation buffer.

### Why are the changes needed?

When a query uses a UDAF to aggregate a TIME column, and all other columns are "mutable" (as determined by `UnsafeRow#isMutable`), the aggregator creates an `UnsafeRow` for the low-level aggregation buffer. However, the wrapper of that buffer (`MutableAggregationBufferImpl`) fails to set up a proper field setter function for the TIME column, so it attempts to call `UnsafeRow.update` on the underlying buffer. The `UnsafeRow` instance throws `org.apache.spark.SparkUnsupportedOperationException`:

```
Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkUnsupportedOperationException: [UNSUPPORTED_CALL.WITHOUT_SUGGESTION] Cannot call the method "update" of the class "org.apache.spark.sql.catalyst.expressions.UnsafeRow". SQLSTATE: 0A000
```

See SPARK-52738 for a reproduction example.

### Does this PR introduce _any_ user-facing change?

No. The TIME type has not been released yet.

### How was this patch tested?

Updated a unit test.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51430 from bersprockets/time_udaf.

Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
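To make the failure mode concrete, below is a minimal sketch of a UDAF over a TIME column that would exercise the UnsafeRow-backed buffer path described above. It is illustrative only, not code from this PR: the `LatestTime` class is hypothetical, and it assumes `TimeType()` has a default (microsecond-precision) constructor and that TIME's external Java type is `java.time.LocalTime`.

```scala
import java.time.LocalTime

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Hypothetical UDAF whose buffer holds a single TIME field. Because every
// buffer field is "mutable" per UnsafeRow.isMutable, the planner backs the
// aggregation buffer with an UnsafeRow -- the code path this commit fixes.
class LatestTime extends UserDefinedAggregateFunction {
  override def inputSchema: StructType = new StructType().add("t", TimeType())
  override def bufferSchema: StructType = new StructType().add("latest", TimeType())
  override def dataType: DataType = TimeType()
  override def deterministic: Boolean = true

  override def initialize(buffer: MutableAggregationBuffer): Unit =
    buffer.update(0, LocalTime.MIN)

  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) {
      val t = input.getAs[LocalTime](0)
      // MutableAggregationBufferImpl routes this call through the field
      // setter that this PR fixes for TIME columns.
      if (t.isAfter(buffer.getAs[LocalTime](0))) buffer.update(0, t)
    }
  }

  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    val t = buffer2.getAs[LocalTime](0)
    if (t.isAfter(buffer1.getAs[LocalTime](0))) buffer1.update(0, t)
  }

  override def evaluate(buffer: Row): Any = buffer.getAs[LocalTime](0)
}
```

Registered via the (deprecated but still available) `spark.udf.register("latest_time", new LatestTime)` and applied to a TIME column, a UDAF like this previously hit the `SparkUnsupportedOperationException` shown above.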

File tree: 2 files changed, +12 −8 lines

sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala

Lines changed: 2 additions & 2 deletions
```diff
@@ -84,7 +84,7 @@ sealed trait BufferSetterGetterUtils {
         (row: InternalRow, ordinal: Int) =>
           if (row.isNullAt(ordinal)) null else row.getInt(ordinal)
 
-      case TimestampType | TimestampNTZType =>
+      case TimestampType | TimestampNTZType | _: TimeType =>
         (row: InternalRow, ordinal: Int) =>
           if (row.isNullAt(ordinal)) null else row.getLong(ordinal)
 
@@ -188,7 +188,7 @@ sealed trait BufferSetterGetterUtils {
           row.setNullAt(ordinal)
         }
 
-      case TimestampType | TimestampNTZType =>
+      case TimestampType | TimestampNTZType | _: TimeType =>
         (row: InternalRow, ordinal: Int, value: Any) =>
           if (value != null) {
             row.setLong(ordinal, value.asInstanceOf[Long])
```
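The change above works because `UnsafeRow` supports in-place primitive setters such as `setLong`, while its generic `update` method throws unconditionally, and TIME values are stored internally as a `Long`. A condensed, hypothetical sketch of the setter dispatch follows; the `setterFor` helper is invented for illustration, and the real match in `BufferSetterGetterUtils` covers many more types.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types._

// Condensed view (not the actual Spark source) of the setter dispatch.
// TIME values are a Long internally, so they must go through setLong;
// falling into the generic arm makes an UnsafeRow-backed buffer throw
// SparkUnsupportedOperationException at runtime.
def setterFor(dataType: DataType): (InternalRow, Int, Any) => Unit = dataType match {
  case TimestampType | TimestampNTZType | _: TimeType =>
    (row, ordinal, value) =>
      if (value != null) row.setLong(ordinal, value.asInstanceOf[Long])
      else row.setNullAt(ordinal)
  case _ =>
    // Generic arm: fine for safe rows, but UnsafeRow.update always throws.
    (row, ordinal, value) =>
      if (value != null) row.update(ordinal, value) else row.setNullAt(ordinal)
}
```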

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala

Lines changed: 10 additions & 6 deletions
```diff
@@ -23,7 +23,7 @@ import test.org.apache.spark.sql.MyDoubleAvg
 import test.org.apache.spark.sql.MyDoubleSum
 
 import org.apache.spark.sql.{AnalysisException, DataFrame, QueryTest, RandomDataGenerator, Row}
-import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode
+import org.apache.spark.sql.catalyst.expressions.{CodegenObjectFactoryMode, UnsafeRow}
 import org.apache.spark.sql.classic.ClassicConversions.castToImpl
 import org.apache.spark.sql.classic.Dataset
 import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
@@ -899,11 +899,15 @@ abstract class AggregationQuerySuite extends QueryTest with SQLTestUtils with Te
       ArrayType(IntegerType), MapType(StringType, LongType), struct,
       new TestUDT.MyDenseVectorUDT()) ++ dayTimeIntervalTypes ++ unsafeRowMutableFieldTypes ++
       timeTypes
-    // Right now, we will use SortAggregate to handle UDAFs.
-    // UnsafeRow.mutableFieldTypes.asScala.toSeq will trigger SortAggregate to use
-    // UnsafeRow as the aggregation buffer. While, dataTypes will trigger
-    // SortAggregate to use a safe row as the aggregation buffer.
-    Seq(dataTypes).foreach { dataTypes =>
+    // A schema that contains only data types where UnsafeRow.isMutable is true
+    // will trigger the aggregator to use unsafe row as the aggregation buffer.
+    // Other dataTypes will trigger the aggregator to use a safe row as the
+    // aggregation buffer.
+    //
+    // Below we want to test with *both* UnsafeRow and safe row as the underlying
+    // buffer.
+    val mutableDataTypes = dataTypes.filter(UnsafeRow.isMutable)
+    Seq(dataTypes, mutableDataTypes).foreach { dataTypes =>
       val fields = dataTypes.zipWithIndex.map { case (dataType, index) =>
         StructField(s"col$index", dataType, nullable = true)
       }
```
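The test change boils down to deriving a second, all-mutable type list and running the same assertions over both buffer layouts. A standalone sketch of that strategy, using a short placeholder type list rather than the suite's full one (and assuming a `TimeType()` default constructor):

```scala
import org.apache.spark.sql.catalyst.expressions.UnsafeRow
import org.apache.spark.sql.types._

// A mixed list: StringType is not mutable in an UnsafeRow, so a schema
// containing it forces a safe-row aggregation buffer.
val allTypes: Seq[DataType] =
  Seq(IntegerType, LongType, DoubleType, StringType, TimeType())

// Keeping only mutable types yields a schema the aggregator can back
// with an UnsafeRow, exercising the code path fixed in udaf.scala.
val mutableTypes = allTypes.filter(UnsafeRow.isMutable)

// Run the same aggregation checks over both lists to cover both layouts.
Seq(allTypes, mutableTypes).foreach { types =>
  val schema = StructType(types.zipWithIndex.map { case (dt, i) =>
    StructField(s"col$i", dt, nullable = true)
  })
  // ... build a DataFrame with `schema` and run the UDAF assertions ...
}
```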
