@comphead
Contributor

Which issue does this PR close?

Closes #2552.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

@comphead
Contributor Author

Depends on #2586

@comphead changed the title from "feat: support concat" to "feat: support concat for strings" on Oct 26, 2025
@codecov-commenter

codecov-commenter commented Oct 26, 2025

Codecov Report

❌ Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 59.22%. Comparing base (f09f8af) to head (164dbfe).
⚠️ Report is 649 commits behind head on main.

Files with missing lines Patch % Lines
...rc/main/scala/org/apache/comet/serde/strings.scala 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2604      +/-   ##
============================================
+ Coverage     56.12%   59.22%   +3.10%     
- Complexity      976     1448     +472     
============================================
  Files           119      147      +28     
  Lines         11743    13762    +2019     
  Branches       2251     2365     +114     
============================================
+ Hits           6591     8151    +1560     
- Misses         4012     4387     +375     
- Partials       1140     1224      +84     


@comphead comphead requested a review from andygrove October 27, 2025 19:31
@comphead
Contributor Author

comphead commented Oct 27, 2025

@andygrove please take another look.
concat now works with strings; support for other data types is being fixed in DataFusion (apache/datafusion#18020).

@comphead comphead marked this pull request as ready for review October 27, 2025 20:51
classOf[BitLength] -> CometScalarFunction("bit_length"),
classOf[Chr] -> CometScalarFunction("char"),
classOf[ConcatWs] -> CometScalarFunction("concat_ws"),
classOf[Concat] -> CometScalarFunction("concat"),
Member

I think we need type checks so that we fall back to Spark for unsupported argument types?

Perhaps something like this?

object CometConcat extends CometScalarFunction[Concat]("concat") {
  override def getSupportLevel(expr: Concat): SupportLevel = {
    if (expr.children.forall(_.dataType == DataTypes.StringType)) {
      Compatible()
    } else {
      Incompatible(Some("Only string arguments are supported"))
    }
  }
}
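
To make the suggested check concrete without depending on Comet or Spark, here is a minimal, self-contained sketch of the same decision. Every name in it (`SketchDataType`, `SketchSupportLevel`, `ConcatSupportSketch.concatSupportLevel`) is a hypothetical stand-in for illustration, not Comet's actual API; only the rule "all arguments must be strings, otherwise report a reason and fall back" is taken from the suggestion above.

```scala
// Hypothetical stand-ins for Spark data types; not the real Spark or Comet classes.
sealed trait SketchDataType
case object SketchString extends SketchDataType
case object SketchBinary extends SketchDataType
final case class SketchArray(element: SketchDataType) extends SketchDataType

// Hypothetical stand-ins for the support levels named in the suggestion.
sealed trait SketchSupportLevel
case object SketchCompatible extends SketchSupportLevel
final case class SketchIncompatible(reason: Option[String]) extends SketchSupportLevel

object ConcatSupportSketch {
  // Native execution is allowed only when every argument is a string.
  // Any other argument type yields a reason that a planner could use
  // to fall back to Spark. An empty argument list counts as compatible,
  // because forall is vacuously true on an empty sequence.
  def concatSupportLevel(argTypes: Seq[SketchDataType]): SketchSupportLevel =
    if (argTypes.forall(_ == SketchString)) SketchCompatible
    else SketchIncompatible(Some("Only string arguments are supported"))
}
```

Under these assumptions, `concatSupportLevel(Seq(SketchString, SketchString))` is compatible, while mixing in `SketchArray(SketchString)` or `SketchBinary` produces the incompatible result with its reason attached.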

Comment on lines 140 to 143
createFunctionWithInputTypes(
"concat",
Seq(SparkStringType, SparkStringType)
), // TODO: variadic
Member

I know that this PR only adds support for string inputs in Comet concat, but the fuzz tester should ideally test all types that Spark supports.

+- CometHashAggregate (67)
+- CometExpand (66)
+- CometUnion (65)
:- CometHashAggregate (22)
Member

Why is this hash aggregate now supported in Comet? I don't see concat used in the query.

https://github.com/apache/datafusion-benchmarks/blob/main/tpcds/queries-spark/q5.sql

Comment on lines 3160 to 3161
// https://github.com/apache/datafusion-comet/issues/2647
ignore("test concat function - arrays") {
Member

Could you enable these tests and use the recently added checkSparkAnswerAndFallbackReason method to make sure we are correctly falling back to Spark?

@comphead
Contributor Author

@andygrove PTAL

Member

@andygrove left a comment

LGTM. Thanks @comphead!

@andygrove
Member

In case anyone is wondering why more operators are now running natively, even though the TPC-DS queries do not contain concat ... Spark adds concats in aggregates and Comet was previously falling back to Spark:

:- HashAggregate [COMET: concat is not supported, Unsupported result expressions found in: Vector(MakeDecimal(sum(UnscaledValue(sales_price#3596))#3818L,17,2) AS sales#3606, MakeDecimal(sum(UnscaledValue(return_amt#3598))#3820L,17,2) AS returns#3608, (MakeDecimal(sum(UnscaledValue(profit#3597))#3819L,17,2) - MakeDecimal(sum(UnscaledValue(net_loss#3599))#3821L,17,2)) AS profit#3584, store channel AS channel#3840, concat(store, s_store_id#3856) AS id#3841)]

@andygrove andygrove merged commit f826b65 into apache:main Oct 30, 2025
102 checks passed
@rluvaton
Member

rluvaton commented Nov 4, 2025

In case anyone is wondering why more operators are now running natively, even though the TPC-DS queries do not contain concat ... Spark adds concats in aggregates and Comet was previously falling back to Spark:

:- HashAggregate [COMET: concat is not supported, Unsupported result expressions found in: Vector(MakeDecimal(sum(UnscaledValue(sales_price#3596))#3818L,17,2) AS sales#3606, MakeDecimal(sum(UnscaledValue(return_amt#3598))#3820L,17,2) AS returns#3608, (MakeDecimal(sum(UnscaledValue(profit#3597))#3819L,17,2) - MakeDecimal(sum(UnscaledValue(net_loss#3599))#3821L,17,2)) AS profit#3584, store channel AS channel#3840, concat(store, s_store_id#3856) AS id#3841)]

concat does exist in the query, no?

'store' || s_store_id as id

https://github.com/apache/datafusion-benchmarks/blob/c472383a2e0570c85766cd8ec946614dd4fda542/tpcds/queries-spark/q5.sql#L105

But perhaps Spark just moves it into the aggregate to avoid copying more data?

@comphead
Contributor Author

comphead commented Nov 4, 2025

:) tadaam @rluvaton that's true, I totally forgot about || 🤦

Development

Successfully merging this pull request may close these issues.

Comet cannot accelerate Concat because: concat is not supported

4 participants