Add lambda function and array related functions #3584

xinyual · 2025-04-27T06:03:03Z

Description

This PR adds lambda functions and array-related functions. Calcite has no built-in equivalents for the lambda-based array functions, so we implement those ourselves.
The current logic for lambdas is:
We treat a lambda function as a new PPL expression and parse it in the usual way to construct a RexNode. To get the return type of a lambda expression, we first need to map the argument types in the CalcitePlanContext. For example, in forall(array(1, 2, 3), x -> x > 0), x is mapped to INTEGER.
reduce is an exception, because acc has a dynamic type.
Calcite/Linq4j generates code according to the input types. Take reduce(array(1.0, 2.0, 3.0), 0, (acc, x) -> acc + x). Ideally we would map acc -> INTEGER and x -> DOUBLE. With that mapping, however, the generated code for + would be plus(INTEGER acc, DOUBLE x); after the first application acc becomes a DOUBLE, and the next application throws an exception. We therefore map acc to ANY and infer the return type in getReturnTypeInference.
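
The accumulator problem can be reproduced outside Calcite. The following is a minimal plain-Java sketch, not the code in this PR: `ReduceAccSketch`, `reduceDynamic`, `plus` and `plusIntAcc` are hypothetical names, and `Object` stands in for the ANY type. It only illustrates why a fixed (INTEGER acc, DOUBLE x) signature breaks on the second element.

```java
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical illustration; none of these names exist in the PR or in Calcite. */
final class ReduceAccSketch {

  /** The accumulator is typed as Object, mirroring the idea of mapping acc to ANY. */
  static Object reduceDynamic(
      List<?> values, Object base, BiFunction<Object, Object, Object> accFn) {
    Object acc = base;
    for (Object v : values) {
      acc = accFn.apply(acc, v); // the runtime type of acc may change here
    }
    return acc;
  }

  /** A "+" that accepts any Number on both sides and promotes to double. */
  static Object plus(Object acc, Object x) {
    return ((Number) acc).doubleValue() + ((Number) x).doubleValue();
  }

  /** A "+" specialised for (INTEGER acc, DOUBLE x), as a fixed mapping would produce. */
  static Object plusIntAcc(Object acc, Object x) {
    // The first call returns a Double, so the Integer cast fails on the second call.
    return ((Integer) acc) + ((Double) x);
  }

  public static void main(String[] args) {
    System.out.println(reduceDynamic(List.of(1.0, 2.0, 3.0), 0, ReduceAccSketch::plus));
    try {
      reduceDynamic(List.of(1.0, 2.0, 3.0), 0, ReduceAccSketch::plusIntAcc);
    } catch (ClassCastException e) {
      System.out.println("fixed INTEGER accumulator failed on the second element");
    }
  }
}
```

With the dynamic accumulator the result type follows the lambda body, which is why the return type has to be inferred separately.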

The function is aligned with https://github.com/opensearch-project/opensearch-spark/blob/main/docs/ppl-lang/functions/ppl-collection.md

TODO: nested objects are not yet supported inside lambdas, e.g. x -> x.a > 0. Lambdas will pick this up automatically once nested-object support is added.

For detailed implementation and description:

Function	Arguments	Description	Return type	Implementation
ARRAY	ARRAY(value1: ANY, value2: ANY, ...)	Creates an array from the input values. Mixed element types are currently not allowed; the least restrictive common type is inferred instead, for example array(1, "demo") -> ["1", "demo"].	ARRAY	Wraps `SqlLibraryOperators.ARRAY`.
ARRAY_LENGTH	ARRAY_LENGTH(value: ARRAY)	Returns the length of the array.	INTEGER	`SqlLibraryOperators.ARRAY_LENGTH`
FORALL	forall(value: ARRAY, function: LAMBDA)	Checks whether all elements of the array satisfy the lambda function. The lambda must return a boolean.	BOOLEAN	Implemented by ourselves, since we could not find a matching built-in Calcite function.
EXISTS	exists(value: ARRAY, function: LAMBDA)	Checks whether at least one element of the array satisfies the lambda function. The lambda must return a boolean.	BOOLEAN	Implemented by ourselves, since we could not find a matching built-in Calcite function.
FILTER	filter(value: ARRAY, function: LAMBDA)	Keeps the elements of the array for which the lambda function returns true. The lambda must return a boolean.	ARRAY	Implemented by ourselves, since we could not find a matching built-in Calcite function.
TRANSFORM	transform(value: ARRAY, function: LAMBDA)	Transforms the elements of the array one by one using the lambda. The lambda can accept a second argument, as in (x, i) -> x + i, where i is the index of the element in the array.	ARRAY	Implemented by ourselves, since we could not find a matching built-in Calcite function.
REDUCE	reduce(value: ARRAY, base_value: ANY, acc_function: LAMBDA) / reduce(value: ARRAY, base_value: ANY, acc_function: LAMBDA, reduce_function: LAMBDA)	First applies acc_function to every element, storing each result in acc. Then applies reduce_function to acc, if one is given. The acc_function format is (acc, x) -> ..., and the reduce_function format is (acc) -> ...	ANY, according to the lambda function	Implemented by ourselves, since we could not find a matching built-in Calcite function.
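
As a rough reference for the intended semantics of the five lambda-based functions, here is a plain-Java sketch. It is not the Calcite-based implementation in this PR: the class and method names are hypothetical, the index passed to transform is assumed to be zero-based, and null handling is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical reference semantics; not the UDF classes added by the PR. */
final class CollectionSemanticsSketch {

  /** forall: every element satisfies the predicate. */
  static <T> boolean forall(List<T> arr, Predicate<T> p) {
    for (T x : arr) {
      if (!p.test(x)) return false;
    }
    return true;
  }

  /** exists: at least one element satisfies the predicate. */
  static <T> boolean exists(List<T> arr, Predicate<T> p) {
    for (T x : arr) {
      if (p.test(x)) return true;
    }
    return false;
  }

  /** filter: keep the elements for which the predicate holds. */
  static <T> List<T> filter(List<T> arr, Predicate<T> p) {
    List<T> out = new ArrayList<>();
    for (T x : arr) {
      if (p.test(x)) out.add(x);
    }
    return out;
  }

  /** transform with the two-argument form (x, i); i is assumed to start at 0. */
  static <T, R> List<R> transform(List<T> arr, BiFunction<T, Integer, R> f) {
    List<R> out = new ArrayList<>(arr.size());
    for (int i = 0; i < arr.size(); i++) {
      out.add(f.apply(arr.get(i), i));
    }
    return out;
  }

  /** reduce with acc_function (acc, x) and a final reduce_function (acc). */
  static <T, A, R> R reduce(
      List<T> arr, A base, BiFunction<A, T, A> accFn, Function<A, R> reduceFn) {
    A acc = base;
    for (T x : arr) {
      acc = accFn.apply(acc, x);
    }
    return reduceFn.apply(acc);
  }

  /** reduce without a reduce_function: the accumulator is returned as is. */
  static <T, A> A reduce(List<T> arr, A base, BiFunction<A, T, A> accFn) {
    return reduce(arr, base, accFn, acc -> acc);
  }
}
```

For example, `reduce(List.of(1, 2, 3), 0, (acc, x) -> acc + x, acc -> acc * 10)` first sums the elements and then applies the final lambda to the sum.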

Related Issues

Resolves #[Issue number to be closed when this PR is merged]
#3575

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
API changes companion pull request created.
Commits are signed per the DCO using --signoff.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: xinyual <[email protected]>

penghuo · 2025-05-09T17:15:15Z

core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java

@@ -44,13 +47,16 @@ public class CalcitePlanContext {

  private final Stack<RexCorrelVariable> correlVar = new Stack<>();

+  @Getter public Map<String, RexLambdaRef> temparolInputMap;


what is temparol? typo?

Yes, it was a typo. Already renamed.

penghuo · 2025-05-09T17:30:46Z

core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java

+   * will map type for each lambda argument by the order of previous argument. Also, the function
+   * will add these variables to the context so they can pass visitQualifiedName
+   */
+  private CalcitePlanContext prepareLambdaContext(
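
The argument-type mapping described in the Javadoc above and in the PR description can be pictured with a small stand-alone sketch. This is not the body of `prepareLambdaContext`: `LambdaArgTypeSketch` and `mapArgTypes` are hypothetical, types are plain strings rather than Calcite `RelDataType`s, and giving the transform index the type INTEGER is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of mapping lambda argument names to types. */
final class LambdaArgTypeSketch {

  /**
   * @param function name of the enclosing function, e.g. "forall" or "reduce"
   * @param elementType element type of the array argument that precedes the lambda
   * @param lambdaArgs lambda parameter names in declaration order
   */
  static Map<String, String> mapArgTypes(
      String function, String elementType, List<String> lambdaArgs) {
    Map<String, String> types = new LinkedHashMap<>();
    for (int i = 0; i < lambdaArgs.size(); i++) {
      String name = lambdaArgs.get(i);
      if (function.equals("reduce") && i == 0) {
        types.put(name, "ANY"); // acc is dynamic, so it is not pinned to a type
      } else if (function.equals("transform") && i == 1) {
        types.put(name, "INTEGER"); // assumed type of the element index
      } else {
        types.put(name, elementType); // ordinary element variable such as x
      }
    }
    return types;
  }
}
```

For `forall(array(1, 2, 3), x -> x > 0)` this yields x -> INTEGER, and for `reduce(array(1.0, 2.0, 3.0), 0, (acc, x) -> acc + x)` it yields acc -> ANY and x -> DOUBLE.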


LantaoJin · 2025-05-29T02:30:41Z

core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java

@@ -58,6 +58,14 @@ public enum BuiltinFunctionName {
  TAN(FunctionName.of("tan")),
  SPAN(FunctionName.of("span")),

+  /** Collection functions */
+  ARRAY(FunctionName.of("array")),


Did you miss ARRAY_LENGTH? https://github.com/opensearch-project/opensearch-spark/blob/main/docs/ppl-lang/functions/ppl-collection.md#array

Already added.

LantaoJin · 2025-05-29T02:32:19Z

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImpl.java

+    switch (targetType) {
+      case DOUBLE:
+        List<Object> unboxed =
+            IntStream.range(0, args.length - 1)
+                .mapToObj(i -> ((Number) args[i]).doubleValue())
+                .collect(Collectors.toList());
+
+        return unboxed;
+      case FLOAT:
+        List<Object> unboxedFloat =
+            IntStream.range(0, args.length - 1)
+                .mapToObj(i -> ((Number) args[i]).floatValue())
+                .collect(Collectors.toList());
+        return unboxedFloat;


Could you explain why this special logic is needed?

We need to convert the values internally. Otherwise Calcite will cast directly, e.g. DOUBLE to INTEGER, which raises an exception.
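
One plausible way to picture the failure, as a plain-Java sketch rather than the actual generated code: a direct cast between boxed numeric types throws, while going through `Number` converts safely. `ArrayWideningSketch` and `toDoubles` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of converting mixed numeric values to one target type. */
final class ArrayWideningSketch {

  /** Converts every value through Number, so Integer and Double inputs both work. */
  static List<Object> toDoubles(Object... values) {
    List<Object> out = new ArrayList<>(values.length);
    for (Object v : values) {
      out.add(((Number) v).doubleValue());
    }
    return out;
  }

  /** A direct cast between boxed types; throws when the runtime type differs. */
  static Double directCast(Object value) {
    return (Double) value;
  }

  public static void main(String[] args) {
    System.out.println(toDoubles(1, 2.5));
    try {
      directCast(1);
    } catch (ClassCastException e) {
      System.out.println("direct cast from Integer to Double failed");
    }
  }
}
```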

LantaoJin · 2025-05-29T02:34:52Z

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImpl.java

+import org.apache.calcite.sql.type.SqlTypeName;
+import org.opensearch.sql.expression.function.ImplementorUDF;
+
+public class ArrayFunctionImpl extends ImplementorUDF {


Can't we reuse SqlLibraryOperators.ARRAY? Again, for any newly added function, please add a reason in the PR description for why we must implement it ourselves.

Already updated the implementation. It now wraps SqlLibraryOperators.ARRAY.

LantaoJin · 2025-05-29T02:38:55Z

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ExistsFunctionImpl.java

+import org.apache.calcite.sql.type.SqlReturnTypeInference;
+import org.opensearch.sql.expression.function.ImplementorUDF;
+
+public class ExistsFunctionImpl extends ImplementorUDF {


Can't we reuse SqlLibraryOperators.ARRAY_CONTAINS? Please check all SqlLibraryOperators.ARRAY_* first.

Confirmed. All SqlLibraryOperators.ARRAY_* operators are array functions that do not take lambdas. We do use SqlLibraryOperators.ARRAY_LENGTH.

Signed-off-by: xinyual <[email protected]>

xinyual added 11 commits April 23, 2025 16:49

add forall

06fe11a

Signed-off-by: xinyual <[email protected]>

add filter/exists/

31733dd

Signed-off-by: xinyual <[email protected]>

add reduce

e305747

Signed-off-by: xinyual <[email protected]>

add return type inference

8a9d024

Signed-off-by: xinyual <[email protected]>

fix exists

689534b

Signed-off-by: xinyual <[email protected]>

add map for lambda

2cc41d8

Signed-off-by: xinyual <[email protected]>

add infer for reduce

bacccdf

Signed-off-by: xinyual <[email protected]>

add java doc

013f7df

Signed-off-by: xinyual <[email protected]>

merge from main

9fb35fe

Signed-off-by: xinyual <[email protected]>

revert useless change

3d465e6

Signed-off-by: xinyual <[email protected]>

renane

8051c52

Signed-off-by: xinyual <[email protected]>

xinyual marked this pull request as ready for review April 27, 2025 06:05

xinyual requested review from ps48, kavithacm, derek-ho, joshuali925, dai-chen, YANG-DB, mengweieric, Swiddis, penghuo, seankao-az, MaxKsyunz, Yury-Fridlyand, anirudha, forestmvey, acarbonetto, GumpacG, ykmr1224 and LantaoJin as code owners April 27, 2025 06:05

xinyual requested review from noCharger and qianheng-aws as code owners April 27, 2025 06:05

xinyual added 3 commits April 27, 2025 14:56

fix g4

1b99ce5

Signed-off-by: xinyual <[email protected]>

fix g4

b5c4a03

Signed-off-by: xinyual <[email protected]>

fix g4 file

794df29

Signed-off-by: xinyual <[email protected]>

LantaoJin added the calcite calcite migration releated label Apr 29, 2025

xinyual added 2 commits May 26, 2025 15:17

merge from main

f87787b

Signed-off-by: xinyual <[email protected]>

apply spotless

fd97885

Signed-off-by: xinyual <[email protected]>

penghuo reviewed May 28, 2025

LantaoJin reviewed May 29, 2025

xinyual added 5 commits May 30, 2025 10:55

merge from main

9cca2fc

Signed-off-by: xinyual <[email protected]>

test

241a57c

Signed-off-by: xinyual <[email protected]>

use builtin operator

c90f53d

Signed-off-by: xinyual <[email protected]>

add array_length with test

a22a524

Signed-off-by: xinyual <[email protected]>

optimize reduce

7f9c6ec

Signed-off-by: xinyual <[email protected]>
