-
Notifications
You must be signed in to change notification settings - Fork 155
Add lambda function and array related functions #3584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
@@ -44,13 +47,16 @@ public class CalcitePlanContext { | |||
|
|||
private final Stack<RexCorrelVariable> correlVar = new Stack<>(); | |||
|
|||
@Getter public Map<String, RexLambdaRef> temparolInputMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is temparol
? typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Typo error, already rename it.
* will map type for each lambda argument by the order of previous argument. Also, the function | ||
* will add these variables to the context so they can pass visitQualifiedName | ||
*/ | ||
private CalcitePlanContext prepareLambdaContext( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add UT
@@ -58,6 +58,14 @@ public enum BuiltinFunctionName { | |||
TAN(FunctionName.of("tan")), | |||
SPAN(FunctionName.of("span")), | |||
|
|||
/** Collection functions */ | |||
ARRAY(FunctionName.of("array")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already add it.
switch (targetType) { | ||
case DOUBLE: | ||
List<Object> unboxed = | ||
IntStream.range(0, args.length - 1) | ||
.mapToObj(i -> ((Number) args[i]).doubleValue()) | ||
.collect(Collectors.toList()); | ||
|
||
return unboxed; | ||
case FLOAT: | ||
List<Object> unboxedFloat = | ||
IntStream.range(0, args.length - 1) | ||
.mapToObj(i -> ((Number) args[i]).floatValue()) | ||
.collect(Collectors.toList()); | ||
return unboxedFloat; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you explain why this special logic needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to internally convert it. Otherwise, the calcite will directly cast like DOUBLE to INTEGER, which will raise exception.
import org.apache.calcite.sql.type.SqlTypeName; | ||
import org.opensearch.sql.expression.function.ImplementorUDF; | ||
|
||
public class ArrayFunctionImpl extends ImplementorUDF { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we reuse SqlLibraryOperators.ARRAY
? Again, please add a reason in PR description for any new added function why it must implement by ourselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already update the implementation. Wrap the implementation of SqlLibraryOperators.ARRAY
import org.apache.calcite.sql.type.SqlReturnTypeInference; | ||
import org.opensearch.sql.expression.function.ImplementorUDF; | ||
|
||
public class ExistsFunctionImpl extends ImplementorUDF { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we reuse SqlLibraryOperators.ARRAY_CONTAINS
? please check all SqlLibraryOperators.ARRAY_*
first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed. All SqlLibraryOperators.ARRAY_* is for array related function which is not related to lambda. We use SqlLibraryOperators .array_length
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Description
This pr adds lambda function and array related functions. Calcite don't have array related functions so we need to implement by ourselves.
Now the logic for lambda is:
We will consider lambda function as a new PPL expression and parse it regularly to construct rexnode. To get return type for lambda expression, we need to firstly map the argument type in the calciteContext. For example, forall(array(1, 2, 3), x -> x > 0), then x -> INTEGER.
We also have an exception for reduce because the acc is the dynamic type.
The calcite/lin4j generate code according to the input type. For example, reduce(array(1.0,2.0 ,3.0), 0, (acc, x) -> acc + x). Ideally, we should map acc -> INTEGER, x -> DOUBLE. But if we map through this, the code of + would be
plus(INTERGER acc, DOUBLE x)
, then after first apply, the acc would be double, then it will throw exception. Thus, we apply ANY to the acc and infer the return type ingetReturnTypeInference
The function is aligned with https://github.com/opensearch-project/opensearch-spark/blob/main/docs/ppl-lang/functions/ppl-collection.md
TODO: nested object is not supported in lambda currently. It will be automatically supported when we support this. E.g. x -> x.a > 0
For detailed implementation and description:
SqlLibraryOperators.ARRAY
SqlLibraryOperators.ARRAY_LENGTH
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
#3575
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.