
[FLINK-37780][5/N] predict sql function type inference and validation #26583

Merged
merged 5 commits into from
May 28, 2025

lihaosky
Contributor

@lihaosky lihaosky commented May 22, 2025

What is the purpose of the change

Validate predict function arguments

Brief change log

  • Refactor a few methods from window TVF to utils
  • Check that the first arg's output type is a row
  • Check that the second arg is a SqlModelCall
  • Check that the third arg is a descriptor and that its columns appear in the first arg's output type
  • Check that the descriptor columns' types can be implicitly cast to the model's input types
  • Check that the optional config arg is a map of string literals
  • Return type inference
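The descriptor, cast and config checks in the change log can be pictured with a small, self-contained sketch. It deliberately avoids Flink and Calcite: the toy `SimpleType` enum, the widening rule in `canImplicitlyCast`, the positional pairing of descriptor columns with model inputs (including the count check), and using `String` values to stand in for string literals in the config map are all simplifying assumptions for illustration, not the planner's actual rules or APIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of the ML_PREDICT operand checks; not Flink's API. */
public class PredictOperandCheckSketch {

    /** Toy stand-in for Flink's logical types (assumption). */
    enum SimpleType { INT, BIGINT, DOUBLE, STRING }

    // Assumed numeric widening order used by canImplicitlyCast.
    private static final List<SimpleType> NUMERIC =
            List.of(SimpleType.INT, SimpleType.BIGINT, SimpleType.DOUBLE);

    /** Assumed rule: identical types, or numeric widening INT -> BIGINT -> DOUBLE. */
    static boolean canImplicitlyCast(SimpleType from, SimpleType to) {
        if (from == to) {
            return true;
        }
        return NUMERIC.contains(from) && NUMERIC.contains(to)
                && NUMERIC.indexOf(from) < NUMERIC.indexOf(to);
    }

    /** Returns null if the descriptor columns are valid, otherwise an error message. */
    static String validateDescriptor(
            Map<String, SimpleType> inputRow,   // first argument's row type: name -> type
            List<String> descriptorColumns,     // columns listed in DESCRIPTOR(...)
            List<SimpleType> modelInputTypes) { // model input types, in order
        // Assumption: descriptor columns map positionally onto the model's inputs.
        if (descriptorColumns.size() != modelInputTypes.size()) {
            return "Expected " + modelInputTypes.size() + " descriptor columns but got "
                    + descriptorColumns.size();
        }
        for (int i = 0; i < descriptorColumns.size(); i++) {
            String column = descriptorColumns.get(i);
            SimpleType actual = inputRow.get(column);
            if (actual == null) {
                return "Column '" + column + "' not found in the input table";
            }
            if (!canImplicitlyCast(actual, modelInputTypes.get(i))) {
                return "Column '" + column + "' of type " + actual
                        + " cannot be implicitly cast to " + modelInputTypes.get(i);
            }
        }
        return null;
    }

    /** Config check: every key and value must be a String (standing in for string literals). */
    static boolean isStringMap(Map<?, ?> config) {
        for (Map.Entry<?, ?> entry : config.entrySet()) {
            if (!(entry.getKey() instanceof String) || !(entry.getValue() instanceof String)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, SimpleType> input = new LinkedHashMap<>();
        input.put("id", SimpleType.BIGINT);
        input.put("feature", SimpleType.INT);
        // INT widens to DOUBLE, so this prints "null" (no error).
        System.out.println(validateDescriptor(input, List.of("feature"), List.of(SimpleType.DOUBLE)));
        // Prints an error message because the column does not exist.
        System.out.println(validateDescriptor(input, List.of("missing"), List.of(SimpleType.DOUBLE)));
        System.out.println(isStringMap(Map.of("task", "classification")));
    }
}
```

Returning an error string rather than throwing keeps the sketch dependency-free; a real operand checker would report the failure through the validator instead.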

Verifying this change

Unit test

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (JavaDocs)

@flinkbot
Collaborator

flinkbot commented May 22, 2025

CI report:

Bot commands: The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@lihaosky lihaosky changed the title [FLINK-37780] predict sql function type inference and validation [FLINK-37780][5/N] predict sql function type inference and validation May 22, 2025
@airlock-confluentinc airlock-confluentinc bot force-pushed the model-function-validation branch 4 times, most recently from a4d988b to c5b378f May 23, 2025 18:08
Member

@fsk119 fsk119 left a comment

Thanks for your contribution. I left some comments

// TODO: FLINK-37780 Check operand types after integrated with SqlExplicitModelCall in
// validator
return false;
if (!SqlValidatorUtils.checkTableAndDescriptorOperands(callBinding, 2, 1)) {
Member

nit: Do we need to validate this again? SqlMLTableFunction#validateCall has already validated it.

Contributor Author

validateCall doesn't check the position and count of the descriptor, nor does it check that the first param is a table. validateCall is shared by both ml_predict and ml_evaluate.

Contributor Author

OK. I dropped the descriptor validation in validateCall, since it makes the column names complex and fails at a later stage in the rel converter: https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L2250-L2254
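Because validateCall is shared by ml_predict and ml_evaluate, the predict-specific operand checker has to enforce the argument layout itself. A minimal sketch of such a position-and-count check follows; it assumes a hypothetical `OperandKind` enum in place of real `SqlNode` inspection and takes the `(TABLE, MODEL, DESCRIPTOR[, MAP])` layout from the allowed-signature string in the diff. It is not the actual `SqlValidatorUtils.checkTableAndDescriptorOperands` logic.

```java
import java.util.List;

/** Hypothetical sketch of a position/count check for ML_PREDICT operands; not Flink's API. */
public class PredictOperandLayoutSketch {

    /** Simplified stand-in for the kinds of SQL operands the validator sees (assumption). */
    enum OperandKind { TABLE, MODEL, DESCRIPTOR, MAP, OTHER }

    /** Assumed layout: TABLE, MODEL, DESCRIPTOR, then an optional MAP config. Null means valid. */
    static String checkLayout(List<OperandKind> operands) {
        if (operands.size() < 3 || operands.size() > 4) {
            return "Expected 3 or 4 operands but got " + operands.size();
        }
        // Count descriptors explicitly so a duplicated DESCRIPTOR gets a dedicated message.
        long descriptors = operands.stream().filter(k -> k == OperandKind.DESCRIPTOR).count();
        if (descriptors != 1) {
            return "Expected exactly one DESCRIPTOR operand but got " + descriptors;
        }
        if (operands.get(0) != OperandKind.TABLE) {
            return "First operand must be a TABLE";
        }
        if (operands.get(1) != OperandKind.MODEL) {
            return "Second operand must be a MODEL";
        }
        if (operands.get(2) != OperandKind.DESCRIPTOR) {
            return "Third operand must be a DESCRIPTOR";
        }
        if (operands.size() == 4 && operands.get(3) != OperandKind.MAP) {
            return "Optional fourth operand must be a MAP";
        }
        return null;
    }

    public static void main(String[] args) {
        // Valid layout: prints "null".
        System.out.println(checkLayout(
                List.of(OperandKind.TABLE, OperandKind.MODEL, OperandKind.DESCRIPTOR)));
        // Wrong positions: prints "First operand must be a TABLE".
        System.out.println(checkLayout(
                List.of(OperandKind.DESCRIPTOR, OperandKind.MODEL, OperandKind.TABLE)));
    }
}
```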

@@ -112,5 +140,92 @@ public String getAllowedSignatures(SqlOperator op, String opName) {
return opName
+ "(TABLE table_name, MODEL model_name, DESCRIPTOR(input_columns), [MAP[]]";
Member

Are you missing a ) here?
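For reference, a sketch of the signature string from getAllowedSignatures with the closing parenthesis added, together with a hypothetical `parensBalanced` helper that shows why the original string is unbalanced. `ML_PREDICT` is used only as an example operator name.

```java
/** Sketch of the allowed-signature string with the closing parenthesis restored. */
public class AllowedSignatureSketch {

    // Same text as in the diff, with the final ")" that the review points out is missing.
    static String allowedSignature(String opName) {
        return opName
                + "(TABLE table_name, MODEL model_name, DESCRIPTOR(input_columns), [MAP[]])";
    }

    // Hypothetical helper: true if every "(" has a matching ")" in order.
    static boolean parensBalanced(String s) {
        int depth = 0;
        for (char c : s.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false;
                }
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        String original = "ML_PREDICT"
                + "(TABLE table_name, MODEL model_name, DESCRIPTOR(input_columns), [MAP[]]";
        System.out.println(parensBalanced(original)); // false: one "(" is never closed
        System.out.println(parensBalanced(allowedSignature("ML_PREDICT"))); // true
    }
}
```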

return typeFactory
.builder()
.kind(inputRowType.getStructKind())
.addAll(inputRowType.getFieldList())
Member

Take a look at SystemOutputStrategy#inferType.

Contributor Author

Do you mean we need to make the field names unique? I'm following SqlWindowTableFunction, which doesn't check whether the input table already has columns such as window_start. I'm on the fence here.

Member

Yes. Otherwise, you will get an error here.
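One way to address the uniqueness concern is to append the model's output columns after the input columns (as SqlWindowTableFunction does with its window columns) and rename any output column whose name is already taken. The sketch below is a plain-Java illustration under stated assumptions: `Field` is a hypothetical stand-in for Calcite's `RelDataTypeField`, names are compared case-sensitively, and the numeric-suffix renaming is an assumed scheme, not necessarily what `SystemOutputStrategy#inferType` or this PR does. Rejecting the call with a validation error instead of renaming is an equally valid design.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of ML_PREDICT return type inference with unique field names. */
public class PredictOutputTypeSketch {

    /** Minimal stand-in for a row-type field (assumption). */
    static final class Field {
        final String name;
        final String type;

        Field(String name, String type) {
            this.name = name;
            this.type = type;
        }

        @Override
        public String toString() {
            return name + " " + type;
        }
    }

    /** Input fields first, then model output fields; clashing names get a 0, 1, ... suffix. */
    static List<Field> inferOutputType(List<Field> inputFields, List<Field> modelOutputFields) {
        List<Field> result = new ArrayList<>(inputFields);
        Set<String> used = new HashSet<>();
        for (Field f : inputFields) {
            used.add(f.name);
        }
        for (Field f : modelOutputFields) {
            String name = f.name;
            int suffix = 0;
            // Try name0, name1, ... until an unused name is found (case-sensitive).
            while (used.contains(name)) {
                name = f.name + suffix++;
            }
            used.add(name);
            result.add(new Field(name, f.type));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Field> input = List.of(new Field("id", "BIGINT"), new Field("prediction", "STRING"));
        List<Field> output = List.of(new Field("prediction", "DOUBLE"));
        // Prints: [id BIGINT, prediction STRING, prediction0 DOUBLE]
        System.out.println(inferOutputType(input, output));
    }
}
```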

Contributor Author

@airlock-confluentinc airlock-confluentinc bot force-pushed the model-function-validation branch from 1dcf660 to 6031544 May 27, 2025 20:57
Member

@fsk119 fsk119 left a comment

LGTM

@fsk119 fsk119 merged commit 2cccfbf into apache:master May 28, 2025