[SPARK-52497][DOCS] Add docs for SQL user-defined functions #51281

Open
wants to merge 4 commits into
base: master

Conversation

allisonwang-db
Contributor

What changes were proposed in this pull request?

This PR adds docs for SQL UDFs.

Why are the changes needed?

Add documentation for a new Spark 4 feature.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manually verified the documentation build.

Was this patch authored or co-authored using generative AI tooling?

No

github-actions bot added the DOCS label Jun 25, 2025
@allisonwang-db
Contributor Author

cc @cloud-fan @srielau


When `TEMPORARY` is specified, the function is only available for the current session. Otherwise, it is persisted in the catalog and available across sessions. The `OR REPLACE` option allows updating an existing function definition, while `IF NOT EXISTS` prevents errors when creating a function that already exists.

The function parameters must be specified with their data types. The return type can be either a scalar data type or a table with an optional schema definition.
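As a rough illustration of the options described above, the statements below sketch how they might combine. The function names (`area`, `area_tmp`, `first_n`) and their bodies are hypothetical and chosen only for this example; treat this as a sketch of the documented behavior rather than the canonical syntax reference.

```sql
-- Hypothetical function names, for illustration only.

-- Persistent scalar UDF with an explicit return type.
-- OR REPLACE overwrites an existing definition of the same name.
CREATE OR REPLACE FUNCTION area(width DOUBLE, height DOUBLE)
RETURNS DOUBLE
RETURN width * height;

-- Session-scoped scalar UDF. RETURNS is omitted here, so the
-- return type is left to be inferred from the function body.
CREATE TEMPORARY FUNCTION area_tmp(width DOUBLE, height DOUBLE)
RETURN width * height;

-- Table function with an explicit output schema.
-- IF NOT EXISTS makes the statement a no-op if first_n already exists.
CREATE FUNCTION IF NOT EXISTS first_n(n INT)
RETURNS TABLE (id INT)
RETURN SELECT x AS id FROM VALUES (1), (2), (3), (4) AS t(x) WHERE x <= n;

-- Scalar functions are called in expressions, table functions in FROM.
SELECT area(2.0, 3.0);
SELECT * FROM first_n(3);
```

Note that `OR REPLACE` and `IF NOT EXISTS` are shown in separate statements, since one asks to overwrite an existing function and the other asks to leave it alone.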
Contributor

oh, the return table schema definition is optional?

Contributor

actually, the entire RETURN clause is optional, right?

Contributor

The `RETURNS` clause is optional; `RETURN` is not.

Contributor Author

Yes, the `RETURNS` clause is optional for scalar UDFs, and the `RETURNS TABLE` schema is optional for TVFs.

Contributor

So should it be: "The return type can be either a scalar data type or a table with a schema definition. If not specified, the return type is inferred from the function body."?


### Syntax

```sql
```
Contributor

I know that in Spark we use `sql` code fences, but it makes no sense to do that for the syntax. I wish there were a BNF...

@xinrong-meng
Member

LGTM! CI reports `python3.9: not found`. Shall we rebase onto the master branch?


4 participants