feat(workflow-operator): add Python UDF UI parameter injection model#5141
feat(workflow-operator): add Python UDF UI parameter injection model#5141carloea2 wants to merge 7 commits into
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5141 +/- ##
============================================
- Coverage 48.95% 46.66% -2.30%
+ Complexity 2377 2376 -1
============================================
Files 1048 1048
Lines 40270 40098 -172
Branches 4272 4267 -5
============================================
- Hits 19714 18711 -1003
- Misses 19402 20256 +854
+ Partials 1154 1131 -23
*This pull request uses carry forward flags. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@mengw15 can you review this PR? Thanks |
|
|
||
| private val InjectedUiParametersHookMethodName = "_texera_injected_ui_parameters" | ||
| private val InjectedUiParametersHookMethodHeader = | ||
| s"def $InjectedUiParametersHookMethodName(self) -> Dict[str, Any]:" |
There was a problem hiding this comment.
Dict and Any aren't actually exported by from pytexera import * — its __all__ only re-exports Iterator, Optional, and Union from typing. So in the user's module these names are undefined, and since we run Python 3.10–3.13 (where return annotations are evaluated at definition time), the class raises NameError: name 'Dict' is not defined the moment it loads. The test that asserts the output shouldn't contain import typing quietly locks this assumption in, so it'll stay green while generating code that can't run. We could add Dict/Any to pytexera's __all__, inject a from typing import Dict, Any, or just drop the return annotation — whichever we pick, it'd be good to settle it before the runtime PR starts depending on it.
There was a problem hiding this comment.
This is covered in next PR actually but I did not include it here since this is for scala changes. Next PR will be about Python side.
| private def buildInjectedHookMethod(uiParameters: List[UiUDFParameter]): String = { | ||
| val injectedParametersMap = buildInjectedParametersMap(uiParameters) | ||
|
|
||
| (pyb"""|@overrides |
There was a problem hiding this comment.
There's no _texera_injected_ui_parameters in any base class yet, so @overrides will fail at class-definition time — it checks that the method really overrides a parent, and its default signature check calls get_type_hints, which also runs into the Dict/Any issue above. That's a perfectly reasonable thing to leave for the runtime PR; I'd just call it out in the description (or in a comment right here) so it's clear the generated code isn't runnable on its own yet.
| if (lineEnd < 0) text.length else lineEnd | ||
| } | ||
|
|
||
| private def detectClassBlockEnd(code: String, classHeaderStart: Int, classIndent: String): Int = { |
There was a problem hiding this comment.
This decides the class body ends at the first non-blank line indented at or below the class header. That's right for ordinary code, but it doesn't account for string literals, so a triple-quoted block inside a method whose contents start at column 0 looks like the end of the class — and we'd splice the hook into the middle of the string:
class ProcessTupleOperator(UDFOperatorV2):
def process_tuple(self, tuple_, port):
sql = """
SELECT * FROM t
"""
yield tuple_pybuilder already has PythonLexerUtils/BoundaryValidator for exactly this kind of lexical case, so it might be worth reusing them instead of scanning by indentation. If that's more than you want to take on here, a test that documents the limitation would at least make it a known boundary.
There was a problem hiding this comment.
I can take a look about this one, thanks
| private def injectHookIntoUserClass(encodedUserCode: String, hookMethod: String): String = { | ||
| val classHeaderMatch = | ||
| SupportedPythonUdfClassHeaderRegex.findFirstMatchIn(encodedUserCode).getOrElse { | ||
| return encodedUserCode |
There was a problem hiding this comment.
When the code doesn't contain one of the four supported class names, we hand it back unchanged even if there are parameters to inject. So if a user renames their class, their configured parameters just disappear at runtime with no signal. Since validate already raises for bad input, it might be friendlier to raise here too when uiParameters is non-empty but nothing matched — otherwise it's a confusing "my parameters did nothing" situation.
There was a problem hiding this comment.
Got it, I will update it
| val parameters = Option(uiParameters).getOrElse(List.empty) | ||
| validate(parameters) | ||
|
|
||
| val encodedUserCode = pyb"$code".encode |
There was a problem hiding this comment.
Minor: running the whole program through pyb"$code".encode also sends it through stripMargin (the builder's render calls it), so any user line that's whitespace followed by | would get rewritten — even on the no-parameter passthrough path. It's an unlikely shape in a real UDF, but since this is the first place arbitrary user source flows through pyb, it's worth being aware of.
There was a problem hiding this comment.
I'll take care of it.
|
Thanks @Xiao-zhen-Liu, I will ping you when it is ready for the next review. |
What changes were proposed in this PR?
This PR adds the Scala backend foundation for Python UDF UI parameters.
It introduces:
UiUDFParameter, containing backend-compatibleattributemetadata and an editablevalue.PythonUdfUiParameterInjector, which validates UI parameters and injects a reserved hook method into supported Python UDF classes.Attribute.getName()as encodable so UI parameter names are safely rendered through the Python template builder.This PR is stacked after the merged frontend foundation PR #5043. It does not yet wire the injector into operator execution; that wiring is handled by later PRs in the stack.
Existing Python UDF workflow execution remains unchanged in this PR because
PythonUDFOpDescV2,PythonUDFSourceOpDescV2, andDualInputPortsPythonUDFOpDescV2are not modified here.Any related issues, documentation, discussions?
Part of the Python UDF UI parameter feature split from
feat/ui-parameter.Related tracking issue / stack: #5044
Stack order:
How was this PR tested?
Commands run:
Results:
PythonUdfUiParameterInjectorSpec: 10 tests passed.scalafmtAll: no file changes.scalafmtCheckAllandscalafixAll --check: passed.Was this PR authored or co-authored using generative AI tooling?
No