-
Notifications
You must be signed in to change notification settings - Fork 967
[AUTHZ] count(*)/count(1) should check sub nodes' privileges #7204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
related issue: #7173, cc @bowenliang123 |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #7204 +/- ##
======================================
Coverage 0.00% 0.00%
======================================
Files 695 698 +3
Lines 43505 43495 -10
Branches 5888 5886 -2
======================================
+ Misses 43505 43495 -10 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
|
||
| import org.apache.kyuubi.plugin.spark.authz.util.WithInternalChild | ||
|
|
||
| case class ChildOutputHolder(child: LogicalPlan, fixedOutput: Seq[Attribute]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add comments to explain why we need this.
fixedOutput is not a good name, it fixed what? maybe just call it childOutput
| if (columnPrune(p.references.toSeq ++ p.output, p.inputSet).isEmpty) { | ||
| // If plan is project and output don't have relation to input, can ignore. | ||
| if (!p.isInstanceOf[Project]) { | ||
| // If plan tree exists ChildOutputHolder, we should build child logic plan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the variable name existsChildOutputHolder itself explains what it does, a good comment explains WHY, not WHAT.
|
this approach lgtm, I suggest adding some comments. cc @wForget will this affect lineage? also cc @zhouyifan279 since you have worked closely on this part |
It seems not, |
...-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/rule/plan/ChildOutputHolder.scala
Show resolved
Hide resolved
| // some nodes in the tree is fixed by RuleChildOutputMarker in some special | ||
| // scenarios, such as the Aggregate(count(*)) child node. To avoid missing child node | ||
| // permissions, we need to continue checking down. | ||
| if (!p.isInstanceOf[Project] || existsChildOutputHolder) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this affect children that are not ChildOutputHolder ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this affect children that are not
ChildOutputHolder?
This check is to allow PrivilegeBuilder.buildQuery method continue drill down to the child nodes of the ChildOutputHolder node. Without this check, the PrivilegeBuilder.buildQuery method will terminate before hitting ChildOutputHolder. Since the outputs held by ChildOutputHolder are all useful, combined with columnPrune method, I think other child nodes will not be affected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is in for (child <- p.children) {, which will be applied to all children of p. Do we need it for children of p that are not ChildOutputHolder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is in
for (child <- p.children) {, which will be applied to all children ofp. Do we need it for children ofpthat are notChildOutputHolder?
Normally, recursive checking of deeper nodes is necessary, but the presence of an Aggregate(count(*)) node is an exception, as it blocks recursion. In current case, ChildOutputHolder is only added when encountered Aggregate(count(*)) node. When a ChildOutputHolder node exists in the plan tree, it indicates that nodes deeper than ChildOutputHolder holds useful output information, we need to continue checking deeper nodes. At the same time, when recursing to the node deeper than ChildOutputHolder, we regress to normal judgment logic. Therefore, I think the judgment here is reasonable.
Spark Optimizer's ColumnPruning will replace
count(*)/count(1)Aggregateplan's child to aProjectnode with empty projection list:but AuthZ plugin's
PrivilegesBuilder.buildQuerymethod will ignore to check child node when plan'sinputSetis empty, in this scenario,Aggregatenode's child plan's privileges are ignored, which causecount(*)/ count(1)will ignored all privileges that should be checked.Why are the changes needed?
this patch add a holder node
ChildOutputHolderwhenAggregatenode'sreferencesand it's child node'soutputSethava no intersection, it will hold the child node'soutputSetused for build privilege objects, andChildOutputHoldernode will be eliminated afterRuleAuthorizationrule work completed.How was this patch tested?
updated old unit tests and added new unit tests.
Was this patch authored or co-authored using generative AI tooling?
No.