Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28321 Support select alias in the having clause for CBO #5294

Merged
merged 1 commit into from
Jul 30, 2024

Conversation

ramesh0201
Copy link
Contributor

What changes were proposed in this pull request?

Having clause CBO path of the queries should have visibility from the alias of the select expressions, as this is the case in the non-CBO path.

Why are the changes needed?

In order to make the CBO path and non-CBO work the same way in terms of syntax and semantics of having clause

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

Unit tests. Specifically mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=limit_pushdown_negative.q

RowResolver inputRR = relToHiveRR.get(srcRel);
for (ASTNode astNode : exprToAlias.keySet()) {
if (inputRR.getExpression(astNode) != null) {
inputRR.put("", exprToAlias.get(astNode), inputRR.getExpression(astNode));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that tableAlias is an empty string here, and even in non-cbo path it is an empty string, but not sure if there is a better way to deal with this. I see several instances in the code base where we use empty string for this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. I think the better way to handle is to pass the tableAlias field instead of the "" and then let the field fallback to the empty string if it is empty. Given that it is a much broader scope than this PR, I would prefer to address this issue separately. Let me know what you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fine with me :)

@@ -1,4 +1,5 @@
-- Test HAVING clause
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are supporting this through CBO, I think it'd be nice to include a CBO plan for one of the queries below.
Also, it would be good to add a more complex test with multiple column aliases in the having clause.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding more tests and EXPLAIN CBO plans is a good idea.

Copy link
Contributor

@zabetak zabetak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's add the extra tests and small suggested improvements and then we can get this in.

Comment on lines 5171 to 5177
Map<ASTNode, String> exprToAlias = qbPI.getAllExprToColumnAlias();
RowResolver inputRR = relToHiveRR.get(srcRel);
for (ASTNode astNode : exprToAlias.keySet()) {
if (inputRR.getExpression(astNode) != null) {
inputRR.put("", exprToAlias.get(astNode), inputRR.getExpression(astNode));
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is basically modifying the RowResolver adding some kind of reverse mappings. Moreover, this logic is duplicated in SemanticAnalyzer#genHavingPlan. I think it makes sense to refactor this snippet and create a new method in the RowResolver class with a proper name (and javadoc if necessary) that clarifies what is the code supposed to do. The only parameter to the method should be a Map<ASTNode, String>.

@@ -1,4 +1,5 @@
-- Test HAVING clause
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding more tests and EXPLAIN CBO plans is a good idea.

* @return
* @throws SemanticException
*/
public void putAggregateAlias(Map<ASTNode, String> exprToColumnAlias) throws SemanticException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the method is unnecessarily tight to aggregates and aliases while the implementation seems much more generic than that. Given that we just call put for every entry in the map maybe a better name would be putAll or maybe replaceAll since we only perform the action if an entry is already there.

Will this logic really add an entry in the resolver or it just changes the alias for ColumnInfo that is registered previously? Depending on the answer another potential name would be replaceAliases.

A well chosen name will better indicate what the method really does without requiring checking the implementation.


EXPLAIN CBO SELECT count(value) as c, max(key) as m from src GROUP BY key HAVING c > 3 and m > 0;
SELECT count(value) as c, max(key) as m from src GROUP BY key HAVING c > 3 and m > 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are talking about max(key) does the m > 0 condition filter anything? If not then maybe changing to m > 400 would be a safer choice.

Copy link

sonarcloud bot commented Jul 20, 2024

@ramesh0201
Copy link
Contributor Author

Thank you @soumyakanti3578 and @zabetak for the review

@ramesh0201 ramesh0201 merged commit 3af6fca into apache:master Jul 30, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants