HIVE-28488: Merge multiple adjacent union distinct into single adjacent union distinct #5423

Open
wants to merge 7 commits into
base: master

@ngsg ngsg commented Aug 30, 2024

What changes were proposed in this pull request?

This PR proposes a new optimization that reduces shuffling when computing the UNION DISTINCT of multiple tables. The optimization merges the GroupBy operators that compute distinct after each Union, thus reducing the number of edges required by UNION DISTINCT.
A new configuration key, hive.optimize.merge.adjacent.union.distinct, is introduced to control this optimization.

Please check out the attached slides in the JIRA page (HIVE-28488) for further explanations.
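
To illustrate why merging is safe, here is a minimal sketch in plain Java (not Hive code). It assumes each GroupBy-based distinct step can be modeled as set deduplication. The class and method names (`UnionDistinctSketch`, `nested`, `merged`) are hypothetical and do not appear in the patch. The sketch only shows that deduplicating after every union gives the same rows as a single deduplication over all inputs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;

public class UnionDistinctSketch {
    // Stand-in for a GroupBy operator that removes duplicate rows (assumption).
    static List<String> distinct(List<String> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    // Stand-in for a Union operator: concatenates all inputs without deduplication.
    static List<String> unionAll(List<List<String>> inputs) {
        List<String> out = new ArrayList<>();
        for (List<String> in : inputs) {
            out.addAll(in);
        }
        return out;
    }

    // Unmerged shape: ((t0 UNION DISTINCT t1) UNION DISTINCT t2) ...
    // One distinct step runs after every union.
    static List<String> nested(List<List<String>> tables) {
        if (tables.isEmpty()) {
            return new ArrayList<>();
        }
        List<String> acc = distinct(tables.get(0));
        for (int i = 1; i < tables.size(); i++) {
            acc = distinct(unionAll(List.of(acc, tables.get(i))));
        }
        return acc;
    }

    // Merged shape: a single distinct step over the union of all inputs.
    static List<String> merged(List<List<String>> tables) {
        return distinct(unionAll(tables));
    }

    public static void main(String[] args) {
        List<List<String>> tables = List.of(
            List.of("a", "b"),
            List.of("b", "c"),
            List.of("c", "a", "d"));
        // Compare as sets, since UNION DISTINCT does not guarantee row order.
        boolean same = new HashSet<>(nested(tables)).equals(new HashSet<>(merged(tables)));
        System.out.println(same);
    }
}
```

In the nested shape, the number of distinct steps grows with the number of unioned tables, whereas the merged shape needs only one. This is the intuition behind the reduction in edges described above.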

Why are the changes needed?

To improve the execution time of queries containing many UNION DISTINCT operations.

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

The proposed optimization was tested by running TPC-DS queries 49 and 75. This PR also contains a qfile test to confirm that the patch optimizes the query plan.

@ngsg ngsg changed the title from "[WIP] HIVE-28488: Merge multiple adjacent union distinct into single adjacent union distinct" to "HIVE-28488: Merge multiple adjacent union distinct into single adjacent union distinct" Sep 3, 2024
@ngsg ngsg marked this pull request as ready for review September 3, 2024 01:20
Object... nodeOutputs) throws SemanticException {
UnionMergeContext context = (UnionMergeContext) procCtx;
Collection<Operator> allOps = context.pCtx.getAllOps();
for (int i = 1; i <= 8; i ++) {
Member

Could we please add some comments around these magic indices?

Contributor Author

Added some comments to explain the meaning of the indices and the logic around them.

sonarcloud bot commented Sep 27, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants