
Common table expression (CTE) optimizations in CBO #5154

Closed
wants to merge 100 commits into from

Conversation

zabetak
Contributor

@zabetak zabetak commented Mar 22, 2024

No description provided.

Stamatis Zampetakis added 13 commits April 5, 2024 14:10
…ITH clauses

Change-Id: I442c179183ce66f2e9ecc5ee7a5891863beb584f
Change-Id: Id1ee6ceabb090b2622c3227dab693593f58dc1b3
Do not set the materialize threshold programmatically. If the initial query has CTEs and the value is set to true then it's fine to materialize those.

Change-Id: I54c7935ae408bdd87cf6eae5ae3f96951f9799c4
…MPS/DEPT schema

Change-Id: I1625953fa8bc73c3e89d9e73ebfab388021e3973
Change-Id: I2b659ba518033553c5230aa47c50245f5ae0db56
Drop some boilerplate code which doesn't do much at the moment anyway.

Change-Id: Ic8b614bd97375b870772f7f88d9bb0890f557ac1
There is a bug somewhere and TableSpool is not introduced correctly. Need to revisit the patch.

Change-Id: I705d71bb5e6dc63f5e1440d49f9bac7fac5f68d8
… single TableScan

Change-Id: I28a7756eb6841656880084d209f10d362d812044
Change-Id: Idf099037e049ca9c145471f4f6ce71f543bab357
Haven't hit a problem yet but it's good to have it in place.

Change-Id: I9d36e0355e788f01c207bd71f0457b3a4dbfd7af
TODO: Need to pass the RelOptMaterialization to the HepPlanner.
Change-Id: I76175a38045fd4816d3311aa9ba93a32627e520e
Change-Id: I49da052b09e03dffeaecc64070181d2c042df044
…c & Cost-based planner

Apart from improving readability, the refactoring fixes the problem in TablescanToSpoolRule, which couldn't see the materializations.

Changes were needed in trait handling to ensure that HiveRelCopier works as expected.

Now the ed_cte_0_debug.q test passes, returning the same result as before.

Change-Id: I86a3755a60b0bd8c582f969cd5574756afd590b6
…gHook and keep notes about changes

mapjoin_hint.q.out:
constant_prop_3.q.out:
notInTest.q.out:
1. Duplication is not obviously present in the initial SQL
2. Physical plan is not better than SWO's, but could possibly replace the latter at the CBO level.

correlationoptimizer3.q.out:
1. Materialization and reuse of join result (not exploited by SWO)

masking_2.q.out:
masking_12.q.out:
* Materializes scan + filter
* The new plan looks reasonable, but it is not clear why SWO was not kicking in on the initial plan.

masking_10.q.out:
* Materializes scan + filter
* New plan has a cartesian product which is strange

dynamic_partition_pruning.q.out:
1. Materializes scan + aggregate, but this defeats dynamic partition pruning (not really sure if DPP is helpful in this case).

dynamic_semijoin_reduction_2.q.out:
1. Materializes scan+aggregate and leads to a different SJ than the original one

explainuser_2.q.out:
explainanalyze_2.q.out:
* Materializes join between two tables

filter_aggr.q.out:
1. OPTIMIZED SQL is not shown, probably because we don't handle the spool operator
2. Seems to cancel some optimization with UNION ALL and identical parts (NOT GOOD)

groupby_sort_1_23.q.out:
groupby_sort_skew_1_23.q.out
1. Materializes scan + aggregate
2. OPTIMIZED SQL does not show

intersect_all.q.out:
intersect_distinct.q.out:
* Materializes join over two tables
* Check if there are optimizations blocked due to INTERSECT with identical parts.

offset_limit_ppd_optimizer.q.out:
limit_pushdown.q.out:
* Materializes scan + aggregate but interferes with limit pushdown optimization as it is right now.

mrr.q.out:
* Materializes scan + aggregate (CTE referenced 3 times)

sharedwork.q.out:
* Materializes scan + very simple filter (IS NOT NULL)
* In such simple cases the materialization is probably useless. Do we gain anything from this?
* Moreover, the materialized filter seems to remain in the final plan.

sharedworkext.q.out:
vectorized_multi_output_select.q.out:
* Materializes a join (the case here is very similar to the main e2e test motivating this work)
* The multiple reducers in the SWO plan are probably due to the parallel-edges problem; the temporary table materialization does not need the workaround, although if we were going directly to the Operator tree we would need to do something similar.

skewjoin_mapjoin7.q.out:
* Materializes join (CTE referenced twice by UNION ALL)
* We have seen the same pattern multiple times

smb_mapjoin_14.q.out:
* Materializes scan + filter
* Seen this before

subquery_ALL.q.out:
subquery_ANY.q.out:
* Duplication is not obviously present in the initial SQL
* Materializes scan + aggregate (Expected and seen)

subquery_multi.q.out:
* The CBO plan shows materialization of a semijoin but the physical plan does not have this; definitely needs further investigation.

subquery*:
* Most materializations are of the form scan + aggregate, usually with an IS NOT NULL filter

union_remove*:
* As noted earlier, there is an interference between the CTE materialization logic and the UNION remove logic, which operates at the physical level.
* If we go from RelNode to Operator then maybe this becomes less of a problem, but as it is the plans seem less efficient.

clientnegative:
Nothing worrisome; the failing vertex changes since CTE materialization adds additional operators to the plan.

General notes:
1. Most queries with union or intersection have identical branches; in this case the CTE detection logic kicks in and generates scan + (optional) filter + aggregate.
2. Lineage shows the temporary table (e.g., union28.q.out).
3. I have to add some tests with subqueries since they exhibit implicit sharing.
4. In general there seem to be some optimizations missing in terms of UNION/INTERSECT ALL with identical branches.
5. Since we are introducing a tmp table it is very likely that we are changing the data format. If the initial table is ORC we may materialize to TEXT and various other combinations which may not be performant.
The latter may not be that relevant because all operators writing to files use a specific format:
File Output Operator
                  compressed: false
                  Statistics: Num rows: 493 Data size: 42891 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
org.apache.hadoop.hive.ql.parse.SemanticException: View definition references temporary table default@cte_suggestion_0
        at org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.validateCreateView(CreateViewAnalyzer.java:211)
        at org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.analyzeInternal(CreateViewAnalyzer.java:99)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
        at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
        at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)

Reproducible using subquery_views.q and view_cast.q
…rRule

java.lang.AssertionError:
Type mismatch:
rel rowtype:
RecordType(NULL int_col) NOT NULL
equivRel rowtype:
RecordType(BOOLEAN NOT NULL boolean_col, BOOLEAN NOT NULL literalTrue) NOT NULL
	at org.apache.calcite.util.Litmus$1.fail(Litmus.java:31)
	at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:2193)
	at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:580)
	at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604)
	at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:148)
	at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:268)
	at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:283)
	at org.apache.calcite.rel.rules.materialize.MaterializedViewRule.perform(MaterializedViewRule.java:454)
	at org.apache.calcite.rel.rules.materialize.MaterializedViewProjectFilterRule.onMatch(MaterializedViewProjectFilterRule.java:50)
	at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:229)
	at org.apache.calcite.plan.volcano.IterativeRuleDriver.drive(IterativeRuleDriver.java:58)
	at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:510)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.rewriteUsingViews(CalcitePlanner.java:2113)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyCteRewriting(CalcitePlanner.java:2147)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1708)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1579)
	at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131)
	at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
	at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180)
	at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1331)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:580)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:473)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)

	The AssertionError can be reproduced by running subquery_null_agg.q

	The problem happens due to two things:
	* the rule matches a plan with a filter condition that is simplified to false during the rewriting
	* there is a view (CTE suggestion) that is basically a trivial project on top of the table

	CTE suggestions consisting of just project + scan do not make much sense, so we can drop them by tuning the CommonRelSubExprRegisterRule and work around the problem for now.

	Depending on bandwidth we may want to attack the bug in the MaterializedViewProjectFilterRule and make the latter more robust; that would be the actual fix.
…E suggestion contains untyped NULLs

org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field:  int_col
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8391) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8350) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7901) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11645) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11508) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12444) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12310) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:645) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:473) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.materializeCTE(CalcitePlanner.java:1069) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2389) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2337) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2500) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2337) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2500) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2337) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2500) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2322) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:642) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:473) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) ~[hive-cli-4.1.0-SNAPSHOT.jar:?]
        at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) ~[hive-cli-4.1.0-SNAPSHOT.jar:?]
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) ~[hive-cli-4.1.0-SNAPSHOT.jar:?]
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425) ~[hive-cli-4.1.0-SNAPSHOT.jar:?]

The problem can be reproduced using subquery_null_agg.q when CTE suggestions are used but can also be seen for any CTAS query with untyped NULLs.
```
create table testctas1 (id int);
create table testctas3 as select 1, 2, NULL, 4 as ncol from testctas1;
```
Since this is a limitation of CTAS we have to filter out CTE suggestions that contain untyped NULLs in the result type.
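As an illustration (hypothetical workaround, not part of this patch): the repro above succeeds once the untyped NULL is given an explicit type with CAST, which is what a filtered-out CTE suggestion would effectively need:

```
-- Hypothetical sketch: an explicit CAST lets CTAS derive a column type
-- instead of VOID, avoiding the SemanticException.
create table testctas1 (id int);
create table testctas3 as
  select 1, 2, cast(NULL as int) as ncol, 4 from testctas1;
```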
This is probably caused by the rebase on master and changes affecting the parser.
…H clause) have the same alias

The problem can be reproduced using join0.q and the full stack trace is shown below.

 org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Ambiguous table alias 'cte_suggestion_0'
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processTable(SemanticAnalyzer.java:1167)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processJoin(SemanticAnalyzer.java:1679)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1899)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:2113)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1754)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:636)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:474)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
The problem can be reproduced by running join0.q

java.lang.UnsupportedOperationException: type not serializable: LAZY (type org.apache.calcite.rel.core.Spool.Type)
	at org.apache.calcite.rel.externalize.RelJson.toJson(RelJson.java:319)
	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJson.toJson(HiveRelJson.java:46)
	at org.apache.calcite.rel.externalize.RelJsonWriter.put(RelJsonWriter.java:83)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:66)
	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
	at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
	at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explainInputs(RelJsonWriter.java:91)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:69)
	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
	at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
	at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explainInputs(RelJsonWriter.java:91)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:69)
	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
	at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
	at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explainInputs(RelJsonWriter.java:91)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:69)
	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
	at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
	at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explainInputs(RelJsonWriter.java:91)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:69)
	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
	at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
	at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelOptUtil.toJsonString(HiveRelOptUtil.java:1073)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:669)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:474)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
Below are failures to investigate.
…xpected output from union rewriting program

The cte_cbo_iobe_mv_union_rewrite file contains a repro of the problem:

 java.lang.IndexOutOfBoundsException: Index: 0
	at java.util.Collections$EmptyList.get(Collections.java:4456)
	at org.apache.calcite.rel.AbstractRelNode.getInput(AbstractRelNode.java:143)
	at org.apache.calcite.rel.rules.materialize.MaterializedViewAggregateRule.rewriteQuery(MaterializedViewAggregateRule.java:250)
	at org.apache.calcite.rel.rules.materialize.MaterializedViewRule.perform(MaterializedViewRule.java:374)
	at org.apache.calcite.rel.rules.materialize.MaterializedViewOnlyAggregateRule.onMatch(MaterializedViewOnlyAggregateRule.java:68)
	at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:229)
	at org.apache.calcite.plan.volcano.IterativeRuleDriver.drive(IterativeRuleDriver.java:58)
	at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:510)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.rewriteUsingViews(CalcitePlanner.java:2114)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyCteRewriting(CalcitePlanner.java:2152)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1750)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1580)
	at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131)
	at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
	at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180)
	at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1332)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:581)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:474)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)

If the input to the MV rule is a combination of Aggregate + Scan and there is a registered MV that qualifies for union rewriting then an IOBE is thrown; the result of the union rewriting program is a Scan operator that does not have any inputs.

The IOBE is triggered only during the CTE rewrite phase in cases where the HiveAggregateProjectMergeRule has fired before. In normal MV rewrite this cannot happen since HiveAggregateProjectMergeRule is applied after the MV rewrite.
…f-joins of CTE/MV/Table

When CTEs/MVs are in use and the plan contains self-joins over the same table/CTE/MV, the resulting AST does not have the expected shape, so we end up with ambiguity when creating the AST from the RelNode.

The auto_smb_mapjoin_14.q and other tests are failing with errors similar to the one below:

 org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Ambiguous table alias 'cte_suggestion_0'
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processTable(SemanticAnalyzer.java:1167)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processJoin(SemanticAnalyzer.java:1679)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1899)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:2113)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1754)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:636)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:474)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
 at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
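For illustration, the generated AST fails the same way plain HiveQL does when the same relation appears twice in a join without distinct aliases (hypothetical query, not from the test suite; assumes the usual src(key, value) schema):

```
-- Hypothetical sketch: referencing the same relation twice without
-- distinct aliases raises "Ambiguous table alias".
with cte_suggestion_0 as (select key, value from src)
select *
from cte_suggestion_0
join cte_suggestion_0 on (key = key);

-- The expected AST shape assigns distinct aliases to each reference:
-- ... from cte_suggestion_0 a join cte_suggestion_0 b on (a.key = b.key);
```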

sonarcloud bot commented Apr 18, 2024

Quality Gate passed

Issues
46 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud


This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the [email protected] list if the patch is in need of reviews.

@github-actions github-actions bot added the stale label Jun 18, 2024
@github-actions github-actions bot closed this Jun 26, 2024