Common table expression (CTE) optimizations in CBO #5154
…ITH clauses Change-Id: I442c179183ce66f2e9ecc5ee7a5891863beb584f
Change-Id: Id1ee6ceabb090b2622c3227dab693593f58dc1b3
Do not set the materialize threshold programmatically. If the initial query has CTEs and the value is set to true then it's fine to materialize them. Change-Id: I54c7935ae408bdd87cf6eae5ae3f96951f9799c4
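As a rough illustration of the threshold behavior discussed above (a minimal hypothetical query, not taken from this PR; `src` stands in for any table): with `hive.optimize.cte.materialize.threshold` set to 2, a CTE referenced at least twice is materialized into a temporary table instead of being inlined at each call site.

```sql
-- Hypothetical sketch: with the threshold at 2, cte q below is
-- referenced twice, so Hive materializes it once rather than inlining it.
SET hive.optimize.cte.materialize.threshold=2;

WITH q AS (SELECT key, count(*) AS cnt FROM src GROUP BY key)
SELECT a.key, a.cnt, b.cnt
FROM q a JOIN q b ON a.key = b.key;
```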
…MPS/DEPT schema Change-Id: I1625953fa8bc73c3e89d9e73ebfab388021e3973
Change-Id: I2b659ba518033553c5230aa47c50245f5ae0db56
Drop some boilerplate code which doesn't do much at the moment anyway. Change-Id: Ic8b614bd97375b870772f7f88d9bb0890f557ac1
There is a bug somewhere and TableSpool is not introduced correctly. Need to revisit the patch. Change-Id: I705d71bb5e6dc63f5e1440d49f9bac7fac5f68d8
… single TableScan Change-Id: I28a7756eb6841656880084d209f10d362d812044
Change-Id: Idf099037e049ca9c145471f4f6ce71f543bab357
Haven't hit a problem yet, but it's good to have it in place. Change-Id: I9d36e0355e788f01c207bd71f0457b3a4dbfd7af
TODO: Need to pass the RelOptMaterialization to the HepPlanner. Change-Id: I76175a38045fd4816d3311aa9ba93a32627e520e
Change-Id: I49da052b09e03dffeaecc64070181d2c042df044
…c & Cost-based planner Apart from improving readability, the refactoring fixes the problem in TablescanToSpoolRule that couldn't see the materializations. Changes were needed in trait handling to ensure that HiveRelCopier works as expected. Now the ed_cte_0_debug.q test passes, returning the same result as before. Change-Id: I86a3755a60b0bd8c582f969cd5574756afd590b6
…gHook and keep notes about changes

mapjoin_hint.q.out, constant_prop_3.q.out, notInTest.q.out:
1. Duplication is not obviously present in the initial SQL.
2. The physical plan is not better than SWO's, but could possibly replace the latter at the CBO level.

correlationoptimizer3.q.out:
* Materialization and reuse of a join result (not exploited by SWO).

masking_2.q.out, masking_12.q.out:
* Materializes scan + filter.
* The new plan looks reasonable, but it is unclear why SWO was not kicking in on the initial plan.

masking_10.q.out:
* Materializes scan + filter.
* The new plan has a Cartesian product, which is strange.

dynamic_partition_pruning.q.out:
* Materializes scan + aggregate, but this nukes out DPP pruning (not really sure if that is helpful in this case).

dynamic_semijoin_reduction_2.q.out:
* Materializes scan + aggregate and leads to a different semijoin than the original one.

explainuser_2.q.out, explainanalyze_2.q.out:
* Materializes a join between two tables.

filter_aggr.q.out:
1. OPTIMIZED SQL is not shown, probably because we don't handle the spool operator.
2. Seems to cancel some optimization with UNION ALL and identical parts (NOT GOOD).

groupby_sort_1_23.q.out, groupby_sort_skew_1_23.q.out:
1. Materializes scan + aggregate.
2. OPTIMIZED SQL is not shown.

intersect_all.q.out, intersect_distinct.q.out:
* Materializes a join over two tables.
* Check whether optimizations are blocked due to INTERSECT with identical parts.

offset_limit_ppd_optimizer.q.out, limit_pushdown.q.out:
* Materializes scan + aggregate, but as it stands it interferes with the limit-pushdown optimization.

mrr.q.out:
* Materializes scan + aggregate (CTE referenced 3 times).

sharedwork.q.out:
* Materializes scan + a very simple filter (IS NOT NULL).
* In such simple cases the materialization is probably useless; do we gain anything from this?
* Moreover, the materialized filter seems to remain in the final plan.

sharedworkext.q.out, vectorized_multi_output_select.q.out:
* Materializes a join (the case here is very similar to the main e2e test motivating this work).
* The multiple reducers in the SWO plan are probably due to the parallel-edges problem; the temporary-table materialization does not need the workaround, although if we were going directly to the Operator tree we would need to do something similar.

skewjoin_mapjoin7.q.out:
* Materializes a join (CTE referenced twice by UNION ALL); we have seen the same pattern multiple times.

smb_mapjoin_14.q.out:
* Materializes scan + filter; seen this before.

subquery_ALL.q.out, subquery_ANY.q.out:
* Duplication is not obviously present in the initial SQL.
* Materializes scan + aggregate (expected and seen before).

subquery_multi.q.out:
* The CBO plan shows materialization of a semijoin but the physical plan does not have it; definitely needs further investigation.

subquery*:
* Most materializations are of the form scan + aggregate, usually with an IS NOT NULL filter.

union_remove*:
* As noted earlier, the CTE materialization logic interferes with the UNION remove logic, which operates at the physical level.
* If we go from RelNode to Operator then maybe this becomes less of a problem, but as it stands the plans seem less efficient.

clientnegative:
* Nothing worrisome; the failing vertex changes since CTE materialization adds additional operators to the plan.

General notes:
1. Most queries with union or intersection have identical branches; in this case the CTE detection logic kicks in and generates scan + (optional) filter + aggregate.
2. Lineage shows the temporary table (e.g., union28.q.out).
3. I have to add some tests with subqueries since they exhibit implicit sharing.
4. In general there seem to be some optimizations missing in terms of UNION/INTERSECT ALL with identical branches.
5. Since we are introducing a tmp table it is very likely that we are changing the data format. If the initial table is ORC we may materialize to TEXT, and various other combinations which may not be performant. The latter may not be that relevant because all operators writing to files use a specific format:

```
File Output Operator
  compressed: false
  Statistics: Num rows: 493 Data size: 42891 Basic stats: COMPLETE Column stats: COMPLETE
  table:
      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
```
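The identical-branch pattern from the general notes can be sketched as follows (a minimal hypothetical query, not taken from the test suite; `src` is a placeholder table): both UNION ALL branches share the same scan + filter + aggregate subtree, so the CTE detection logic would propose materializing that subtree once.

```sql
-- Both UNION ALL branches are identical up to the aggregate, so the
-- CTE detection logic would suggest materializing scan + filter + aggregate.
SELECT key, count(*) FROM src WHERE key IS NOT NULL GROUP BY key
UNION ALL
SELECT key, count(*) FROM src WHERE key IS NOT NULL GROUP BY key;
```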
Reproducible using subquery_views.q and view_cast.q:

```
org.apache.hadoop.hive.ql.parse.SemanticException: View definition references temporary table default@cte_suggestion_0
    at org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.validateCreateView(CreateViewAnalyzer.java:211)
    at org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.analyzeInternal(CreateViewAnalyzer.java:99)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
    at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
```
…rRule

The AssertionError can be reproduced by running subquery_null_agg.q:

```
java.lang.AssertionError: Type mismatch:
rel rowtype: RecordType(NULL int_col) NOT NULL
equivRel rowtype: RecordType(BOOLEAN NOT NULL boolean_col, BOOLEAN NOT NULL literalTrue) NOT NULL
    at org.apache.calcite.util.Litmus$1.fail(Litmus.java:31)
    at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:2193)
    at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:580)
    at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604)
    at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:148)
    at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:268)
    at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:283)
    at org.apache.calcite.rel.rules.materialize.MaterializedViewRule.perform(MaterializedViewRule.java:454)
    at org.apache.calcite.rel.rules.materialize.MaterializedViewProjectFilterRule.onMatch(MaterializedViewProjectFilterRule.java:50)
    at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:229)
    at org.apache.calcite.plan.volcano.IterativeRuleDriver.drive(IterativeRuleDriver.java:58)
    at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:510)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.rewriteUsingViews(CalcitePlanner.java:2113)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyCteRewriting(CalcitePlanner.java:2147)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1708)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1579)
    at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131)
    at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
    at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180)
    at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1331)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:580)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:473)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
    at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
```

The problem happens due to two things:
* the rule matches a plan with a filter condition that is simplified to false during the rewriting;
* there is a view (CTE suggestion) that is basically a trivial project on top of the table.

CTE suggestions with just project + scan do not make much sense, so we can drop them by tuning the CommonRelSubExprRegisterRule and work around the problem for now. Depending on the bandwidth we may want to attack the bug in MaterializedViewProjectFilterRule and make the latter more robust; that would be the actual fix.
…E suggestion contains untyped NULLs

The problem can be reproduced using subquery_null_agg.q when CTE suggestions are used, but it can also be seen for any CTAS query with untyped NULLs:

```sql
create table testctas1 (id int);
create table testctas3 as select 1, 2, NULL, 4 as ncol from testctas1;
```

```
org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field: int_col
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8391)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8350)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7901)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11645)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11508)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12444)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12310)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:645)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:473)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.materializeCTE(CalcitePlanner.java:1069)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2389)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2337)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2500)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2337)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2500)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2337)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2500)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2322)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:642)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:473)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
    at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
```

Since this is a limitation of CTAS we have to filter out CTE suggestions that contain untyped NULLs in the result type.
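As the SemanticException message itself suggests, a plain CTAS query with an untyped NULL can be written to compile by giving the NULL an explicit type via CAST; a minimal sketch based on the repro tables above (the `testctas2` name is hypothetical):

```sql
-- Typing the NULL explicitly avoids the VOID-type error in CTAS.
create table testctas2 as
select 1, 2, CAST(NULL AS int), 4 as ncol from testctas1;
```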
This is probably caused by the rebase on master and changes affecting the parser.
…H clause) have the same alias

The problem can be reproduced using join0.q; the full stack trace is shown below.

```
org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Ambiguous table alias 'cte_suggestion_0'
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processTable(SemanticAnalyzer.java:1167)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processJoin(SemanticAnalyzer.java:1679)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1899)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:2113)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1754)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:636)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:474)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
    at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
```
The problem can be reproduced by running join0.q:

```
java.lang.UnsupportedOperationException: type not serializable: LAZY (type org.apache.calcite.rel.core.Spool.Type)
    at org.apache.calcite.rel.externalize.RelJson.toJson(RelJson.java:319)
    at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJson.toJson(HiveRelJson.java:46)
    at org.apache.calcite.rel.externalize.RelJsonWriter.put(RelJsonWriter.java:83)
    at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:66)
    at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
    at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
    at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
    at org.apache.calcite.rel.externalize.RelJsonWriter.explainInputs(RelJsonWriter.java:91)
    at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:69)
    at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
    at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
    at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
    at org.apache.calcite.rel.externalize.RelJsonWriter.explainInputs(RelJsonWriter.java:91)
    at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:69)
    at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
    at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
    at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
    at org.apache.calcite.rel.externalize.RelJsonWriter.explainInputs(RelJsonWriter.java:91)
    at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:69)
    at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
    at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
    at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
    at org.apache.calcite.rel.externalize.RelJsonWriter.explainInputs(RelJsonWriter.java:91)
    at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:69)
    at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
    at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
    at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
    at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelOptUtil.toJsonString(HiveRelOptUtil.java:1073)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:669)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:474)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
    at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
```
…ion during AST conversion
The failures below remain to be investigated.
…xpected output from union rewriting program

The cte_cbo_iobe_mv_union_rewrite file contains a repro of the problem:

```
java.lang.IndexOutOfBoundsException: Index: 0
    at java.util.Collections$EmptyList.get(Collections.java:4456)
    at org.apache.calcite.rel.AbstractRelNode.getInput(AbstractRelNode.java:143)
    at org.apache.calcite.rel.rules.materialize.MaterializedViewAggregateRule.rewriteQuery(MaterializedViewAggregateRule.java:250)
    at org.apache.calcite.rel.rules.materialize.MaterializedViewRule.perform(MaterializedViewRule.java:374)
    at org.apache.calcite.rel.rules.materialize.MaterializedViewOnlyAggregateRule.onMatch(MaterializedViewOnlyAggregateRule.java:68)
    at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:229)
    at org.apache.calcite.plan.volcano.IterativeRuleDriver.drive(IterativeRuleDriver.java:58)
    at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:510)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.rewriteUsingViews(CalcitePlanner.java:2114)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyCteRewriting(CalcitePlanner.java:2152)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1750)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1580)
    at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131)
    at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
    at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180)
    at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1332)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:581)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:474)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
    at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
```

If the input to the MV rule is a combination of Aggregate + Scan and there is a registered MV that qualifies for union rewriting, then an IndexOutOfBoundsException is thrown: the result of the union rewriting program is a Scan operator that does not have any inputs. The IOBE is triggered only during the CTE rewrite phase, in cases where HiveAggregateProjectMergeRule has fired before; in normal MV rewrite this cannot happen since HiveAggregateProjectMergeRule is applied after the MV rewrite.
…f-joins of CTE/MV/Table

When CTEs/MVs are in use and the plan contains self-joins over the same table/CTE/MV, the resulting AST does not have the expected shape, so we end up with ambiguity when creating the AST from the RelNode. auto_smb_mapjoin_14.q and other tests fail with errors similar to the one below:

```
org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Ambiguous table alias 'cte_suggestion_0'
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processTable(SemanticAnalyzer.java:1167)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processJoin(SemanticAnalyzer.java:1679)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1899)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:2113)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1754)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:636)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:474)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
    at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
```
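The shape of the ambiguity can be pictured with a minimal hypothetical sketch (not taken from this PR): after the CTE rewrite, both inputs of a self-join reference the materialized table under the same implicit alias, which doPhase1 rejects; distinct explicit aliases are the form the generated AST would need.

```sql
-- Hypothetical shape after CTE rewriting. Both join inputs name the same
-- temporary table; without distinct aliases the alias is ambiguous.
-- SELECT key FROM cte_suggestion_0 JOIN cte_suggestion_0 ON ...   -- fails

-- With distinct aliases the same self-join resolves cleanly:
SELECT a.key
FROM cte_suggestion_0 a JOIN cte_suggestion_0 b ON a.key = b.key;
```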