
Common table expression (CTE) optimizations in CBO #5154

Closed
wants to merge 100 commits into from

Conversation

zabetak
Contributor

@zabetak zabetak commented Mar 22, 2024

No description provided.

Stamatis Zampetakis added 13 commits April 5, 2024 14:10
…ITH clauses

Change-Id: I442c179183ce66f2e9ecc5ee7a5891863beb584f
Change-Id: Id1ee6ceabb090b2622c3227dab693593f58dc1b3
Do not set the materialize threshold programmatically. If the initial query has CTEs and the value is set to true then it's fine to materialize those.

Change-Id: I54c7935ae408bdd87cf6eae5ae3f96951f9799c4
…MPS/DEPT schema

Change-Id: I1625953fa8bc73c3e89d9e73ebfab388021e3973
Change-Id: I2b659ba518033553c5230aa47c50245f5ae0db56
Drop some boilerplate code which doesn't do much at the moment anyway.

Change-Id: Ic8b614bd97375b870772f7f88d9bb0890f557ac1
There is a bug somewhere and TableSpool is not introduced correctly. Need to revisit the patch.

Change-Id: I705d71bb5e6dc63f5e1440d49f9bac7fac5f68d8
… single TableScan

Change-Id: I28a7756eb6841656880084d209f10d362d812044
Change-Id: Idf099037e049ca9c145471f4f6ce71f543bab357
Haven't hit a problem yet but it's good to have it in place.

Change-Id: I9d36e0355e788f01c207bd71f0457b3a4dbfd7af
TODO: Need to pass the RelOptMaterialization to the HepPlanner.
Change-Id: I76175a38045fd4816d3311aa9ba93a32627e520e
Change-Id: I49da052b09e03dffeaecc64070181d2c042df044
…c & Cost-based planner

Apart from improving readability, the refactoring fixes the problem in TablescanToSpoolRule, which couldn't see the materializations.

Changes were needed in trait handling to ensure that HiveRelCopier works as expected.

Now the ed_cte_0_debug.q test passes, returning the same result as before.

Change-Id: I86a3755a60b0bd8c582f969cd5574756afd590b6
…gHook and keep notes about changes

mapjoin_hint.q.out:
constant_prop_3.q.out:
notInTest.q.out:
1. Duplication is not obviously present in the initial SQL
2. Physical plan is not better than SWO's, but could possibly replace the latter at the CBO level.

correlationoptimizer3.q.out:
1. Materialization and reuse of join result (not exploited by SWO)

masking_2.q.out:
masking_12.q.out:
* Materializes scan + filter
* The new plan looks reasonable, but it is not clear why SWO was not kicking in on the initial plan.

masking_10.q.out:
* Materializes scan + filter
* New plan has a cartesian product which is strange

dynamic_partition_pruning.q.out:
1. Materializes scan + aggregate, but this defeats dynamic partition pruning (not really sure if DPP is helpful in this case).

dynamic_semijoin_reduction_2.q.out:
1. Materializes scan+aggregate and leads to a different SJ than the original one

explainuser_2.q.out:
explainanalyze_2.q.out:
* Materializes join between two tables

filter_aggr.q.out:
1. OPTIMIZED SQL is not shown, probably because we don't handle the spool operator
2. Seems to cancel some optimization with UNION ALL and identical parts (NOT GOOD)

groupby_sort_1_23.q.out:
groupby_sort_skew_1_23.q.out
1. Materializes scan + aggregate
2. OPTIMIZED SQL does not show

intersect_all.q.out:
intersect_distinct.q.out:
* Materializes join over two tables
* Check if there are optimizations blocked due to INTERSECT with identical parts.

offset_limit_ppd_optimizer.q.out:
limit_pushdown.q.out:
* Materializes scan + aggregate but interferes with limit pushdown optimization as it is right now.

mrr.q.out:
* Materializes scan + aggregate (CTE referenced 3 times)

sharedwork.q.out:
* Materializes scan + very simple filter (IS NOT NULL)
* In such simple cases the materialization is probably useless. Do we gain anything from this?
* Moreover, the materialized filter seems to remain in the final plan.

sharedworkext.q.out:
vectorized_multi_output_select.q.out:
* Materializes a join (the case here is very similar to the main e2e test motivating this work)
* The multiple reducers in the SWO plan are probably due to the parallel-edges problem; the temporary table materialization does not need the workaround, although if we were going directly to the Operator tree we would need to do something similar.

skewjoin_mapjoin7.q.out:
* Materializes join (CTE referenced twice by UNION ALL)
* We have seen the same pattern multiple times

smb_mapjoin_14.q.out:
* Materializes scan + filter
* Seen this before

subquery_ALL.q.out:
subquery_ANY.q.out:
* Duplication is not obviously present in the initial SQL
* Materializes scan + aggregate (Expected and seen)

subquery_multi.q.out:
* The CBO plan shows materialization of a semijoin but the physical plan does not have this; definitely needs further investigation.

subquery*:
* Most materializations are of the form scan + aggregate, usually with an IS NOT NULL filter

union_remove*:
* As noted earlier, there is an interference between the CTE materialization logic and the UNION remove logic, which operates at the physical level.
* If we go from RelNode to Operator then maybe this becomes less of a problem, but as it is the plans seem less efficient.

clientnegative:
Nothing worrisome; the failing vertex changes since CTE materialization adds additional operators to the plan.

General notes:
1. Most queries with union or intersection have identical branches; in this case the CTE detection logic kicks in and generates scan + (optional) filter + aggregate.
2. Lineage shows the temporary table (e.g., union28.q.out).
3. I have to add some tests with subqueries since they exhibit implicit sharing.
4. In general there seem to be some optimizations missing in terms of UNION/INTERSECT ALL with identical branches.
5. Since we are introducing a tmp table it is very likely that we are changing the data format. If the initial table is ORC we may materialize to TEXT and various other combinations which may not be performant.
The latter may not be that relevant because all operators writing to files use a specific format:
File Output Operator
                  compressed: false
                  Statistics: Num rows: 493 Data size: 42891 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
org.apache.hadoop.hive.ql.parse.SemanticException: View definition references temporary table default@cte_suggestion_0
        at org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.validateCreateView(CreateViewAnalyzer.java:211)
        at org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.analyzeInternal(CreateViewAnalyzer.java:99)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
        at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
        at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)

Reproducible using subquery_views.q and view_cast.q
…rRule

java.lang.AssertionError:
Type mismatch:
rel rowtype:
RecordType(NULL int_col) NOT NULL
equivRel rowtype:
RecordType(BOOLEAN NOT NULL boolean_col, BOOLEAN NOT NULL literalTrue) NOT NULL
	at org.apache.calcite.util.Litmus$1.fail(Litmus.java:31)
	at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:2193)
	at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:580)
	at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604)
	at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:148)
	at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:268)
	at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:283)
	at org.apache.calcite.rel.rules.materialize.MaterializedViewRule.perform(MaterializedViewRule.java:454)
	at org.apache.calcite.rel.rules.materialize.MaterializedViewProjectFilterRule.onMatch(MaterializedViewProjectFilterRule.java:50)
	at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:229)
	at org.apache.calcite.plan.volcano.IterativeRuleDriver.drive(IterativeRuleDriver.java:58)
	at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:510)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.rewriteUsingViews(CalcitePlanner.java:2113)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyCteRewriting(CalcitePlanner.java:2147)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1708)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1579)
	at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131)
	at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
	at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180)
	at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1331)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:580)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:473)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)

	The AssertionError can be reproduced by running subquery_null_agg.q

	The problem happens due to two things:
	* the rule matches a plan with a filter condition that is simplified to false during the rewriting
	* there is a view (CTE suggestion) that is basically a trivial project on top of the table

	CTE suggestions consisting of just project + scan do not make much sense, so we can drop them by tuning the CommonRelSubExprRegisterRule and work around the problem for now.

	Depending on bandwidth we may want to attack the bug in the MaterializedViewProjectFilterRule and make the latter more robust; that would be the actual fix.
…E suggestion contains untyped NULLs

org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field:  int_col
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8391) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8350) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7901) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11645) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11508) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12444) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12310) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:645) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:473) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.materializeCTE(CalcitePlanner.java:1069) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2389) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2337) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2500) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2337) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2500) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2337) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2500) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2322) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:642) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:473) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) ~[hive-cli-4.1.0-SNAPSHOT.jar:?]
        at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) ~[hive-cli-4.1.0-SNAPSHOT.jar:?]
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) ~[hive-cli-4.1.0-SNAPSHOT.jar:?]
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425) ~[hive-cli-4.1.0-SNAPSHOT.jar:?]

The problem can be reproduced using subquery_null_agg.q when CTE suggestions are used but can also be seen for any CTAS query with untyped NULLs.
```
create table testctas1 (id int);
create table testctas3 as select 1, 2, NULL, 4 as ncol from testctas1;
```
Since this is a limitation of CTAS we have to filter out CTE suggestions that contain untyped NULLs in the result type.
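As an illustration (hypothetical workaround, not part of this patch): the repro above succeeds once the untyped NULL is given an explicit type with CAST, which is what a filtered-out CTE suggestion would effectively need:

```
-- Hypothetical sketch: an explicit CAST lets CTAS derive a column type
-- instead of VOID, avoiding the SemanticException.
create table testctas1 (id int);
create table testctas3 as
  select 1, 2, cast(NULL as int) as ncol, 4 from testctas1;
```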
This is probably caused by the rebase on master and changes affecting the parser.
…H clause) have the same alias

The problem can be reproduced using join0.q and the full stack trace is shown below.

 org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Ambiguous table alias 'cte_suggestion_0'
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processTable(SemanticAnalyzer.java:1167)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processJoin(SemanticAnalyzer.java:1679)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1899)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:2113)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1754)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:636)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:474)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
The problem can be reproduced by running join0.q

java.lang.UnsupportedOperationException: type not serializable: LAZY (type org.apache.calcite.rel.core.Spool.Type)
	at org.apache.calcite.rel.externalize.RelJson.toJson(RelJson.java:319)
	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJson.toJson(HiveRelJson.java:46)
	at org.apache.calcite.rel.externalize.RelJsonWriter.put(RelJsonWriter.java:83)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:66)
	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
	at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
	at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explainInputs(RelJsonWriter.java:91)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:69)
	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
	at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
	at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explainInputs(RelJsonWriter.java:91)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:69)
	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
	at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
	at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explainInputs(RelJsonWriter.java:91)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:69)
	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
	at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
	at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explainInputs(RelJsonWriter.java:91)
	at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:69)
	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
	at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
	at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelOptUtil.toJsonString(HiveRelOptUtil.java:1073)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:669)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:474)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
Below are failures to investigate.
…xpected output from union rewriting program

The cte_cbo_iobe_mv_union_rewrite file contains a repro of the problem:

 java.lang.IndexOutOfBoundsException: Index: 0
	at java.util.Collections$EmptyList.get(Collections.java:4456)
	at org.apache.calcite.rel.AbstractRelNode.getInput(AbstractRelNode.java:143)
	at org.apache.calcite.rel.rules.materialize.MaterializedViewAggregateRule.rewriteQuery(MaterializedViewAggregateRule.java:250)
	at org.apache.calcite.rel.rules.materialize.MaterializedViewRule.perform(MaterializedViewRule.java:374)
	at org.apache.calcite.rel.rules.materialize.MaterializedViewOnlyAggregateRule.onMatch(MaterializedViewOnlyAggregateRule.java:68)
	at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:229)
	at org.apache.calcite.plan.volcano.IterativeRuleDriver.drive(IterativeRuleDriver.java:58)
	at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:510)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.rewriteUsingViews(CalcitePlanner.java:2114)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyCteRewriting(CalcitePlanner.java:2152)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1750)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1580)
	at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131)
	at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
	at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180)
	at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1332)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:581)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:474)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)

If the input to the MV rule is a combination of Aggregate + Scan and there is a registered MV that qualifies for union rewriting then an IOBE is thrown; the result of the union rewriting program is a Scan operator that does not have any inputs.

The IOBE is triggered only during the CTE rewrite phase in cases where the HiveAggregateProjectMergeRule has fired before. In normal MV rewrite this cannot happen since HiveAggregateProjectMergeRule is applied after the MV rewrite.
…f-joins of CTE/MV/Table

When CTEs/MVs are in use and the plan contains self-joins over the same table/CTE/MV, the resulting AST does not have the expected shape, so we end up with ambiguity when creating the AST from the RelNode.

The auto_smb_mapjoin_14.q and other tests are failing with errors similar to the one below:

 org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Ambiguous table alias 'cte_suggestion_0'
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processTable(SemanticAnalyzer.java:1167)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processJoin(SemanticAnalyzer.java:1679)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1899)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:2113)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1754)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:636)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:474)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
 at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
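For illustration, the generated AST fails the same way plain HiveQL does when the same relation appears twice in a join without distinct aliases (hypothetical query, not from the test suite; assumes the usual src(key, value) schema):

```
-- Hypothetical sketch: referencing the same relation twice without
-- distinct aliases raises "Ambiguous table alias".
with cte_suggestion_0 as (select key, value from src)
select *
from cte_suggestion_0
join cte_suggestion_0 on (key = key);

-- The expected AST shape assigns distinct aliases to each reference:
-- ... from cte_suggestion_0 a join cte_suggestion_0 b on (a.key = b.key);
```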

sonarcloud bot commented Apr 18, 2024

Quality Gate passed

Issues
46 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud


This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the [email protected] list if the patch is in need of reviews.

@github-actions github-actions bot added the stale label Jun 18, 2024
@github-actions github-actions bot closed this Jun 26, 2024