
Common table expression (CTE) optimizations in CBO #5154

Closed
wants to merge 100 commits

Commits on Apr 5, 2024

  1. Prototype CteRewriteRule and implement conversion back to AST using WITH clauses
    
    Change-Id: I442c179183ce66f2e9ecc5ee7a5891863beb584f
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 3e9a908
  2. Copy TableFunctionScan to new cluster

    Change-Id: Id1ee6ceabb090b2622c3227dab693593f58dc1b3
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 9337785
  3. Add configuration property controlling CTE rewrite via CBO

    Do not set the materialize threshold programmatically. If the initial query has CTEs and the value is set to true, then it is fine to materialize those.
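
    A minimal sketch of how such a property could be consulted (the key and helper below are illustrative assumptions, not necessarily what this commit adds):

    import org.apache.hadoop.hive.conf.HiveConf;

    public final class CteRewriteConfSketch {
      // Hypothetical property key; the planner would check it before
      // attempting the CBO-based CTE rewrite. Defaults to off.
      public static boolean cteRewriteEnabled(HiveConf conf) {
        return conf.getBoolean("hive.optimize.cte.rewrite.cbo", false);
      }
    }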
    
    Change-Id: I54c7935ae408bdd87cf6eae5ae3f96951f9799c4
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 9a2fb94
  4. Add first self-contained end-to-end test using CTE rewrite based on EMPS/DEPT schema
    
    Change-Id: I1625953fa8bc73c3e89d9e73ebfab388021e3973
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: e59a382
  5. Use locally built workload-insights dependency (1.0.1.2024.0.18.0-12)

    Change-Id: I2b659ba518033553c5230aa47c50245f5ae0db56
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: ac74eac
  6. Replace TemporaryRelOptTable with RelOptTableImpl

    Drop some boilerplate code which doesn't do much at the moment anyway.
    
    Change-Id: Ic8b614bd97375b870772f7f88d9bb0890f557ac1
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 07ec2b7
  7. Refactor to separate package and extract rules (Buggy)

    There is a bug somewhere and TableSpool is not introduced correctly. Need to revisit the patch.
    
    Change-Id: I705d71bb5e6dc63f5e1440d49f9bac7fac5f68d8
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 1005049
  8. Disable dag mode for HepPlanner to avoid introducing spools for every single TableScan
    
    Change-Id: I28a7756eb6841656880084d209f10d362d812044
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 254998b
  9. Small refactoring for RelCteTransformer

    Change-Id: Idf099037e049ca9c145471f4f6ce71f543bab357
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: ad527ea
  10. Use general copy for arbitrary nodes in HiveRelCopier

    Haven't hit a problem yet, but it's good to have it in place.
    
    Change-Id: I9d36e0355e788f01c207bd71f0457b3a4dbfd7af
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 914c154
  11. Use Hive cost model for doing the CTE/MV rewrite + general refactoring

    TODO: Need to pass the RelOptMaterialization to the HepPlanner.
    Change-Id: I76175a38045fd4816d3311aa9ba93a32627e520e
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 3ad26bc
  12. Add .q file to facilitate debugging of a single query

    Change-Id: I49da052b09e03dffeaecc64070181d2c042df044
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 413d833
  13. Refactor RelCteTransformer to allow registering CTEs in both Heuristic & Cost-based planner
    
    Apart from improving readability, the refactoring fixes the problem in TablescanToSpoolRule that couldn't see the materializations.
    
    Changes were needed with respect to trait handling to ensure that HiveRelCopier works as expected.
    
    Now the ed_cte_0_debug.q test passes, returning the same result as before.
    
    Change-Id: I86a3755a60b0bd8c582f969cd5574756afd590b6
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 4716ddf
  14. Move CTE rewriting logic in CalcitePlanner before join ordering transformations
    
    The refactoring is desired/necessary for the following reasons:
    * Spool sub-plans take advantage of all standard optimization rules, leading to more efficient plans. The Spool sub-plans potentially come from external tools, so we have no guarantees about their shape; as can be seen from existing plans before this change, the plans below a spool operator were far from optimal (presence of cartesian products, missed filter pushdown opportunities, etc.).
    * Moving the logic inside CalcitePlanner allows reuse of existing code for MV rewriting and individual rule application (using executeProgram).
    * The ASTConverter along with other code relies on the fact that the plan has a certain shape, which is obtained by applying all rules. If the Spool sub-plan (or other parts of the query) does not adhere to the desired structure, this can lead to failures. For instance, cbo_query64.q was failing with the following stacktrace due to the unexpected structure under the Spool:
    mvn test -Dtest=TestTezTPCDS30TBPerfCliDriver -Dqfile=cbo_query64.q -Dtest.output.overwrite
    java.lang.IndexOutOfBoundsException: Index: 92, Size: 7
            at java.util.ArrayList.rangeCheck(ArrayList.java:659) ~[?:1.8.0_261]
            at java.util.ArrayList.get(ArrayList.java:435) ~[?:1.8.0_261]
            at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitInputRef(ASTConverter.java:709) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitInputRef(ASTConverter.java:664) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:112) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:354) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:569) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:261) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:580) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:261) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:580) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:531) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:261) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:580) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:261) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:122) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1458) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:627) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13450) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:479) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:319) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:184) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:319) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:227) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:108) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:202) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:656) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:602) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:596) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:232) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267) [hive-cli-3.1.3000.2024.0.18.0-12.jar:?]
            at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:210) [hive-cli-3.1.3000.2024.0.18.0-12.jar:?]
            at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:136) [hive-cli-3.1.3000.2024.0.18.0-12.jar:?]
            at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:436) [hive-cli-3.1.3000.2024.0.18.0-12.jar:?]
            at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:367) [hive-cli-3.1.3000.2024.0.18.0-12.jar:?]
            at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:887) [hive-it-util-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:857) [hive-it-util-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.cli.control.CorePerfCliDriver.runTest(CorePerfCliDriver.java:108) [hive-it-util-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:173) [hive-it-util-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver.testCliDriver(TestTezTPCDS30TBPerfCliDriver.java:79) [test-classes/:?]
    
    Change-Id: I231bd303611f891b1502760fefcb91dab798e916
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 1143973
  15. NPE in HiveRelMdRowCount and ASTConverter when running cbo_query64

    1. Add distinctRowCount handler for Spool operator to avoid the NPE below.
    
    java.lang.NullPointerException: null
            at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.analyzeJoinForPKFK(HiveRelMdRowCount.java:312) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:101) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at GeneratedMetadataHandler_RowCount.getRowCount_$(janino2882819400139487130.java:117) ~[?:?]
            at GeneratedMetadataHandler_RowCount.getRowCount(janino2882819400139487130.java:31) ~[?:?]
            at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:212) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1882) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1756) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.pushDownFactor(LoptOptimizeJoinRule.java:1153) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.addFactorToTree(LoptOptimizeJoinRule.java:937) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createOrdering(LoptOptimizeJoinRule.java:728) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.findBestOrderings(LoptOptimizeJoinRule.java:459) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.onMatch(LoptOptimizeJoinRule.java:128) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2826) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2770) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyJoinOrderingTransform(CalcitePlanner.java:2455) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1878) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1730) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.CalcitePlanner.plan(CalcitePlanner.java:1389) ~[hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:600) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13450) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:486) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:319) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:184) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:319) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:227) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
            at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:108) [hive-exec-3.1.3000.2024.0.18.0-12.jar:3.1.3000.2024.0.18.0-12]
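
    A handler of roughly the following shape sidesteps the NPE; this is a sketch following Calcite's metadata-handler convention, not necessarily the PR's exact code. A spool only buffers rows, so it can delegate to its input:

    import org.apache.calcite.rel.core.Spool;
    import org.apache.calcite.rel.metadata.RelMetadataQuery;
    import org.apache.calcite.rex.RexNode;
    import org.apache.calcite.util.ImmutableBitSet;

    public class SpoolDistinctRowCountSketch {
      // Delegates to the spool's input: buffering preserves the
      // distinct-row-count of whatever the spool reads.
      public Double getDistinctRowCount(Spool rel, RelMetadataQuery mq,
          ImmutableBitSet groupKey, RexNode predicate) {
        return mq.getDistinctRowCount(rel.getInput(), groupKey, predicate);
      }
    }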
    
    2. Modify PlanModifierForASTConv to ensure that there is a Project below every Spool operator to avoid problems when converting back to AST.
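
    A minimal sketch of the intent (helper name and RelBuilder usage are assumptions; the actual PlanModifierForASTConv change may differ):

    import java.util.Collections;
    import java.util.List;
    import org.apache.calcite.rel.RelNode;
    import org.apache.calcite.rel.core.Project;
    import org.apache.calcite.rel.core.Spool;
    import org.apache.calcite.tools.RelBuilder;

    public final class ProjectBelowSpoolSketch {
      // Rebuilds the spool over an identity Project when its input is not
      // already a Project, so the AST conversion sees the expected shape.
      public static RelNode ensureProjectBelowSpool(Spool spool, RelBuilder builder) {
        RelNode input = spool.getInput();
        if (input instanceof Project) {
          return spool; // already in the desired shape
        }
        List<String> names = input.getRowType().getFieldNames();
        // force = true keeps the identity Project instead of eliding it
        RelNode project =
            builder.push(input).project(builder.fields(), names, true).build();
        return spool.copy(spool.getTraitSet(), Collections.singletonList(project));
      }
    }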
    
    Change-Id: I15ead58cd12a3bd7ec2fbd3300e6b2111ce44c6c
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: f3af2d9
  16. Cancel CTE transformation effects when plan doesn't contain any Spool

    If there are no Spool operators at the end of the CTE transformation then cancel any potential side effects by returning the base plan.
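
    Detecting the Spool can be as simple as the following sketch (class and method names are illustrative):

    import org.apache.calcite.rel.RelNode;
    import org.apache.calcite.rel.RelVisitor;
    import org.apache.calcite.rel.core.Spool;

    public final class SpoolDetectorSketch {
      // Returns true if any Spool survives in the plan; if none does, the
      // caller falls back to the base (pre-transformation) plan.
      public static boolean containsSpool(RelNode plan) {
        final boolean[] found = {false};
        new RelVisitor() {
          @Override public void visit(RelNode node, int ordinal, RelNode parent) {
            if (node instanceof Spool) {
              found[0] = true;
              return; // found one; no need to descend further
            }
            super.visit(node, ordinal, parent);
          }
        }.go(plan);
        return found[0];
      }
    }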
    
    Change-Id: Ia71084f7825e696e64ab83a0a3079c6a246ef915
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 5036602
  17. Remove TODO fixed by previous commits

    Change-Id: Iea5bb407f07b9b2b8cf6b6d77baa587a27a27429
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 08d4e0d
  18. Remove now unused RelCteTransformer class

    Change-Id: I98410984ec33500f9fcb259442a921abf825e5a7
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: e9c6ed9
  19. Move HiveRelCopier to more general package and improve Javadoc

    Change-Id: If00652c7e4c579f8997e51b5f2a522677ea6cc6b
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 1171423
  20. Remove debug messages for AST

    Change-Id: Ief7cbfa05a3d16687cab29b08e46285d0d57ed35
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 24081fe
  21. Remove debug .q files

    Change-Id: I8a090788598dce63bb6a5347ece3a30d18d12fa5
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 6f0dc7e
  22. Add HiveTableSpool for consistency with other RelNodes

    The spool specialization does not bring anything new to the table but it follows the general design pattern in Hive where all operators have their Hive equivalent.
    
    Change-Id: I19812c574536acd942c522f4ed88345236c8a70b
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: fd55af9
  23. Prototype explicit RelNode to Operator transformation for CTEs

    Completely untested. At the moment it just compiles and drafts the idea.
    
    Change-Id: I2ce978c3d8052b8469c3ec2c8c3012f7d7e037e9
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 6c9443b
  24. Rename ed_cte_0 to cte_cbo_rewrite_0

    Change-Id: I4f2e567fb72c645358f3167b0b154f3811cd6a57
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 508624b
  25. Update cte_cbo_rewrite_0.q.out after introducing HiveTableSpool

    Change-Id: I3a0b42c3bbfbc2047f2558ce6ca73ce64e5d2438
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: d196900
  26. Enhance HiveTableScan operators over CTE tables with ColumnInfo and mutable caching data structures
    
    The presence of ColumnInfo in RelOptHiveTable is important when going directly from the RelNode to the Operator tree, in particular for creating the mapping from HiveTableScan to TableScanOperator (HiveTableScanVisitor).
    
    The caching data structures in RelOptHiveTable must be mutable because they are enriched gradually during the optimization process. The Collections.empty* collections are immutable, and using them leads to exceptions during query compilation.
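
    The pitfall in a nutshell (types are illustrative; the actual cache fields in RelOptHiveTable may differ):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.hive.ql.plan.ColStatistics;

    public final class MutableCacheSketch {
      static void illustrate(ColStatistics stats) {
        // Seeding a cache with an immutable empty map fails as soon as the
        // optimizer tries to enrich it:
        Map<String, ColStatistics> immutable = Collections.emptyMap();
        // immutable.put("c1", stats); // throws UnsupportedOperationException
        // A mutable structure accepts gradual enrichment during planning:
        Map<String, ColStatistics> mutable = new HashMap<>();
        mutable.put("c1", stats); // ok
      }
    }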
    
    Change-Id: I127bdd4156e8652bda69561402c246f39909c72e
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 6fe18a2
  27. Circumvent HMS stat retrieval logic for virtual CTE tables by creating empty column stats
    
    Change-Id: I4f3a33c033e0d68caceb4724e1f8be9278561109
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: 7b41ff1
  28. Missing connections between CTE producers/consumers in HiveOpConverter

    1. Use ForwardWalker (instead of DefaultGraphWalker) to connect CTE producers & consumers since we are starting the traversal from the "sink" operator. DefaultGraphWalker is appropriate only when the traversal starts from the scan operators.
    2. TableScanOperators that represent CTEs must not be part of topOps since these operators appear only temporarily in the plan.
    
    Change-Id: I852a1058eeb124507ca53b00ec9b512bb11b33b4
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: feeca2b
  29. Add test with CTEs and hive.cbo.returnpath.hiveop enabled

    The Operator DAG is created successfully but the plan is not executable because it contains parallel edges.
    The plan contains two edges from Reducer 2 to Reducer 3.
    
    Change-Id: Ibac55afefe105e3096d63655af7d9e66c63a211c
    Stamatis Zampetakis authored and zabetak committed Apr 5, 2024
    Commit: e89d950
  30. Compilation failures in hive-exec module

    1. Drop the dependency on workload-insights since the compile classpath is messed up with transitive hive dependencies coming from the downstream fork.
    2. Create a trivial interface/implementation for CTE suggestions to make the code compile (potentially it runs as well, but this hasn't been tried yet).
    3. Add a RelOptHiveTable.Type enumeration to be able to distinguish tables that correspond to transient CTEs and update references. Maybe there is a better way to achieve this without introducing a new attribute to an already heavy implementation. There is a TableType enumeration in the metastore module, but it doesn't seem appropriate to add new fields there for something that should never reach the metastore.
    zabetak committed Apr 5, 2024
    Commit: a30aa79
  31. NPE when estimating rowCount for CTE table

    java.lang.NullPointerException
    	at org.apache.hadoop.hive.ql.stats.BasicStats$DataSizeEstimator.getFileSizeForPath(BasicStats.java:220)
    	at org.apache.hadoop.hive.ql.stats.BasicStats$DataSizeEstimator.apply(BasicStats.java:207)
    	at org.apache.hadoop.hive.ql.stats.BasicStats.apply(BasicStats.java:305)
    	at org.apache.hadoop.hive.ql.stats.BasicStats$Factory.build(BasicStats.java:70)
    	at org.apache.hadoop.hive.ql.stats.BasicStats$Factory.buildAll(BasicStats.java:81)
    	at org.apache.hadoop.hive.ql.stats.StatsUtils.getNumRows(StatsUtils.java:231)
    	at org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getRowCount(RelOptHiveTable.java:454)
    	at org.apache.calcite.rel.core.TableScan.computeSelfCost(TableScan.java:100)
    	at org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows.getNonCumulativeCost(RelMdPercentageOriginalRows.java:174)
    	at GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost_$(Unknown Source)
    	at GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost(Unknown Source)
    	at org.apache.calcite.rel.metadata.RelMetadataQuery.getNonCumulativeCost(RelMetadataQuery.java:288)
    	at org.apache.hadoop.hive.ql.optimizer.calcite.cost.HiveVolcanoPlanner.getCost(HiveVolcanoPlanner.java:113)
    	at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:415)
    	at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(RelSubset.java:398)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1268)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1227)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:589)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604)
    	at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:148)
    	at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:268)
    	at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:283)
    	at org.apache.calcite.rel.rules.materialize.MaterializedViewRule.perform(MaterializedViewRule.java:474)
    	at org.apache.calcite.rel.rules.materialize.MaterializedViewProjectJoinRule.onMatch(MaterializedViewProjectJoinRule.java:50)
    	at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:229)
    	at org.apache.calcite.plan.volcano.IterativeRuleDriver.drive(IterativeRuleDriver.java:58)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:510)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.rewriteUsingViews(CalcitePlanner.java:2089)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyCteRewriting(CalcitePlanner.java:2110)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1713)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1575)
    	at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131)
    	at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
    	at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180)
    	at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1327)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:579)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13148)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:474)
    	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
    	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
    	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
    
    Since the table does not correspond to an actual path in the FS, it is normal to get an NPE when trying to find the path.
    
    Explicitly set a rowCount to avoid going down the NPE path.
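
    The shape of the fix, sketched abstractly (the guard and seeded estimate are assumptions building on the RelOptHiveTable.Type enumeration introduced earlier in this PR):

    import java.util.function.DoubleSupplier;

    public final class CteRowCountSketch {
      // Virtual CTE tables have no backing files, so the filesystem-probing
      // stats path (which produced the NPE above) is skipped in favor of an
      // explicitly seeded estimate.
      static double rowCount(boolean isCteTable, double seededEstimate,
          DoubleSupplier statsBasedCount) {
        return isCteTable ? seededEstimate : statsBasedCount.getAsDouble();
      }
    }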
    zabetak committed Apr 5, 2024
    Commit: 280604e
  32. Run cte_cbo_rewrite_0.q and update plan

    The query runs fine and goes through the CTE code path, but there is no spool operator because the trivial suggester does not find a meaningful CTE.
    zabetak committed Apr 5, 2024
    Commit: b6fdec3
  33. Add suggester for join CTEs and fix problems to make cte_cbo_rewrite_0 pass
    
    1. Add CommonTableExpressionJoinSuggester with very simplistic logic for detecting join CTEs in the plan.
    2. Propagate the noDag option when executing rules with HepPlanner in CalcitePlanner.
    3. Use the last element of the qualified name in ASTConverter to create the reference to the spool/CTE (sketched below).
    4. Update cte_cbo_rewrite_0.q.out based on the updates.
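
    Point 3 in a nutshell (a sketch; the example value mirrors the cte_suggestion_0 naming that appears later in this PR):

    import java.util.List;

    public final class CteNameSketch {
      // ASTConverter references the spool/CTE by the simple table name,
      // i.e. the last element of the qualified name.
      static String cteReference(List<String> qualifiedName) {
        // e.g. [default, cte_suggestion_0] -> "cte_suggestion_0"
        return qualifiedName.get(qualifiedName.size() - 1);
      }
    }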
    zabetak committed Apr 5, 2024
    Commit: 53ed412
  34. Commit 0e2612b
  35. SpoolFactory fixup

    zabetak committed Apr 5, 2024
    Commit: 5d4c989
  36. Commit c181b50
  37. Commit d156c0c
  38. Add new CTE suggester based on centralized registry populated during planning
    
    1. Generalize the CTE registry idea outside the Join Suggester and incorporate it in planning. The global registry may be too expensive (CPU & memory) to keep always on; this option needs to be revisited.
    2. Add utilities for stripping HepVertices and counting nodes, used by the new suggester (see the sketch below).
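
    An assumed shape for the HepVertex-stripping utility: HepPlanner wraps operands in HepRelVertex, so a plan captured mid-planning must be unwrapped (by copying, not mutating) before it is registered as a CTE.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.calcite.plan.hep.HepRelVertex;
    import org.apache.calcite.rel.RelNode;

    public final class HepVertexStripperSketch {
      // Recursively replaces each HepRelVertex with the RelNode it
      // currently holds, copying interior nodes instead of mutating them.
      public static RelNode stripHepVertices(RelNode node) {
        if (node instanceof HepRelVertex) {
          return stripHepVertices(((HepRelVertex) node).getCurrentRel());
        }
        List<RelNode> newInputs = new ArrayList<>();
        for (RelNode input : node.getInputs()) {
          newInputs.add(stripHepVertices(input));
        }
        return node.copy(node.getTraitSet(), newInputs);
      }
    }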
    zabetak committed Apr 5, 2024
    Commit: e903f0a
  39. Invalid table alias or column reference exception in SemanticAnalyzer.genOPTree
    
    When introducing the spool, the type names of the input operator match those of the table, and this is guaranteed by the respective rule. However, other optimization rules may change the names. If the names do not match, the ASTConverter will create invalid named column references for the expressions over the CTE table, which will lead to compilation failures similar to the one below.
    
    org.apache.hadoop.hive.ql.parse.SemanticException: Line 30:9 Invalid table alias or column reference 'i_item_id': (possible column names are: i_item_desc, i_category, i_class, i_current_price, itemrevenue, revenueratio)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:13584)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:13526)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:13494)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:13488)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:9407)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11592)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11483)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12419)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12285)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:13036)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13148)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12663)
    	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
    	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
    	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
    	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
    	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:4
    zabetak committed Apr 5, 2024
    Commit: 0037f87
  40. Decouple RelOptMaterialization from CTESuggester interface

    The materialization is tightly coupled with Hive, since we need to create a HiveTableScan and RelOptHiveTable, so it doesn't make much sense to put the responsibility of creating the MV object in the Suggester.
    
    It is highly unlikely that a consumer not familiar with the internals of Hive would be able to create such objects correctly.
    zabetak committed Apr 5, 2024
    Commit: 6ec5f90
  41. Commit 4d7e128
  42. Commit 2fb6481
  43. Commit d0c9afe
  44. Commit 263c7b1
  45. Commit c1dec82
  46. Commit 208e9ca
  47. Update TPCDS query plans (Unexpected changes)

    It is not clear why there were changes, especially in the non-CBO plans.
    zabetak committed Apr 5, 2024
    Commit: ed9751c
  48. Update TPCDS query plans (confirms that there is flakiness)

    Rerunning the same tests without any changes leads to plan changes again, so there is some kind of flakiness in the CTE logic.
    zabetak committed Apr 5, 2024
    Commit: 684c052
  49. Commit 70ff113
  50. AssertionError when registering HiveIntersect to VolcanoPlanner

    java.lang.AssertionError: Relational expression rel#115688:HiveIntersect.HIVE.[].any(input#0=HiveProject#115671,input#1=HiveProject#115686,all=false) has calling-convention HIVE but does not implement the required interface 'interface org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveRelNode' of that convention
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1123)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:589)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:84)
    	at org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:268)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1132)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:589)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:84)
    	at org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:268)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1132)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:589)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604)
    	at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:148)
    	at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:268)
    	at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:283)
    	at org.apache.hadoop.hive.ql.optimizer.calcite.rules.views.HiveMaterializedViewBoxing$HiveMaterializedViewUnboxingRule.onMatch(HiveMaterializedViewBoxing.java:206)
    	at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:229)
    	at org.apache.calcite.plan.volcano.IterativeRuleDriver.drive(IterativeRuleDriver.java:58)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:510)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.rewriteUsingViews(CalcitePlanner.java:2081)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyCteRewriting(CalcitePlanner.java:2103)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1689)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1571)
    	at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131)
    	at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
    	at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180)
    	at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1323)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:575)
    zabetak committed Apr 5, 2024
    Commit: 8bcd8fb
  51. AssertionError when registering HiveExcept to VolcanoPlanner

    java.lang.AssertionError: Relational expression rel#274505:HiveExcept.HIVE.[].any(input#0=HiveExcept#274490,input#1=HiveProject#274503,all=false) has calling-convention HIVE but does not implement the required interface 'interface org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveRelNode' of that convention
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1123)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:589)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:84)
    	at org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:268)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1132)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:589)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:84)
    	at org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:268)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1132)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:589)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:84)
    	at org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:268)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1132)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:589)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:84)
    	at org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:268)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1132)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.setRoot(VolcanoPlanner.java:265)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.rewriteUsingViews(CalcitePlanner.java:2080)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyCteRewriting(CalcitePlanner.java:2103)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1689)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1571)
    	at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131)
    	at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
    zabetak committed Apr 5, 2024
    Commit: 26bc808
  52. Commit 2963ecd
  53. Commit dee7682
  54. Update TPC-DS plans

    zabetak committed Apr 5, 2024
    Commit: aceae7c
  55. Commit af86494
  56. Small improvements & documentation for CommonTableExpressionRegistry

    1. Ensure that we don't modify the RelNode when stripping the HepVertices.
    2. Use ArrayList instead of HashSet, hoping for less flakiness and more stability in the plans.
    zabetak committed Apr 5, 2024
    Commit: 5a23913
  57. Update TPC-DS plans (still flaky)

    In cbo_query9.q.out we can observe that the spool operator changed places again. Why is that?
    zabetak committed Apr 5, 2024
    Commit: 8364d48
  58. Commit 21e2743
  59. Use rowCount and rowSize to break ties across maximal CTEs, mainly for plan stability purposes
    zabetak committed Apr 5, 2024
    Commit: ad8db62
  60. Commit 9d621ff
  61. Commit 7440f3b
  62. Avoid putting trivial CTEs in the registry

    1. Do not add plain table scans, since having tables appear more than once in a query is pretty common.
    2. Do not add simple Project + TableScan combinations, since they are hardly ever useful.
    3. Only add CTEs rooted at a Join, Aggregate, Filter, or Project, since anything else cannot be exploited at the moment (view-based rewriting limitations); see the sketch below.
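
    A predicate along these lines captures the three rules (a sketch assuming the plan has already been stripped of HepVertices; names are illustrative):

    import org.apache.calcite.rel.RelNode;
    import org.apache.calcite.rel.core.Aggregate;
    import org.apache.calcite.rel.core.Filter;
    import org.apache.calcite.rel.core.Join;
    import org.apache.calcite.rel.core.Project;
    import org.apache.calcite.rel.core.TableScan;

    public final class TrivialCteFilterSketch {
      static boolean isWorthRegistering(RelNode cte) {
        if (cte instanceof TableScan) {
          return false; // rule 1: plain scans repeat in almost every query
        }
        if (cte instanceof Project
            && ((Project) cte).getInput() instanceof TableScan) {
          return false; // rule 2: bare Project + TableScan is rarely useful
        }
        // rule 3: only these roots can be exploited by view-based rewriting
        return cte instanceof Join || cte instanceof Aggregate
            || cte instanceof Filter || cte instanceof Project;
      }
    }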
    zabetak committed Apr 5, 2024
    Commit: 7d44cd3
  63. Update TPC-DS plans after changes

    It is not clear why query5.q.out had changes, since CBO was left intact.
    zabetak committed Apr 5, 2024
    Commit: 785b803
  64. Commit 38617ef
  65. Commit 14e0fb3
  66. Commit 2e2b0fc
  67. Commit 84b9dc3
  68. Consider hive.optimize.cte.materialize.full.aggregate.only in CBO CTE selection
    
    1. Add new CBO metadata classes for deriving whether a CTE expression is a fully aggregate query.
    2. Prune non-fully-aggregate CTE suggestions when the respective conf is enabled (sketched below).
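
    Sketch of the pruning step; the PR derives "fully aggregate" via the new metadata classes, whereas the root-operator check below is a deliberate simplification:

    import java.util.List;
    import org.apache.calcite.rel.RelNode;
    import org.apache.calcite.rel.core.Aggregate;
    import org.apache.hadoop.hive.conf.HiveConf;

    public final class FullAggregatePruningSketch {
      static void prune(List<RelNode> suggestions, HiveConf conf) {
        if (conf.getBoolean(
            "hive.optimize.cte.materialize.full.aggregate.only", false)) {
          // Keep only suggestions rooted at an Aggregate (simplification of
          // the "fully aggregate query" metadata derivation).
          suggestions.removeIf(cte -> !(cte instanceof Aggregate));
        }
      }
    }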
    zabetak committed Apr 5, 2024
    Commit: 75b079e
  69. Commit 3af800c
  70. Commit 196a965
  71. Remove unused CTE scans from the plan

    The CTE rewriting logic may add CTE table scans to the plan, but these should always be accompanied by a Spool operator. If there is no Spool operator there is no way to populate the content of the CTE suggestion, thus the scan must be removed.
    
    The TableScanToSpoolRule alone is not enough to guarantee that there will be no orphan CTEs in the plan, thus we need a rule to remove (or rather expand) those CTE scans without a corresponding Spool operator; its intent is sketched below.
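
    Sketched intent of such a rule (onMatch shape only; isCteTable, hasProducingSpool and cteDefinitionOf are hypothetical helpers standing in for the registry lookups):

    @Override public void onMatch(RelOptRuleCall call) {
      TableScan scan = call.rel(0);
      // A scan over a CTE table that no Spool feeds can never be populated,
      // so expand it back into the CTE's defining sub-plan.
      if (isCteTable(scan.getTable()) && !hasProducingSpool(scan.getTable())) {
        call.transformTo(cteDefinitionOf(scan.getTable()));
      }
    }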
    zabetak committed Apr 5, 2024
    Commit: ec1c2ce
  72. Table reference count must be constant for the spool rules to work correctly
    
    The table reference count cannot rely on planner.getRoot() since rule applications will affect the metadata and thus the rules will fire incorrectly. The counts must be computed once, up front, as sketched below.
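
    A sketch of counting references on a fixed snapshot of the plan (helper name illustrative):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.calcite.rel.RelNode;
    import org.apache.calcite.rel.RelVisitor;
    import org.apache.calcite.rel.core.TableScan;

    public final class TableRefCountSketch {
      // Counts scans per table once, before any spool rule fires, so later
      // plan mutations cannot skew the counts mid-run.
      public static Map<List<String>, Integer> countTableRefs(RelNode root) {
        final Map<List<String>, Integer> counts = new HashMap<>();
        new RelVisitor() {
          @Override public void visit(RelNode node, int ordinal, RelNode parent) {
            if (node instanceof TableScan) {
              counts.merge(node.getTable().getQualifiedName(), 1, Integer::sum);
            }
            super.visit(node, ordinal, parent);
          }
        }.go(root);
        return counts;
      }
    }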
    zabetak committed Apr 5, 2024
    Commit: c5a8b88
  73. Add suggester creating scans with disjunctive predicates

    This is tailored around queries such as TPC-DS query9. It is a quick-and-dirty implementation that probably will not make the final cut, but it is useful for testing and experimentation.
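
    The core idea, sketched (query9-style: several branches scan the same table with different predicates p1 and p2):

    import com.google.common.collect.ImmutableList;
    import org.apache.calcite.rex.RexBuilder;
    import org.apache.calcite.rex.RexNode;
    import org.apache.calcite.rex.RexUtil;

    public final class DisjunctiveCteSketch {
      // One CTE scanning with (p1 OR p2) can serve both branches; each
      // consumer then re-applies its own predicate on top of the spool.
      static RexNode sharedPredicate(RexBuilder rexBuilder, RexNode p1, RexNode p2) {
        return RexUtil.composeDisjunction(rexBuilder, ImmutableList.of(p1, p2));
      }
    }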
    zabetak committed Apr 5, 2024
    Commit: 3ae07ca
  74. Commit 21c0d75
  75. Remove redundant cte_cbo_rewrite_1 test case

    There is nothing fancy or new in here.
    zabetak committed Apr 5, 2024
    Commit: ceaebca
  76. Commit e1c79c9
  77. Commit c915219
  78. Commit 1287c2f
  79. Commit 07e87f9
  80. Add Hook for discovering and materializing CTEs in queries without WITH clause
    
    The Hook aggressively sets the CTE materialization properties for all queries without an explicit WITH clause. It is mostly used for testing purposes, to measure the impact of the new CTE suggestion/materialization logic without relying on the explicit presence of a WITH clause (which also suffers from a few bugs, e.g., HIVE-24167).
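
    A rough sketch of such a hook (the interface choice and the exact properties toggled are assumptions; the real hook presumably adjusts more than the threshold):

    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.ql.parse.ASTNode;
    import org.apache.hadoop.hive.ql.parse.AbstractSemanticAnalyzerHook;
    import org.apache.hadoop.hive.ql.parse.HiveSemanticAnalyzerHookContext;
    import org.apache.hadoop.hive.ql.parse.SemanticException;

    public class CteAutoTuningHookSketch extends AbstractSemanticAnalyzerHook {
      @Override
      public ASTNode preAnalyze(HiveSemanticAnalyzerHookContext context, ASTNode ast)
          throws SemanticException {
        HiveConf conf = (HiveConf) context.getConf();
        // Force materialization for every CTE candidate, regardless of how
        // many times it is referenced, so the suggester's effect is visible.
        conf.setIntVar(HiveConf.ConfVars.HIVE_CTE_MATERIALIZE_THRESHOLD, 1);
        return ast;
      }
    }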
    zabetak committed Apr 5, 2024
    Commit: 687c1b3
  81. Add unit tests for CommonTableExpressionIdentitySuggester using approvals framework & plus module
    zabetak committed Apr 5, 2024
    Commit: 0ed2f5c
  82. Commit 81b2a5d
  83. Commit 79a3e4f
  84. Update .q.out files after applying the CommonTableExpressionAutoTuningHook and keep notes about changes
    
    mapjoin_hint.q.out:
    constant_prop_3.q.out:
    notInTest.q.out:
    1. Duplication is not obviously present in the initial SQL.
    2. The physical plan is not better than SWO's but could possibly replace the latter at the CBO level.
    
    correlationoptimizer3.q.out:
    1. Materialization and reuse of join result (not exploited by SWO)
    
    masking_2.q.out:
    masking_12.q.out:
    * Materializes scan + filter
    * The new plan looks reasonable, but it is not clear why SWO was not kicking in on the initial plan.
    
    masking_10.q.out:
    * Materializes scan + filter
    * The new plan has a cartesian product, which is strange.
    
    dynamic_partition_pruning.q.out:
    1. Materializes scan + aggregate, but this nukes out DPP (not really sure if that is helpful in this case).
    
    dynamic_semijoin_reduction_2.q.out:
    1. Materializes scan+aggregate and leads to a different SJ than the original one
    
    explainuser_2.q.out:
    explainanalyze_2.q.out:
    * Materializes join between two tables
    
    filter_aggr.q.out:
    1. OPTIMIZED SQL is not shown, probably because we don't handle the spool operator.
    2. Seems to cancel some optimization with UNION ALL and identical parts (NOT GOOD).
    
    groupby_sort_1_23.q.out:
    groupby_sort_skew_1_23.q.out
    1. Materializes scan + aggregate
    2. OPTIMIZED SQL does not show
    
    intersect_all.q.out:
    intersect_distinct.q.out:
    * Materializes join over two tables
    * Check if there are blocked optimizations due to INTERSECT with identical parts.
    
    offset_limit_ppd_optimizer.q.out:
    limit_pushdown.q.out:
    * Materializes scan + aggregate but, as it is right now, interferes with the limit pushdown optimization.
    
    mrr.q.out:
    * Materializes scan + aggregate (CTE referenced 3 times)
    
    sharedwork.q.out:
    * Materializes scan + very simple filter (IS NOT NULL)
    * In such simple cases the materialization is probably useless. Do we gain anything from this?
    * Moreover, the materialized filter seems to remain in the final plan.
    
    sharedworkext.q.out:
    vectorized_multi_output_select.q.out:
    * Materializes a join (the case here is very similar to the main e2e test motivating this work)
    * The multiple reducers in the SWO plan are probably due to the parallel edges problem; the temporary table materialization does not need the workaround, although if we were going directly to the Operator tree we would need to do something similar.
    
    skewjoin_mapjoin7.q.out:
    * Materializes join (CTE referenced twice by UNION ALL)
    * We have seen the same pattern multiple times.
    
    smb_mapjoin_14.q.out:
    * Materializes scan + filter
    * Seen this before.
    
    subquery_ALL.q.out:
    subquery_ANY.q.out:
    * Duplication is not obviously present in the initial SQL
    * Materializes scan + aggregate (Expected and seen)
    
    subquery_multi.q.out:
    * The CBO plan shows materialization of a semijoin but the physical plan does not have this; definitely needs further investigation.
    
    subquery*:
    * Most materializations are of the form scan + aggregate, usually with an IS NOT NULL filter.
    
    union_remove*:
    * As noted earlier, the CTE materialization logic interferes with the UNION remove logic, which operates at the physical level.
    * If we go from RelNode to Operator then maybe this becomes less of a problem, but as it is the plans seem less efficient.
    
    clientnegative:
    Nothing worrisome; the failing vertex changes since CTE materialization adds additional operators to the plan.
    
    General notes:
    1. Most queries with union or intersection have identical branches; in this case the CTE detection logic kicks in and generates scan+?filter+aggregate.
    2. Lineage shows the temporary table (e.g., union28.q.out).
    3. I have to add some tests with subqueries since they exhibit implicit sharing.
    4. In general there seem to be some optimizations missing in terms of UNION/INTERSECT ALL with identical branches.
    5. Since we are introducing a tmp table it is very likely that we are changing the data format. If the initial table is ORC we may materialize to TEXT, and various other combinations which may not be performant. The latter may not be that relevant because all operators writing to files use a specific format:
    File Output Operator
                      compressed: false
                      Statistics: Num rows: 493 Data size: 42891 Basic stats: COMPLETE Column stats: COMPLETE
                      table:
                          input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
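A minimal sketch of the identical-branch pattern from note 1, assuming a hypothetical table t; both UNION ALL branches are detected as a common subexpression and become a single materialized scan + filter + aggregate:

```
select key, count(*) from t where key is not null group by key
union all
select key, count(*) from t where key is not null group by key;
```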
zabetak committed Apr 5, 2024 (ad2ef62)
  85. SemanticException: View definition references temporary table

    org.apache.hadoop.hive.ql.parse.SemanticException: View definition references temporary table default@cte_suggestion_0
            at org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.validateCreateView(CreateViewAnalyzer.java:211)
            at org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.analyzeInternal(CreateViewAnalyzer.java:99)
            at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
            at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
            at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
            at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
            at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
            at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
            at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
    
    Reproducible using subquery_views.q and view_cast.q
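A minimal sketch of the failure mode, assuming a hypothetical table t and view v; the two identical derived tables are detected as a common subexpression and materialized as the temporary table cte_suggestion_0, which the rewritten view definition then references, and CreateViewAnalyzer rejects it:

```
create view v as
select a.key
from (select key, max(value) as mv from t group by key) a
join (select key, max(value) as mv from t group by key) b
on a.key = b.key;
```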
zabetak committed Apr 5, 2024 (0d97e9e)
86. AssertionError: Type mismatch when using MaterializedViewProjectFilterRule
    
    java.lang.AssertionError:
    Type mismatch:
    rel rowtype:
    RecordType(NULL int_col) NOT NULL
    equivRel rowtype:
    RecordType(BOOLEAN NOT NULL boolean_col, BOOLEAN NOT NULL literalTrue) NOT NULL
    	at org.apache.calcite.util.Litmus$1.fail(Litmus.java:31)
    	at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:2193)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:580)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604)
    	at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:148)
    	at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:268)
    	at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:283)
    	at org.apache.calcite.rel.rules.materialize.MaterializedViewRule.perform(MaterializedViewRule.java:454)
    	at org.apache.calcite.rel.rules.materialize.MaterializedViewProjectFilterRule.onMatch(MaterializedViewProjectFilterRule.java:50)
    	at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:229)
    	at org.apache.calcite.plan.volcano.IterativeRuleDriver.drive(IterativeRuleDriver.java:58)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:510)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.rewriteUsingViews(CalcitePlanner.java:2113)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyCteRewriting(CalcitePlanner.java:2147)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1708)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1579)
    	at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131)
    	at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
    	at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180)
    	at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1331)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:580)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:473)
    	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
    	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
    	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
    
    	The AssertionError can be reproduced by running subquery_null_agg.q
    
The problem happens due to two things:
* the rule matches a plan with a filter condition that is simplified to false during the rewriting
* there is a view (CTE suggestion) that is basically a trivial project on top of the table

CTE suggestions with just project + scan do not make much sense, so we can drop them by tuning the CommonRelSubExprRegisterRule and work around the problem for now.
    
    	Depending on the bandwidth we may want to attack the bug in the MaterializedViewProjectFilterRule and make the latter more robust; that would be the actual fix.
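For illustration, a CTE suggestion of the shape the workaround now drops is just a trivial project over a scan, e.g. (hypothetical table t):

```
select int_col from t
```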
zabetak committed Apr 5, 2024 (8967501)
87. SemanticException: CREATE-TABLE-AS-SELECT creates a VOID type when CTE suggestion contains untyped NULLs
    
    org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field:  int_col
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8391) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8350) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7901) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11645) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11508) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12444) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12310) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:645) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:473) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.CalcitePlanner.materializeCTE(CalcitePlanner.java:1069) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2389) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2337) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2500) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2337) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2500) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2337) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2500) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2322) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:642) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:473) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
            at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) ~[hive-cli-4.1.0-SNAPSHOT.jar:?]
            at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) ~[hive-cli-4.1.0-SNAPSHOT.jar:?]
            at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) ~[hive-cli-4.1.0-SNAPSHOT.jar:?]
            at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425) ~[hive-cli-4.1.0-SNAPSHOT.jar:?]
    
The problem can be reproduced using subquery_null_agg.q when CTE suggestions are used, but it can also be seen for any CTAS query with untyped NULLs.
    ```
    create table testctas1 (id int);
    create table testctas3 as select 1, 2, NULL, 4 as ncol from testctas1;
    ```
Since this is a limitation with CTAS, we have to filter out CTE suggestions that contain untyped NULLs in their result type.
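For reference, the CAST workaround that the error message suggests would make the repro above compile (sketch):

```
create table testctas3 as select 1, 2, cast(NULL as int), 4 as ncol from testctas1;
```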
zabetak committed Apr 5, 2024 (11e5b2b)
  88. Update internal_interval in query32,92 outputs

    This is probably caused by the rebase on master and changes affecting the parser.
zabetak committed Apr 5, 2024 (9107806)
89. SemanticException: Ambiguous table alias since references to CTE (WITH clause) have the same alias
    
The problem can be reproduced using join0.q; the full stack trace is shown below.
    
     org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Ambiguous table alias 'cte_suggestion_0'
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processTable(SemanticAnalyzer.java:1167)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processJoin(SemanticAnalyzer.java:1679)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1899)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:2113)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1754)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:636)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:474)
    	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
    	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
    	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
    	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
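For intuition, a minimal sketch of the pattern, assuming a hypothetical table t; both derived tables are identical, so each is replaced by a reference to cte_suggestion_0, and the generated AST ends up with two FROM entries under the same alias:

```
select a.key, b.value
from (select * from t where key < 10) a
join (select * from t where key < 10) b
on a.key = b.key;
```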
zabetak committed Apr 5, 2024 (02cd53e)
  90. UnsupportedOperationException when serializing Spool to JSON

    The problem can be reproduced by running join0.q
    
    java.lang.UnsupportedOperationException: type not serializable: LAZY (type org.apache.calcite.rel.core.Spool.Type)
    	at org.apache.calcite.rel.externalize.RelJson.toJson(RelJson.java:319)
    	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJson.toJson(HiveRelJson.java:46)
    	at org.apache.calcite.rel.externalize.RelJsonWriter.put(RelJsonWriter.java:83)
    	at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:66)
    	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
    	at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
    	at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
    	at org.apache.calcite.rel.externalize.RelJsonWriter.explainInputs(RelJsonWriter.java:91)
    	at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:69)
    	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
    	at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
    	at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
    	at org.apache.calcite.rel.externalize.RelJsonWriter.explainInputs(RelJsonWriter.java:91)
    	at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:69)
    	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
    	at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
    	at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
    	at org.apache.calcite.rel.externalize.RelJsonWriter.explainInputs(RelJsonWriter.java:91)
    	at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:69)
    	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
    	at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
    	at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
    	at org.apache.calcite.rel.externalize.RelJsonWriter.explainInputs(RelJsonWriter.java:91)
    	at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:69)
    	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelJsonImpl.explain_(HiveRelJsonImpl.java:59)
    	at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
    	at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
    	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelOptUtil.toJsonString(HiveRelOptUtil.java:1073)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:669)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:474)
    	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
    	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
    	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
    	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
zabetak committed Apr 5, 2024 (58bff69)
91. Update q.out files after fixing Spool serialization and alias generation during AST conversion
zabetak committed Apr 5, 2024 (cab67ca)

Commits on Apr 9, 2024

1. Commit b39878a
  2. Run subquery qtests and update plans

Failures to investigate are listed below.
zabetak committed Apr 9, 2024 (b8fbd71)
3. Commit e39bd25
4. Commit 23e9f4f
5. Commit 93e5f50
6. Commit 8d8a2be

Commits on Apr 17, 2024

1. Commit acae14f
2. IndexOutOfBoundsException in MaterializedViewAggregateRule due to unexpected output from union rewriting program
    
    The cte_cbo_iobe_mv_union_rewrite file contains a repro of the problem:
    
     java.lang.IndexOutOfBoundsException: Index: 0
    	at java.util.Collections$EmptyList.get(Collections.java:4456)
    	at org.apache.calcite.rel.AbstractRelNode.getInput(AbstractRelNode.java:143)
    	at org.apache.calcite.rel.rules.materialize.MaterializedViewAggregateRule.rewriteQuery(MaterializedViewAggregateRule.java:250)
    	at org.apache.calcite.rel.rules.materialize.MaterializedViewRule.perform(MaterializedViewRule.java:374)
    	at org.apache.calcite.rel.rules.materialize.MaterializedViewOnlyAggregateRule.onMatch(MaterializedViewOnlyAggregateRule.java:68)
    	at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:229)
    	at org.apache.calcite.plan.volcano.IterativeRuleDriver.drive(IterativeRuleDriver.java:58)
    	at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:510)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.rewriteUsingViews(CalcitePlanner.java:2114)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyCteRewriting(CalcitePlanner.java:2152)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1750)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1580)
    	at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131)
    	at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
    	at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180)
    	at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1332)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:581)
    	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
    	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:474)
    	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
    	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
    	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
    	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
    	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
    
    If the input to the MV rule is a combination of Aggregate + Scan and there is a registered MV that qualifies for union rewriting then an IOBE is thrown; the result of the union rewriting program is a Scan operator that does not have any inputs.
    
    The IOBE is triggered only during the CTE rewrite phase in cases where the HiveAggregateProjectMergeRule has fired before. In normal MV rewrite this cannot happen since HiveAggregateProjectMergeRule is applied after the MV rewrite.
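A hedged sketch of the triggering shape, using a hypothetical table t and materialized view mv1; the query is an Aggregate directly on top of a Scan (HiveAggregateProjectMergeRule has already removed the intermediate Project), and mv1 qualifies for union rewriting because it covers only part of the data:

```
create materialized view mv1 as
select key, count(*) from t where key < 5 group by key;

-- aggregate sits directly on the scan; the union rewriting program would
-- combine mv1 with the rows it does not cover (key >= 5)
select key, count(*) from t group by key;
```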
zabetak committed Apr 17, 2024 (476301a)

Commits on Apr 18, 2024

1. SemanticException: Line 0:-1 Ambiguous table alias when there are self-joins of CTE/MV/Table
    
When CTEs/MVs are in use and the plan contains self-joins over the same table/CTE/MV, the resulting AST does not have the expected shape, so we end up with ambiguity when creating the AST from the RelNode.
    
auto_smb_mapjoin_14.q and other tests fail with errors similar to the one below:
    
     org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Ambiguous table alias 'cte_suggestion_0'
     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processTable(SemanticAnalyzer.java:1167)
     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processJoin(SemanticAnalyzer.java:1679)
     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1899)
     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:2113)
     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1754)
     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:636)
     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13177)
     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:474)
     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
     at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
     at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
     at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
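A hedged sketch of the AST shape that would avoid the ambiguity, with hypothetical table t and aliases; repeated references to the same CTE need distinct aliases in the generated WITH clause:

```
with cte_suggestion_0 as (select key, value from t where key < 10)
select s1.key, s2.value
from cte_suggestion_0 s1
join cte_suggestion_0 s2 on s1.key = s2.key;
```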
zabetak committed Apr 18, 2024 (6450967)