explain: support `EXPLAIN CREATE MATERIALIZED VIEW` #21973

aalexandrov · 2023-09-26T09:41:16Z

Add support for

EXPLAIN CREATE MATERIALIZED VIEW $name AS $query

syntax.

Motivation

This PR adds a known-desirable feature.

Part of MaterializeInc/database-issues#5301.

Tips for reviewer

See individual commit messages for details.
For test coverage I cloned tpch.slt and changed the workload to use CREATE MATERIALIZED VIEW $qXX AS $query instead of $query. Besides the different rendering and the disappearing Finishing lines, there were no other changes (you can diff the two tpch_*.slt files to see what changed).

Checklist

This PR has adequate test coverage / QA involvement has been duly considered.
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
This PR includes the following user-facing behavior changes:

jkosh44 · 2023-09-26T17:14:01Z

src/adapter/src/catalog.rs

+                let raw_expr = view.expr;
+                let decorrelated_expr = raw_expr.optimize_and_lower(&plan::OptimizerConfig {})?;
+                let optimized_expr = optimizer.optimize(decorrelated_expr)?;


Is it worth extracting this into a method? It seems like we do it a couple of times.

This will be done in #20569.

src/sql/src/plan.rs

jkosh44 · 2023-09-26T17:20:38Z

src/sql/src/plan/statement/dml.rs

+            // If we don't force this parameter to Skip planning fails for names
+            // that already exist in the catalog.
+            stmt.if_exists = IfExistsBehavior::Skip;


So what happens if someone types EXPLAIN CREATE MATERIALIZED VIEW IF NOT EXISTS mv ... and an mv already exists? Do we just ignore the IF NOT EXISTS? It seems like we should return an error or some output to indicate that the item already exists.

> It seems like we should return an error or some output to indicate that the item already exists.

I was optimizing for the following workflow:

1. Write a SQL query $MV[1].
2. Run EXPLAIN CREATE MATERIALIZED VIEW mv AS $MV[1].
3. Refine the definition until the EXPLAIN plan looks good.
4. Run CREATE MATERIALIZED VIEW mv AS $MV[i] once satisfied with the EXPLAIN output of version i.
5. Some time passes.
6. We decide that we want to refine the mv definition to $MV[i+1].
7. Run EXPLAIN CREATE MATERIALIZED VIEW mv AS $MV[i+1].

If I remove this line, we will get the behavior that you expect, but then step (7) will fail because mv already exists. Even though the command that is executed in practice will be CREATE OR REPLACE MATERIALIZED VIEW, I'm not sure if we should ask for people to think about that detail when they just want to run EXPLAIN.

However, if you think this is more consistent I will change this before merging.

(Also note that currently EXPLAIN CREATE OR REPLACE MATERIALIZED VIEW is not accepted by the parser, because of the `if self.peek_keywords(&[CREATE, MATERIALIZED, VIEW])` check.)

In that case I think we should reject EXPLAIN CREATE MATERIALIZED VIEW IF NOT EXISTS in the parser as well, if it's not already rejected.

My worry is the following:

1. Someone runs EXPLAIN CREATE MATERIALIZED VIEW IF NOT EXISTS mv AS $MV[i].
2. They are happy with the plan.
3. They run CREATE MATERIALIZED VIEW IF NOT EXISTS mv AS $MV[i] expecting the plan they just saw to be installed.
4. The materialized view already existed so nothing happens.

I think my point is that it would be better to reject syntax instead of ignore it, because it will lead to less confusion.

> The materialized view already existed so nothing happens

I see, so the notice that we would emit in this case is not sufficient. Then how about:

1. I will use the parse_create_materialized_view, accepting the full syntax.
2. If the if_exists is set to IfExistsBehavior::Skip at the syntax level, I will emit a parser error indicating that the syntax is not supported because EXPLAIN always assumes that the view does not exist.
3. If the if_exists is set to anything else, I will override it to IfExistsBehavior::Skip in the parse_explain_plan function so plan_explain_plan will not need to modify it explicitly and we don't reject EXPLAIN for a different CREATE of an already existing view (which seems to be counter-productive behavior).
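The rule in this proposal can be sketched roughly as follows. This is only an illustration: the `IfExistsBehavior` enum below is a simplified stand-in defined locally, and `if_exists_for_explain` is a hypothetical helper name, not a function from the actual parser.

```rust
/// Simplified local stand-in for the `IfExistsBehavior` enum discussed above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IfExistsBehavior {
    /// Plain `CREATE`: fail if the item already exists.
    Error,
    /// `CREATE ... IF NOT EXISTS`: do nothing if the item already exists.
    Skip,
    /// `CREATE OR REPLACE`: replace an existing item.
    Replace,
}

/// Hypothetical helper: decide which `if_exists` value an explained
/// `CREATE MATERIALIZED VIEW` statement should carry after parsing.
pub fn if_exists_for_explain(parsed: IfExistsBehavior) -> Result<IfExistsBehavior, String> {
    match parsed {
        // Reject the clause instead of silently ignoring it.
        IfExistsBehavior::Skip => Err(String::from(
            "EXPLAIN does not support IF NOT EXISTS: the view is always assumed not to exist",
        )),
        // Anything else is overridden to `Skip`, so planning does not fail
        // when the name is already taken in the catalog.
        IfExistsBehavior::Error | IfExistsBehavior::Replace => Ok(IfExistsBehavior::Skip),
    }
}
```

With this shape the planner never has to touch `if_exists` itself, because the parser either rejects the statement or hands over a value that planning accepts for existing names.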

What could happen with the proposal above is

1. A user runs EXPLAIN CREATE MATERIALIZED VIEW mv AS $MV[i].
2. They are happy with the plan.
3. They run CREATE MATERIALIZED VIEW mv AS $MV[i] expecting the plan they just saw to be installed.
4. The above statement is rejected because the view already exists, so the user needs to change the syntax to CREATE OR REPLACE and hope that the view has no dependencies.

I think that this behavior is good because it would allow me to write (internal) console tooling that:

1. gets the SHOW CREATE output for existing materialized views,
2. constructs an EXPLAIN based on the above,
3. compares the EXPLAIN output of the above with the EXPLAIN MATERIALIZED VIEW to explore how a plan would change (for example if I were to create an additional index).

> I see, so the notice that we would emit in this case is not sufficient.

I forgot about the notice. We'll print: NOTICE: materialized view "mv" already exists, skipping.

I think the proposal is good, but I'll leave it up to you if you think the notice is sufficient.

I'll go ahead with the proposal from this comment, then we don't have to rely on the user seeing the notice.

src/adapter/src/coord/sequencer/inner.rs

ggevay

Looks great, I only have some minor comments.

src/adapter/src/coord/sequencer/inner.rs

src/sql/src/plan.rs

src/sql-parser/src/parser.rs

src/adapter/src/coord/sequencer/inner.rs

ggevay · 2023-09-26T18:00:39Z

src/adapter/src/coord/sequencer/inner.rs

+        }
+
+        // Collect the list of indexes used by the dataflow at this point
+        let used_indexes = UsedIndexes::new(


This could be factored out into a function, as there are 3 copies now.

I wanted to revisit this separately. I don't think we should be calling prune_and_annotate_dataflow_index_imports as part of the optimize_dataflow call.

Rather, given a DataflowDescription we should be able to do this ad-hoc just before rendering the corresponding plan. At the moment we do this once towards the end of the optimizer pipeline, but carry this information "sideways" and pass it around for plans that occur before and after that in the optimizer trace. This both complicates the code and produces possibly incorrect EXPLAIN output (an index might be printed even though the associated plan structure does not reveal that we would actually use it).

> produces possibly incorrect EXPLAIN output (an index might be printed even though the associated plan structure does not reveal that we would actually use it).

So, if I understand correctly, this is the "freezing" problem that we discussed some time ago in a Friday meeting. (Note that optimize_dataflow_monotonic also has the same problem that if some transformation happens to the plan afterwards, then it might possibly go out of sync with the plan.) But I'm not sure if pulling these out of optimize_dataflow is the best solution, because then these functions (prune_and_annotate_dataflow_index_imports and optimize_dataflow_monotonic) would need to be called from several different places, which would also complicate the code, and would introduce the danger that some code path that calls optimize_dataflow forgets to call these. Or, if there will be a new wrapper method of optimize_dataflow that calls these functions (maybe as part of the new optimizer interface), then the problem is just shifted to that wrapper method being forbidden to do transforms afterwards. If we want a very clean solution, then we can create a FrozenMirRelationExpr (as you mentioned before), where the structure is not allowed to change anymore, just these metainfos like monotonic flag or used indexes. But anyhow, just putting a big warning comment in optimize_dataflow for now should almost certainly prevent us from making the mistake of putting some transform after these function calls.
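To make the "frozen" idea a bit more concrete, here is a minimal sketch of such a wrapper. Everything in it is hypothetical: `Frozen`, its fields, and its methods are illustrative names, and the plan type is left generic rather than being the real MirRelationExpr.

```rust
pub mod frozen {
    /// Hypothetical wrapper: once a plan is frozen, its structure can only be
    /// read, while metainfo such as the monotonic flag or the used indexes
    /// can still be updated.
    #[derive(Debug)]
    pub struct Frozen<P> {
        // Private field with no `&mut` accessor, so code outside this module
        // cannot run further transforms on the plan.
        plan: P,
        monotonic: bool,
        used_indexes: Vec<String>,
    }

    impl<P> Frozen<P> {
        /// Freeze a plan with empty metainfo.
        pub fn new(plan: P) -> Self {
            Frozen { plan, monotonic: false, used_indexes: Vec::new() }
        }

        /// Read-only access to the plan structure.
        pub fn plan(&self) -> &P {
            &self.plan
        }

        pub fn is_monotonic(&self) -> bool {
            self.monotonic
        }

        pub fn set_monotonic(&mut self, monotonic: bool) {
            self.monotonic = monotonic;
        }

        pub fn used_indexes(&self) -> &[String] {
            &self.used_indexes
        }

        pub fn set_used_indexes(&mut self, used_indexes: Vec<String>) {
            self.used_indexes = used_indexes;
        }
    }
}
```

The point of the sketch is only that the type system, rather than a warning comment, prevents a transform from running after the metainfo was computed.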

In general prune_and_annotate_dataflow_index_imports does two things:

1. Annotates index imports with their usage type.
2. Prunes index types that are not used.

If we can split the two functions, we can just call (2) with the output of (1) in dataflow rendering but then drop the annotations.
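A rough sketch of what such a split could look like is shown below. The types are deliberately simplified assumptions (an index is just a number, and a dataflow is a set of imports plus a list of accesses); `annotate` and `prune` are hypothetical names, not the existing functions.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for an index identifier.
pub type IndexId = u64;

/// Hypothetical, simplified usage types.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum UsageType {
    FullScan,
    Lookup,
}

/// Hypothetical, simplified dataflow description.
pub struct Dataflow {
    /// Indexes made available to the dataflow.
    pub index_imports: BTreeSet<IndexId>,
    /// Index accesses that the plan actually performs.
    pub accesses: Vec<(IndexId, UsageType)>,
}

/// Step (1): compute how each imported index is used. This is a pure function
/// of the dataflow, so it can be re-run ad hoc before rendering a plan.
pub fn annotate(df: &Dataflow) -> BTreeMap<IndexId, Vec<UsageType>> {
    let mut usage = BTreeMap::new();
    for (id, usage_type) in &df.accesses {
        if df.index_imports.contains(id) {
            usage.entry(*id).or_insert_with(Vec::new).push(usage_type.clone());
        }
    }
    usage
}

/// Step (2): drop the imports for which step (1) recorded no usage.
pub fn prune(df: &mut Dataflow, usage: &BTreeMap<IndexId, Vec<UsageType>>) {
    df.index_imports.retain(|id| usage.contains_key(id));
}
```

In dataflow rendering one would call `prune` with the result of `annotate` and then discard the annotations, while EXPLAIN could call `annotate` alone on whichever stage it is about to print.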

At the moment DataflowMetainfo contains both:

1. fields (notices) that represent the cumulative effect of all optimizer stages traversed by the optimize_dataflow call, and
2. fields (index_usage_types) that represent a specific optimizer stage and are just memoized function results that can always be recomputed from the DataflowDescription.

It will be easier to deal with DataflowMetainfo instances if we commit that they serve only one of these two purposes and find a different solution for the other one.

> But anyhow, just putting a big warning comment in optimize_dataflow for now should almost certainly prevent us from making the mistake of putting some transform after these function calls.

I was thinking more about anomalies in the opposite order. Assume that you have a set of available indexes I and from that and the plan at optimization stage j you determine a subset of used indexes U. If for some reason in some stage i < j your plan was using indexes that are in I - U, the explain output will not render this correctly.

This is a niche problem though, and it could be that the set of used indexes can only grow monotonically as we advance through the optimization stages, so this situation cannot happen in practice.
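The anomaly can be stated as a small set computation. The sketch below assumes we could record the set of used indexes at every optimizer stage (which the current code does not do); `misreported_indexes` is a hypothetical helper and indexes are represented as plain numbers.

```rust
use std::collections::BTreeSet;

/// Given the used-index sets per optimizer stage (in pipeline order), return
/// the indexes that some earlier stage used but that are missing from the set
/// U computed at the final stage. EXPLAIN output for those earlier stages
/// would not mention these indexes if it only consulted U.
pub fn misreported_indexes(used_per_stage: &[BTreeSet<u64>]) -> BTreeSet<u64> {
    match used_per_stage.split_last() {
        // No stages at all: nothing can be misreported.
        None => BTreeSet::new(),
        Some((final_used, earlier)) => earlier
            .iter()
            .flat_map(|stage| stage.difference(final_used).copied())
            .collect(),
    }
}
```

If the used set only ever grows from stage to stage, the result is always empty, which matches the observation that the situation might not occur in practice.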

> If for some reason in some stage i < j your plan was using indexes that are in I - U, the explain output will not render this correctly.

Hmm, yes, we'll need to fix this at some point.

When explaining `CREATE <item>` statements, the `drain_all` method will need to use an `ExprHumanizer` instance that reports the explained item as created even if the item is not present in the backing catalog state. This commit introduces an `ExprHumanizerExt` struct that can be used to report non-existing items (passed as a `BTree<GlobalId, TransientItem>`) as present to `ExprHumanizer` clients.
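The wrapper pattern described in this commit message can be sketched as follows. This is a simplified illustration, not the code from the commit: the `Humanizer` trait, `HumanizerExt`, `Catalog`, the `GlobalId` alias, and the single `humanize_id` method are all local stand-ins chosen for the example.

```rust
use std::collections::BTreeMap;

/// Stand-in for a global item identifier.
pub type GlobalId = u64;

/// Stand-in for the humanizer interface: resolve an id to a printable name.
pub trait Humanizer {
    fn humanize_id(&self, id: GlobalId) -> Option<String>;
}

/// An item that is being explained but does not exist in the catalog.
pub struct TransientItem {
    pub name: String,
}

/// Wrapper that reports transient items as present and delegates everything
/// else to the wrapped humanizer.
pub struct HumanizerExt<'a> {
    items: BTreeMap<GlobalId, TransientItem>,
    inner: &'a dyn Humanizer,
}

impl<'a> HumanizerExt<'a> {
    pub fn new(items: BTreeMap<GlobalId, TransientItem>, inner: &'a dyn Humanizer) -> Self {
        HumanizerExt { items, inner }
    }
}

impl<'a> Humanizer for HumanizerExt<'a> {
    fn humanize_id(&self, id: GlobalId) -> Option<String> {
        match self.items.get(&id) {
            // Transient items take precedence over the backing catalog.
            Some(item) => Some(item.name.clone()),
            None => self.inner.humanize_id(id),
        }
    }
}

/// Toy catalog used as the backing humanizer in this sketch.
pub struct Catalog(pub BTreeMap<GlobalId, String>);

impl Humanizer for Catalog {
    fn humanize_id(&self, id: GlobalId) -> Option<String> {
        self.0.get(&id).cloned()
    }
}
```

Because the wrapper implements the same trait as the catalog, a caller that takes `&dyn Humanizer` does not need to know whether the explained item exists yet.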

shepherdlybot · 2023-09-26T21:28:08Z

This PR has higher risk. Make sure to carefully review the file hotspots. In addition to having a knowledgeable reviewer, it may be useful to add observability and/or a feature flag.

| Risk Score | Probability | Buggy File Hotspots |
| --- | --- | --- |
| 🔴 80 / 100 | 60% | 6 |

Buggy File Hotspots:

| File | Percentile |
| --- | --- |
| ../src/rbac.rs | 97 |
| ../src/catalog.rs | 100 |
| ../sequencer/inner.rs | 99 |
| ../statement/dml.rs | 95 |
| ../src/parser.rs | 98 |
| ../statement/ddl.rs | 96 |

def- · 2023-09-26T21:29:18Z

test/sqllogictest/tpch_materialized_view.slt

+
+query T multiline
+-- Query 03
+EXPLAIN WITH(humanized_exprs, arity, join_impls)


Do we want to document humanized_exprs by the way? I didn't know about it before.

> Do we want to document humanized_exprs by the way? I didn't know about it before.

This landed last week and has not yet made its way to production. I have a reminder to put together an announcement tomorrow and possibly update the docs.

aalexandrov · 2023-09-26T22:27:25Z

@MaterializeInc/testing: note that after moving lowering and decorrelation from the plan_~ functions to the sequence_~ functions they end up running on a stack that has a bit more frames than before. Consequently, I had to slightly lower the limits on one of the "Product limits" checks (CaseWhen) from the default 1000 to 950. On my machine this is sufficient for the release build to pass, let's see if this is also fine with CI.

def-

From QA side I tried:

Coverage: https://buildkite.com/materialize/coverage/builds/234 (waiting for results)
Convert all CREATE MV queries to EXPLAIN: DO NOT SUBMIT: Hacky test #21987 (seems good)
Make SQLsmith support this, will land after this PR landed: Support EXPLAIN CREATE MATERIALIZED VIEW sqlsmith#2 (test run: https://buildkite.com/materialize/nightlies/builds/4383#018ad399-3a36-4da5-9549-d13bf6f6dd02)

aalexandrov · 2023-09-27T00:00:40Z

@def- should I be worried about the failing tests in the coverage run? Both the Nightly and the Test pipeline seem to be OK, and AFAICT the coverage run repeats the same tests. Looking at the "coverage" pipeline, it seems that we don't have green runs on any open PR at the moment, so I think the issue is either in main or the test definitions.

def-

The coverage failures are unrelated to this change: https://github.com/MaterializeInc/materialize/issues/21599

> @MaterializeInc/testing: note that after moving lowering and decorrelation from the plan_~ functions to the sequence_~ functions they end up running on a stack that has a bit more frames than before. Consequently, I had to slightly lower the limits on one of the "Product limits" checks (CaseWhen) from the default 1000 to 950. On my machine this is sufficient for the release build to pass, let's see if this is also fine with CI.

I'm wondering why we have so many problems with stack space in general in large queries, but unrelated to this PR.

Based on coverage (https://buildkite.com/materialize/coverage/builds/234) I added a datadriven test case and pushed it directly to this branch, hope that's fine.

SQLsmith test run was fine too.

aalexandrov · 2023-09-27T08:38:53Z

> I'm wondering why we have so many problems with stack space in general in large queries, but unrelated to this PR.

To the best of our knowledge, that's because the stack frames of code that does more or less some form of generalized structural recursion over enum types seem to be larger than those of hand-rolled C/C++ code. Sadly, some of these patterns also show up in auto-generated code (for example, here we were doing a Clone of an HirRelationExpr, which creates a deep copy recursively).

Also, the kinds of queries that we test against usually represent some degenerate behavior that is not really advantageous for recursive transformations. For example, the CASE ... WHEN query is represented internally as 1000 nested IfThenElse statements.
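As a toy illustration of why such queries stress the stack, the sketch below builds a deeply nested conditional and walks it both recursively and with an explicit worklist. The `Expr` enum and all function names are made up for the example and are much simpler than HirRelationExpr.

```rust
/// Toy expression type. The derived `Clone` is itself recursive: cloning a
/// nested value descends one stack frame per nesting level.
#[derive(Clone, Debug)]
pub enum Expr {
    Literal(i64),
    IfThenElse {
        cond: Box<Expr>,
        then: Box<Expr>,
        els: Box<Expr>,
    },
}

/// Build a `CASE` with `branches` arms as a chain of nested `IfThenElse`
/// nodes, nesting in the `els` position.
pub fn nested_case(branches: usize) -> Expr {
    let mut expr = Expr::Literal(0);
    for i in (1..=branches).rev() {
        expr = Expr::IfThenElse {
            cond: Box::new(Expr::Literal(i as i64)),
            then: Box::new(Expr::Literal(i as i64)),
            els: Box::new(expr),
        };
    }
    expr
}

/// Structural recursion: uses one stack frame per nesting level.
pub fn depth_recursive(e: &Expr) -> usize {
    match e {
        Expr::Literal(_) => 1,
        Expr::IfThenElse { cond, then, els } => {
            1 + depth_recursive(cond)
                .max(depth_recursive(then))
                .max(depth_recursive(els))
        }
    }
}

/// Same computation with an explicit worklist that lives on the heap, so the
/// call stack stays flat regardless of nesting depth.
pub fn depth_iterative(e: &Expr) -> usize {
    let mut max_depth = 0;
    let mut todo = vec![(e, 1usize)];
    while let Some((e, depth)) = todo.pop() {
        max_depth = max_depth.max(depth);
        if let Expr::IfThenElse { cond, then, els } = e {
            todo.push((&**cond, depth + 1));
            todo.push((&**then, depth + 1));
            todo.push((&**els, depth + 1));
        }
    }
    max_depth
}
```

Both functions return the same depth; the difference is only where the bookkeeping lives. Real transformations carry far more state per frame than this toy, which is presumably why the limit is reached much earlier there.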

Update the docs to add the newly supported syntax introduced in some recently merged PRs (MaterializeInc#21708, MaterializeInc#21973, and MaterializeInc#22021).

aalexandrov requested a review from ggevay September 26, 2023 09:41

aalexandrov requested a review from a team as a code owner September 26, 2023 09:41

aalexandrov requested review from a team and jkosh44 and removed request for a team September 26, 2023 09:41

aalexandrov force-pushed the issue_18089 branch 2 times, most recently from 6cc3844 to 4133056 Compare September 26, 2023 10:24

aalexandrov requested a review from a team as a code owner September 26, 2023 10:24

philip-stoev requested a review from def- September 26, 2023 10:26

aalexandrov force-pushed the issue_18089 branch from 4133056 to e067863 Compare September 26, 2023 10:26

philip-stoev removed the request for review from a team September 26, 2023 10:26

aalexandrov force-pushed the issue_18089 branch 2 times, most recently from 96e3cc4 to 5a00d50 Compare September 26, 2023 13:14

jkosh44 reviewed Sep 26, 2023

ggevay approved these changes Sep 26, 2023

aalexandrov added 5 commits September 26, 2023 23:57

test: rename tpch.slt to tpch_select.slt

ca042ad

Rename the file so we can provision a sibling that uses CREATE MATERIALIZED VIEW statements for all workload queries.

explain: generalize OptimizerTrace::drain_all signature

7acf16b

Change the signature of some `drain_all` method parameters:
- Set `humanizer: &dyn ExprHumanizer` instead of `catalog: ConnCatalog`.
- Pass `config: &ExplainConfig` as a shared reference.

sql: change MaterializedView::expr type to HirRelationExpr

85e4464

sql: change View::expr type to HirRelationExpr

d16cb16

aalexandrov force-pushed the issue_18089 branch from 5a00d50 to 5af4229 Compare September 26, 2023 20:58

def- reviewed Sep 26, 2023

aalexandrov added 2 commits September 27, 2023 00:30

sql: parse and plan EXPLAIN CREATE MATERIALIZED VIEW

4c55fba

adapter: handle EXPLAIN CREATE MATERIALIZED VIEW

1c8206d

aalexandrov force-pushed the issue_18089 branch from 5af4229 to 1c8206d Compare September 26, 2023 21:31

def- mentioned this pull request Sep 26, 2023

Support EXPLAIN CREATE MATERIALIZED VIEW MaterializeInc/sqlsmith#2

Merged

def- reviewed Sep 26, 2023

Add sql-parser tests for EXPLAIN CREATE MATERIALIZED VIEW

2b1630f

def- approved these changes Sep 27, 2023

aalexandrov merged commit 884a063 into MaterializeInc:main Sep 27, 2023

aalexandrov deleted the issue_18089 branch September 27, 2023 09:29

This was referenced Sep 27, 2023

explain: support EXPLAIN CREATE INDEX #22021

Merged

enable tokio_unstable broadly #21964

Merged

aalexandrov mentioned this pull request Oct 5, 2023

doc: update EXPLAIN PLAN docs #22195

Merged

This was referenced Jan 22, 2024

Move EXPLAIN SELECT optimization off the coordinator thread #24569

Merged

sql: migrate some ~Plan types from Mir~ to HirRelationExpr #24581

Merged
