-
Notifications
You must be signed in to change notification settings - Fork 465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
explain: support EXPLAIN CREATE MATERIALIZED VIEW
#21973
Conversation
6cc3844
to
4133056
Compare
4133056
to
e067863
Compare
96e3cc4
to
5a00d50
Compare
let raw_expr = view.expr; | ||
let decorrelated_expr = raw_expr.optimize_and_lower(&plan::OptimizerConfig {})?; | ||
let optimized_expr = optimizer.optimize(decorrelated_expr)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth extracting this into a method? It seems like we do it a couple of times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be done in #20569.
// If we don't force this parameter to Skip planning fails for names | ||
// that already exist in the catalog. | ||
stmt.if_exists = IfExistsBehavior::Skip; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what happens if someone types EXPLAIN CREATE MATERIALIZE VIEW IF NOT EXISTS mv ...
and an mv
already exists? Do we just ignore the IF NOT EXISTS
? It seems like we should return an error or some output to indicate that the item already exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we should return an error or some output to indicate that the item already exists.
I was optimizing for the following workflow:
- Write a SQL query
$MV[1]
. - Run
EXPLAIN CREATE MATERIALIZED VIEW mv AS $MV[1]
. - Refine the definition until the
EXPLAIN
plan looks good. - Run
CREATE MATERIALIZED VIEW mv AS $MV[i]
once satisfied with theEXPLAIN
output of versioni
. - Some time passes.
- We decide that we want to refine the
mv
definition to$MV[i+1]
. - Run
CREATE MATERIALIZED VIEW mv AS $MV[i+1]
.
If I remove this line, we will get the behavior that you expect, but then step (7) will fail because mv
already exists. Even though the command that is executed in practice will be CREATE OR REPLACE MATERIALIZED VIEW
, I'm not sure if we should ask for people to think about that detail when they just want to run EXPLAIN
.
However, if you think this is more consistent I will change this before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also note that currently EXPLAIN CREATE OR REPLACE MATERIALIZED VIEW is not accepted by the parser, because of the if self.peek_keywords(&[CREATE, MATERIALIZED, VIEW])
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case I think we should reject EXPLAIN CREATE MATERIALIZED VIEW IF NOT EXISTS
in the parser as well, if it's not already rejected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My worry is the following:
- Someone runs
EXPLAIN CREATE MATERIALIZED VIEW IF NOT EXISTS mv AS $MV[i]
. - They are happy with the plan.
- They run
CREATE MATERIALIZED VIEW IF NOT EXISTS mv AS $MV[i]
expecting the plan they just saw to be installed. - The materialized view already existed so nothing happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my point is that it would be better to reject syntax instead of ignore it, because it will lead to less confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The materialized view already existed so nothing happens
I see, so the notice that we would emit in this case is not sufficient. Then how about:
- I will use the
parse_create_materialized_view
, accepting the full syntax. - If the
if_exists
is set toIfExistsBehavior::Skip
at the syntax level, I will emit a parser error indicating that the syntax is not supported becauseEXPLAIN
always assumes that the view does not exist. - If the
if_exists
is set to anything else, I will override it toIfExistsBehavior::Skip
in theparse_explain_plan
function soplan_explain_plan
will not need to modify it explicitly and we don't rejectEXPLAIN
for a differentCREATE
of an already existing view (which seems to be counter-productive behavior).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What could happen with the proposal above is
- A user runs
EXPLAIN CREATE MATERIALIZED VIEW mv AS $MV[i]
. - They are happy with the plan.
- They run
CREATE MATERIALIZED VIEW mv AS $MV[i]
expecting the plan they just saw to be installed. - The above statement is rejected because the plan already exists, so the user needs to change the syntax to
EXPLAIN OR REPLACE
and hope that the plan has no dependencies.
I think that this behavior is good because it would allow me to write (internal) console tooling that:
- gets the
SHOW CREATE
output for existing materialized views, - constructs an
EXPLAIN
based on the above, - compares the
EXPLAIN
output of the above with theEXPLAIN MATERIALIZED VIEW
to explore how a plan would change (for example if I were to create an additional index).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see, so the notice that we would emit in this case is not sufficient.
I forgot about the notice. We'll print: NOTICE: materialized view "mv" already exists, skipping
.
I think the proposal is good, but I'll leave it up to you if you think the notice is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll go ahead with the proposal from this comment, then we don't have to rely on the use seeing the notice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, I only have some minor comments.
} | ||
|
||
// Collect the list of indexes used by the dataflow at this point | ||
let used_indexes = UsedIndexes::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be factored out into a function, as there are 3 copies now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to revisit this separately. I don't think we should be calling prune_and_annotate_dataflow_index_imports
as part of the optimize_dataflow
call.
Rather, given a DataflowDescription
we should be able to do this ad-hoc just before rendering the corresponding plan. At the moment we do this once towards the end of the optimizer pipeline, but carry this information "sideways" and pass it around for plans that occur before and after that in the optimizer trace. This both complicates the code and produces possibly incorrect EXPLAIN
output (an index might be printed even though the associated plan structure does not reveal that we would actually use it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
produces possibly incorrect EXPLAIN output (an index might be printed even though the associated plan structure does not reveal that we would actually use it).
So, if I understand correctly, this is the "freezing" problem that we discussed some time ago in a Friday meeting. (Note that optimize_dataflow_monotonic
also has the same problem that if some transformation happens to the plan afterwards, then it might possibly go out of sync with the plan.) But I'm not sure if pulling these out of optimize_dataflow
is the best solution, because then these functions (prune_and_annotate_dataflow_index_imports
and optimize_dataflow_monotonic
) would need to be called from several different places, which would also complicate the code, and would introduce the danger that some code path that calls optimize_dataflow
forgets to call these. Or, if there will be a new wrapper method of optimize_dataflow
that calls these functions (maybe as part of the new optimizer interface), then the problem is just shifted to that wrapper method being forbidden to do transforms afterwards. If we want a very clean solution, then we can create a FrozenMirRelationExpr
(as you mentioned before), where the structure is not allowed to change anymore, just these metainfos like monotonic flag or used indexes. But anyhow, just putting a big warning comment in optimize_dataflow
for now should almost certainly prevent us from making the mistake of putting some transform after these function calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general prune_and_annotate_dataflow_index_imports
does two things:
- Annotates index imports with their usage type.
- Prunes index types that are not used.
If we can split the two functions, we can just call (2) with the output of (1) in dataflow rendering but then drop the annotations.
At the moment DataflowMetainfo
contains both:
- fields (
notices
) that represent the cumulative effect of all optimizer stages traversed by theoptimize_dataflow
call, and 2. fields (index_usage_types
) that represent a specific optimizer stage and are just memoized function results that can be always re-comupted from theDataflowDescription
.
It will be easier to deal with DataflowMetainfo
instances if we commit that they serve only one of these two purposes and find a different solution for the other one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But anyhow, just putting a big warning comment in
optimize_dataflow
for now should almost certainly prevent us from making the mistake of putting some transform after these function calls.
I was thinking more about anomalies in the opposite order. Assume that you have a set of available indexes I
and from that and the plan at optimization stage j
you determine a subset of used indexes U
. If for some reason in some stage i < j
your plan was using indexes that are in I - U
, the explain output will not render this correctly.
This is a niche problem though, and it could be that the set of used indexes can only grow monotonically as we advance through the optimization stages, so this situation cannot happen in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If for some reason in some stage i < j your plan was using indexes that are in I - U, the explain output will not render this correctly.
Hmm, yes, we'll need to fix this at some point.
Rename the file so we can provision a sibling that uses CREATE MATERIALIZED VIEW statements for all workload queries.
Change the signature of some `drain_all` method parameters: - Set `humanizer: &dyn ExprHumanizer` instead of `catalog: ConnCatalog`. - Pass `config: &ExplainConfig` as a shared reference.
When explaining `CREATE <item>` statements, the `drain_all` method will need to use an `ExprHumanizer` instance that reports the explained item as created even if the item is not present in the the backing catalog state. This commit introduces an `ExprHumanizerExt` struct that can be used to report non-existing items (passed as a `BTree<GlobalId, TransientItem>`) as present to `ExprHumanizer` clients.
5a00d50
to
5af4229
Compare
This PR has higher risk. Make sure to carefully review the file hotspots. In addition to having a knowledgeable reviewer, it may be useful to add observability and/or a feature flag. What's This?
Buggy File Hotspots:
|
|
||
query T multiline | ||
-- Query 03 | ||
EXPLAIN WITH(humanized_exprs, arity, join_impls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to document humanized_exprs by the way? I didn't know about it before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to document humanized_exprs by the way? I didn't know about it before.
This landed last week and has not yet made it's way to production. I have a reminder to put together an announcement tomorrow and possibly update the docs.
5af4229
to
1c8206d
Compare
@MaterializeInc/testing: note that after moving lowering and decorrelation from the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From QA side I tried:
- Coverage: https://buildkite.com/materialize/coverage/builds/234 (waiting for results)
- Convert all CREATE MV queries to EXPLAIN: DO NOT SUBMIT: Hacky test #21987 (seems good)
- Make SQLsmith support this, will land after this PR landed: Support EXPLAIN CREATE MATERIALIZED VIEW sqlsmith#2 (test run: https://buildkite.com/materialize/nightlies/builds/4383#018ad399-3a36-4da5-9549-d13bf6f6dd02)
@def- should I be worried about the failing tests in the coverage run? Both the Nightly and the Test pipeline seem to be OK, and AFAICT the coverage run repeats the same tests. Looking at the "coverage" pipeline, it seems that we don't have green runs on any open PR at the moment, so I think the issue is either in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The coverage failures are unrelated to this change.The coverage failures are unrelated to this change: https://github.com/MaterializeInc/materialize/issues/21599
@MaterializeInc/testing: note that after moving lowering and decorrelation from the
plan_~
functions to thesequence_~
functions they end up running on a stack that has a bit more frames than before. Consequently, I had to slightly lower the limits on one of the "Product limits" checks (CaseWhen
) from the default 1000 to 950. On my machine this is sufficient for the release build to pass, let's see if this is also fine with CI.
I'm wondering why we have so many problems with stack space in general in large queries, but unrelated to this PR.
Based on coverage (https://buildkite.com/materialize/coverage/builds/234) I added a datadriven test case and pushed it directly to this branch, hope that's fine.
SQLsmith test run was fine too.
To the best of our knowledge that's because in general stack frames code that more or less some form of generalized structural recursion over Also, the type of queries that we use to test against usually represent some degenerate behavior which is not really advantageous for recursive transformations. For example, the |
Update the docs to add the newly supported syntax introduced in some recently merged PRs (MaterializeInc#21708, MaterializeInc#21973, and MaterializeInc#22021).
Update the docs to add the newly supported syntax introduced in some recently merged PRs (MaterializeInc#21708, MaterializeInc#21973, and MaterializeInc#22021).
Update the docs to add the newly supported syntax introduced in some recently merged PRs (MaterializeInc#21708, MaterializeInc#21973, and MaterializeInc#22021).
Update the docs to add the newly supported syntax introduced in some recently merged PRs (MaterializeInc#21708, MaterializeInc#21973, and MaterializeInc#22021).
Update the docs to add the newly supported syntax introduced in some recently merged PRs (MaterializeInc#21708, MaterializeInc#21973, and MaterializeInc#22021).
Update the docs to add the newly supported syntax introduced in some recently merged PRs (MaterializeInc#21708, MaterializeInc#21973, and MaterializeInc#22021).
Update the docs to add the newly supported syntax introduced in some recently merged PRs (MaterializeInc#21708, MaterializeInc#21973, and MaterializeInc#22021).
Add support for
EXPLAIN CREATE MATERIALIZED VIEW $name AS $query
syntax.
Motivation
Part of MaterializeInc/database-issues#5301.
Tips for reviewer
tpch.slt
and changed the workload to useCREATE MATERIALIZED VIEW $qXX AS $query
instead of$query
. Besides the different rendering and the disappearingFinishing
lines, there were no other changes (you can diff the twotpch_*.slt
files to see what changed).Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.