jsonb_unpack TableFunc, introduced by the optimizer#36957
Conversation
05b8c9a to
1cae3b7
Compare
mgree
left a comment
There was a problem hiding this comment.
This is looking pretty good! I like the scenario test, but maybe we can do some more targeted random testing, too?
We should talk a bit about some alternatives---I'm particularly curious about taking smaller bites and letting other passes clean up. For code/impl strategy: maybe get one more opinion (@antiguru and/or @frankmcsherry)?
| JsonbObjectKeys, | ||
| JsonbArrayElements, | ||
| JsonbArrayElementsStringify, | ||
| /// Extracts multiple fields from a single jsonb value in one pass, with |
There was a problem hiding this comment.
Does this comment belong here, or on the function that implements it?
| /// are stored sorted, enabling a sorted-merge lookup). | ||
| /// | ||
| /// Internal only: introduced by the `JsonbUnpack` transform in | ||
| /// `mz-transform`; not reachable from SQL and has no catalog entry. |
There was a problem hiding this comment.
As discussed, I think we should make it reachable by SQL---a nice escape hatch in case the transform is too fussy.
| Hash, | ||
| MzReflect | ||
| )] | ||
| pub enum JsonbUnpackFieldKind { |
There was a problem hiding this comment.
Is it not possible to have a path of indices for arrays nested in objects? Is possibly to have a mix of Key and Index in one unpack? I worry that there are some unrepresentable/doomed-to-fail states here, like vec![Key("foo"), Index(0)] (unless Index turns into fiedl access of a "0" field).
| for (idx, field) in fields.iter().enumerate() { | ||
| match &field.kind { | ||
| Key(k) => wanted.push((k.as_str(), idx)), | ||
| Index(_) => {} |
There was a problem hiding this comment.
Is silent drop the right behavior? What would an indexed get an object normally do? Push the NULL like we see below?
| .unwrap_or(Datum::Null), | ||
| Some(v) => v, | ||
| None => Datum::Null, |
There was a problem hiding this comment.
Confirming the Null is right, as opposed to a JsonNull!
| // passes through the input columns and the demanded expressions. | ||
| // `optimize` inlines single-use expressions (e.g. predicate support back | ||
| // into the predicates) and drops anything unused. | ||
| let lower_slots: Vec<usize> = (0..n) |
There was a problem hiding this comment.
The order of events here is kind of confusing. We compute placements for the lower slots, then the input, then add those placements in.
| .map(upper_exprs) | ||
| .project(mfp.projection.iter().map(|p| remap_upper(*p))); | ||
| upper_mfp.optimize(); | ||
|
|
There was a problem hiding this comment.
Add some assertion that the final arity is correct?
| fn contains_literal_err(expr: &MirScalarExpr) -> bool { | ||
| let mut found = false; | ||
| expr.visit_pre(|e| { | ||
| if e.is_literal_err() { | ||
| found = true; | ||
| } | ||
| }); | ||
| found | ||
| } |
There was a problem hiding this comment.
MSE::contains_err already does this, more efficiently (early return).
| // Replace multiple jsonb accessors on a common value with a single | ||
| // multi-field unpacking table function. Placed late in the | ||
| // pipeline because it introduces FlatMaps, which many transforms | ||
| // handle poorly, and because it is a physical optimization: it | ||
| // changes how the data is accessed, not what is computed. Nothing | ||
| // after this point re-canonicalizes MFPs or rearranges what it | ||
| // builds; the final `fold_constants_fixpoint` below only needs a | ||
| // correct `TableFunc::eval`. | ||
| Box::new(jsonb_unpack::JsonbUnpack); | ||
| if ctx.features.enable_jsonb_unpack_transform, |
There was a problem hiding this comment.
Since the FlatMaps we produce here are somehow permanent, I wonder if there was some set of smaller manipulations we could do. Right now, we wait until we're done with all of our MFP/scalar manipulation, and then we batch together everything and call it done. Alternatively, we could batch together nearby JSON accesses earlier, maybe a few times, leaving behind messy maps and filters and projects (that other transforms can clean up after).
| # Tests for the JsonbUnpack optimizer transform, which replaces multiple | ||
| # jsonb accessors on a common value with a single multi-field unpacking | ||
| # table function. |
There was a problem hiding this comment.
Would be good to have bigger queries here---like, how does it behave around joins and other features it doesn't support?
AI prototype for solving the json unpacking problem with the
TableFunc/optimizer approach.Slack where multiple people are asking for fast json unpacking again.
Nightly (subset: random queries + feature benchmark, because of adding a specific feature benchmark scenario): https://buildkite.com/materialize/nightly/builds/16757