
jsonb_unpack TableFunc, introduced by the optimizer #36957

Draft
ggevay wants to merge 1 commit into MaterializeInc:main from ggevay:unpack-json-prototype


@ggevay ggevay commented Jun 10, 2026


AI prototype for solving the json unpacking problem with the TableFunc/optimizer approach.

Slack thread where multiple people are asking for fast JSON unpacking again.

Nightly (subset: random queries + feature benchmark, because of adding a specific feature benchmark scenario): https://buildkite.com/materialize/nightly/builds/16757

@ggevay ggevay added the A-optimization Area: query optimization and transformation label Jun 10, 2026
@ggevay ggevay force-pushed the unpack-json-prototype branch from 05b8c9a to 1cae3b7 Compare June 10, 2026 19:42

@mgree mgree left a comment


This is looking pretty good! I like the scenario test, but maybe we can do some more targeted random testing, too?

We should talk a bit about some alternatives---I'm particularly curious about taking smaller bites and letting other passes clean up. For code/impl strategy: maybe get one more opinion (@antiguru and/or @frankmcsherry)?

JsonbObjectKeys,
JsonbArrayElements,
JsonbArrayElementsStringify,
/// Extracts multiple fields from a single jsonb value in one pass, with


Does this comment belong here, or on the function that implements it?

/// are stored sorted, enabling a sorted-merge lookup).
///
/// Internal only: introduced by the `JsonbUnpack` transform in
/// `mz-transform`; not reachable from SQL and has no catalog entry.


As discussed, I think we should make it reachable by SQL---a nice escape hatch in case the transform is too fussy.
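
For readers following the doc comment quoted above, here is a minimal sketch of what a single-pass, sorted-merge lookup could look like. It assumes object entries arrive sorted by key (as the doc comment suggests) and that the wanted keys are distinct and pre-sorted. `merge_lookup` and its string-valued entries are hypothetical stand-ins, not the code in this PR.

```rust
/// Hypothetical helper: resolve several keys against key-sorted object
/// entries in one pass. Both `entries` and `wanted` must be sorted by key,
/// and `wanted` must not contain duplicates.
/// Returns one slot per wanted key, in `wanted` order; `None` means absent.
fn merge_lookup<'a>(entries: &[(&'a str, &'a str)], wanted: &[&str]) -> Vec<Option<&'a str>> {
    let mut out = vec![None; wanted.len()];
    let mut i = 0; // cursor into `wanted`
    for &(key, value) in entries {
        // Wanted keys that sort before the current entry cannot appear later.
        while i < wanted.len() && wanted[i] < key {
            i += 1;
        }
        if i == wanted.len() {
            break; // every wanted key is resolved; stop scanning early
        }
        if wanted[i] == key {
            out[i] = Some(value);
            i += 1;
        }
    }
    out
}
```

Each entry and each wanted key is visited at most once, so the cost is linear in the two lengths rather than one lookup per accessor.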

Hash,
MzReflect
)]
pub enum JsonbUnpackFieldKind {


Is it not possible to have a path of indices for arrays nested in objects? Is it possible to have a mix of Key and Index in one unpack? I worry that there are some unrepresentable/doomed-to-fail states here, like vec![Key("foo"), Index(0)] (unless Index turns into field access of a "0" field).
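
To make the question concrete, here is a sketch of one possible shape in which a field is a path of steps, so that a Key followed by an Index is a well-defined access instead of a doomed state. `Json`, `Step`, and `get_path` are toy names invented for illustration; they are not types from this PR.

```rust
use std::collections::BTreeMap;

/// Toy JSON value, for illustration only.
#[derive(Debug, PartialEq)]
enum Json {
    Num(i64),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// One step of a hypothetical access path.
enum Step {
    Key(String),
    Index(usize),
}

/// Follow `path` from `cur`. A step whose kind does not match the value it
/// is applied to (Index on an object, Key on an array or scalar) yields
/// `None` instead of being reinterpreted.
fn get_path<'a>(mut cur: &'a Json, path: &[Step]) -> Option<&'a Json> {
    for step in path {
        cur = match (cur, step) {
            (Json::Object(map), Step::Key(k)) => map.get(k)?,
            (Json::Array(items), Step::Index(i)) => items.get(*i)?,
            _ => return None,
        };
    }
    Some(cur)
}
```

Under this shape, vec![Key("foo"), Index(0)] means "element 0 of the array stored under foo", and an Index applied to an object is simply a miss.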

for (idx, field) in fields.iter().enumerate() {
match &field.kind {
Key(k) => wanted.push((k.as_str(), idx)),
Index(_) => {}


Is silent drop the right behavior? What would an indexed get on an object normally do? Push the NULL like we see below?
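
A sketch of the alternative this question points at, assuming every requested field must keep its own output slot: an Index requested of an object produces an explicit NULL rather than being skipped. `FieldKind` here is a toy mirror of the enum in the diff, and `unpack_object` is a hypothetical helper over integer-valued entries.

```rust
/// Toy mirror of the field kinds in the diff, for illustration only.
enum FieldKind {
    Key(String),
    Index(usize),
}

/// Hypothetical: produce exactly one output per requested field, in order.
/// `None` stands in for SQL NULL.
fn unpack_object(entries: &[(&str, i64)], fields: &[FieldKind]) -> Vec<Option<i64>> {
    fields
        .iter()
        .map(|field| match field {
            // Key lookup: NULL if the key is absent.
            FieldKind::Key(k) => entries
                .iter()
                .find(|(name, _)| *name == k.as_str())
                .map(|(_, v)| *v),
            // An index applied to an object has nothing to select: NULL.
            FieldKind::Index(_) => None,
        })
        .collect()
}
```

Because the output length always equals `fields.len()`, downstream column positions cannot shift when a field kind does not apply.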

Comment on lines +3716 to +3718
.unwrap_or(Datum::Null),
Some(v) => v,
None => Datum::Null,


Confirming the Null is right, as opposed to a JsonNull!
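
The distinction being confirmed can be shown with a toy model, assuming PostgreSQL-style `->` semantics: a missing key yields SQL NULL, while a key that is present with value `null` yields JSON null. `ToyDatum` and `get_field` are illustrative stand-ins, not the real `Datum` type or the PR's lookup code.

```rust
/// Toy stand-in for a datum type; not the real `Datum`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyDatum {
    Null,        // SQL NULL: the key is absent
    JsonNull,    // JSON null: the key is present and its value is `null`
    Number(i64), // any other present value
}

/// Hypothetical lookup over (key, value) entries of one object.
fn get_field(entries: &[(&str, ToyDatum)], key: &str) -> ToyDatum {
    entries
        .iter()
        .find(|(k, _)| *k == key)
        .map(|(_, v)| *v)
        // Absent key: SQL NULL, never JSON null.
        .unwrap_or(ToyDatum::Null)
}
```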

// passes through the input columns and the demanded expressions.
// `optimize` inlines single-use expressions (e.g. predicate support back
// into the predicates) and drops anything unused.
let lower_slots: Vec<usize> = (0..n)


The order of events here is kind of confusing. We compute placements for the lower slots, then the input, then add those placements in.

.map(upper_exprs)
.project(mfp.projection.iter().map(|p| remap_upper(*p)));
upper_mfp.optimize();


Add some assertion that the final arity is correct?

Comment on lines +749 to +757
fn contains_literal_err(expr: &MirScalarExpr) -> bool {
    let mut found = false;
    expr.visit_pre(|e| {
        if e.is_literal_err() {
            found = true;
        }
    });
    found
}


MSE::contains_err already does this, more efficiently (early return).
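
To illustrate why an early return helps, here is a sketch on a toy expression tree. `Expr` and this `contains_err` are invented for the example and are not the actual `MirScalarExpr` API; the `visited` counter exists only to make the short-circuit observable. The sketch recurses directly, which a real implementation over deep expressions may want to avoid.

```rust
/// Toy expression tree, for illustration only.
enum Expr {
    Column(usize),
    LiteralErr,
    Call(Vec<Expr>),
}

/// Returns true as soon as an error literal is found. `Iterator::any`
/// stops at the first match, so later siblings are never visited.
fn contains_err(expr: &Expr, visited: &mut usize) -> bool {
    *visited += 1;
    match expr {
        Expr::LiteralErr => true,
        Expr::Column(_) => false,
        Expr::Call(args) => args.iter().any(|arg| contains_err(arg, visited)),
    }
}
```

By contrast, the flag-setting visitor quoted above keeps walking the whole tree after the first hit.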

Comment thread src/transform/src/lib.rs
Comment on lines +895 to +904
// Replace multiple jsonb accessors on a common value with a single
// multi-field unpacking table function. Placed late in the
// pipeline because it introduces FlatMaps, which many transforms
// handle poorly, and because it is a physical optimization: it
// changes how the data is accessed, not what is computed. Nothing
// after this point re-canonicalizes MFPs or rearranges what it
// builds; the final `fold_constants_fixpoint` below only needs a
// correct `TableFunc::eval`.
Box::new(jsonb_unpack::JsonbUnpack);
if ctx.features.enable_jsonb_unpack_transform,


Since the FlatMaps we produce here are somehow permanent, I wonder if there was some set of smaller manipulations we could do. Right now, we wait until we're done with all of our MFP/scalar manipulation, and then we batch together everything and call it done. Alternatively, we could batch together nearby JSON accesses earlier, maybe a few times, leaving behind messy maps and filters and projects (that other transforms can clean up after).

Comment on lines +10 to +12
# Tests for the JsonbUnpack optimizer transform, which replaces multiple
# jsonb accessors on a common value with a single multi-field unpacking
# table function.


Would be good to have bigger queries here---like, how does it behave around joins and other features it doesn't support?

