[core] Support nested-key-null-strategy in FieldNestedUpdateAgg operator#8374

Open
PyRSA wants to merge 6 commits into apache:master from PyRSA:feature/nested-update-nested-key-null-strategy

Conversation

PyRSA commented Jun 28, 2026

Contributor

Purpose

[Feature] Add fields.{fieldName}.nested-key-null-strategy configuration for nested_update function
[Feature] Support nested-key null handling strategies in FieldNestedUpdateAgg operator to control behavior when nested-key does not satisfy primary key semantics

Expected behavior when nested-key contains null values or does not satisfy primary key semantics:

In the following examples, nested-key is defined as k0,k1.

| Case | Input A | Input B | Current Behavior | Expected Behavior |
|------|---------|---------|------------------|-------------------|
| Default behavior (no config) | [(0, null, "A", 1)] | [(0, 1, "B", 2)] | Merge | Merge |
| MERGE strategy | [(0, null, "A", 1)] | [(0, 1, "B", 2)] | Merge | Merge |
| IGNORE strategy | [(0, null, "A", 1)] | [(0, 1, "B", 2)] | Merge ❌ | Ignore invalid nested-key row |
| ERROR strategy | [(0, null, "A", 1)] | [(0, 1, "B", 2)] | Merge ❌ | Throw exception |

Behavior Definition

  • merge
    Merge rows even if nested-key does not satisfy primary key semantics.
    This is equivalent to not configuring nested-key-null-strategy.

  • ignore
    Ignore rows whose nested-key does not satisfy primary key semantics.

  • error
    Throw an exception when nested-key does not satisfy primary key semantics.
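
As a rough illustration of the three strategies, here is a minimal standalone sketch. It is not the PR's implementation: `NestedKeyNullStrategySketch`, `hasNullKey`, and `acceptRows` are hypothetical names, rows are plain `Object[]` arrays rather than Paimon rows, and it models only the null check on the nested-key columns, not the actual merge logic of FieldNestedUpdateAgg.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; only the strategy names MERGE / IGNORE / ERROR come from the PR.
final class NestedKeyNullStrategySketch {

    enum Strategy { MERGE, IGNORE, ERROR }

    // True if any nested-key column of the row is null.
    static boolean hasNullKey(Object[] row, int[] keyIndexes) {
        for (int index : keyIndexes) {
            if (row[index] == null) {
                return true;
            }
        }
        return false;
    }

    // Returns the rows that would be handed to the merge step under the given strategy.
    static List<Object[]> acceptRows(Strategy strategy, List<Object[]> rows, int[] keyIndexes) {
        List<Object[]> accepted = new ArrayList<>();
        for (Object[] row : rows) {
            if (hasNullKey(row, keyIndexes)) {
                if (strategy == Strategy.IGNORE) {
                    continue; // drop the row with the invalid nested-key
                }
                if (strategy == Strategy.ERROR) {
                    throw new IllegalStateException(
                            "nested-key contains null: " + Arrays.toString(row));
                }
                // MERGE: fall through and keep the row, as before this option existed
            }
            accepted.add(row);
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Same rows as in the table above; nested-key is (k0, k1), i.e. columns 0 and 1.
        List<Object[]> rows = Arrays.asList(
                new Object[] {0, null, "A", 1},
                new Object[] {0, 1, "B", 2});
        int[] nestedKey = {0, 1};
        for (Strategy s : new Strategy[] {Strategy.MERGE, Strategy.IGNORE}) {
            System.out.println(s + " keeps " + acceptRows(s, rows, nestedKey).size() + " row(s)");
        }
    }
}
```

With the table's inputs, MERGE keeps both rows, IGNORE drops the row whose k1 is null, and ERROR raises on that same row.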


Tests

  • org.apache.paimon.mergetree.compact.aggregate.FieldAggregatorTest#testFieldNestedUpdateAggWithNestedKeyNullStrategyArgumentCheck
  • org.apache.paimon.mergetree.compact.aggregate.FieldAggregatorTest#testFieldNestedUpdateAggWithoutNestedKeyNullStrategy
  • org.apache.paimon.mergetree.compact.aggregate.FieldAggregatorTest#testFieldNestedUpdateAggWhenNestedKeyNullUseMergeStrategy
  • org.apache.paimon.mergetree.compact.aggregate.FieldAggregatorTest#testFieldNestedUpdateAggWhenNestedKeyNullUseIgnoreStrategy
  • org.apache.paimon.mergetree.compact.aggregate.FieldAggregatorTest#testFieldNestedUpdateAggWhenNestedKeyNullUseThrowErrorStrategy
  • org.apache.paimon.mergetree.compact.aggregate.FieldAggregatorTest#testFieldNestedUpdateAggWithCountLimitWhenNestedKeyNullUseMergeStrategy
  • org.apache.paimon.mergetree.compact.aggregate.FieldAggregatorTest#testFieldNestedUpdateAggWithCountLimitWhenNestedKeyNullUseIgnoreStrategy
  • org.apache.paimon.mergetree.compact.aggregate.FieldAggregatorTest#testFieldNestedUpdateAggWithCountLimitWhenNestedKeyNullUseThrowErrorStrategy
  • org.apache.paimon.mergetree.compact.aggregate.FieldAggregatorTest#testFieldNestedUpdateAggRetractAppliesNestedKeyNullStrategyToAccumulator
  • org.apache.paimon.mergetree.compact.aggregate.FieldAggregatorTest#testFieldNestedUpdateAggRetractAppliesNestedKeyNullStrategyToRetractInput

API and Format

No Changes.


Documentation

docs/docs/primary-key-table/merge-engine/aggregation.mdx

AuroraVoyage added 4 commits June 28, 2026 15:05
…-null-strategy' into feature/nested-update-nested-key-null-strategy
return options.get(
key(FIELDS_PREFIX + "." + fieldName + "." + NESTED_KEY_NULL_STRATEGY)
.enumType(NestedKeyNullStrategy.class)
.noDefaultValue());

Contributor

Why not create a default value?

Contributor Author

Thanks for the suggestion! I originally avoided a default value because I wanted to distinguish between an unspecified option and an explicitly configured MERGE, so that I could validate that nested-key-null-strategy is only configured when nested-key is present. With a default value, these two cases become indistinguishable. The default behavior is still preserved by falling back to MERGE in FieldNestedUpdateAgg.
I'm happy to adjust this if you think consistency with other options is more important.
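
To make the distinction concrete, here is a small sketch of the no-default approach. It uses a plain `Map<String, String>` and `Optional` rather than Paimon's options classes; `UnsetVersusMergeSketch`, `configured`, and `resolve` are hypothetical names, and the key layout is assumed from fields.{fieldName}.nested-key-null-strategy.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of "no default value, fall back to MERGE later".
final class UnsetVersusMergeSketch {

    enum Strategy { MERGE, IGNORE, ERROR }

    // Assumed key layout, modelled on fields.{fieldName}.nested-key-null-strategy.
    static String strategyKey(String fieldName) {
        return "fields." + fieldName + ".nested-key-null-strategy";
    }

    // Without a default, an absent option stays empty and is distinct from explicit MERGE.
    static Optional<Strategy> configured(Map<String, String> options, String fieldName) {
        String raw = options.get(strategyKey(fieldName));
        return raw == null
                ? Optional.empty()
                : Optional.of(Strategy.valueOf(raw.toUpperCase(Locale.ROOT)));
    }

    // Validates the dependency on nested-key, then applies the MERGE fallback.
    static Strategy resolve(Map<String, String> options, String fieldName, List<String> nestedKey) {
        Optional<Strategy> strategy = configured(options, fieldName);
        if (strategy.isPresent() && nestedKey.isEmpty()) {
            throw new IllegalArgumentException(
                    strategyKey(fieldName) + " requires nested-key to be set");
        }
        return strategy.orElse(Strategy.MERGE);
    }
}
```

Because the lookup returns an empty `Optional` for an unset option, the dependency check can reject an explicit MERGE without a nested-key while still treating "not configured" as MERGE.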

Contributor Author

> Why not create a default value?

Thanks for the suggestion! I think using a default value also makes sense.

The only issue is that once MERGE becomes the default, FieldNestedUpdateAgg can no longer distinguish between an unspecified option and an explicitly configured MERGE in its constructor, so the dependency validation cannot be performed there anymore.

Instead, I'm thinking of moving the validation to FieldNestedUpdateAggFactory.create(), where the dependency can be validated before creating the aggregator by checking whether nested-key-null-strategy was explicitly configured. This still provides fail-fast validation during table creation and avoids allowing table options that have no effect.

If this approach sounds reasonable, I'll update the implementation accordingly.

boolean strategyConfigured =
        options.toConfiguration()
                .containsKey(...);

checkArgument(
        !strategyConfigured || !nestedKey.isEmpty(),
        ...);

Something along these lines.
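
A self-contained version of that idea might look like the sketch below. It stands in a plain `Map<String, String>` for the table options and does not call FieldNestedUpdateAggFactory or any Paimon API; `FactoryValidationSketch`, `effectiveStrategy`, and the `"merge"` default literal are hypothetical, and the elided arguments in the snippet above are deliberately not filled in.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the option has a default, but validation checks explicit presence.
final class FactoryValidationSketch {

    // Assumed default once MERGE becomes the option's default value.
    static final String DEFAULT_STRATEGY = "merge";

    // Mirrors the check "!strategyConfigured || !nestedKey.isEmpty()" before building the aggregator.
    static String effectiveStrategy(
            Map<String, String> tableOptions, String fieldName, List<String> nestedKey) {
        String key = "fields." + fieldName + ".nested-key-null-strategy";

        // Presence in the raw options map tells "explicitly configured" apart from "defaulted".
        boolean strategyConfigured = tableOptions.containsKey(key);
        if (strategyConfigured && nestedKey.isEmpty()) {
            throw new IllegalArgumentException(
                    key + " is configured but no nested-key is defined for field " + fieldName);
        }

        // Past validation, callers only ever see a concrete strategy value.
        return tableOptions.getOrDefault(key, DEFAULT_STRATEGY);
    }
}
```

Checking the raw map keeps the fail-fast behavior at creation time, while the aggregator itself no longer needs to know whether the value was defaulted.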
