
fix: preserve clustering on managed Iceberg incremental models with partition_by #1496

Merged: sd-db merged 6 commits into main from sd-db/triage/dbt-iceberg-partition-by-clustering-wipe on Jun 5, 2026

@sd-db sd-db commented Jun 4, 2026

Collaborator

Resolves #1495

Description

Managed Iceberg incremental models configured with partition_by silently lost all clustering after the first incremental run — no error, just degraded queries.

Root cause. Managed Iceberg normalizes PARTITION BY into liquid-clustering keys server-side, so the live table reports clusteringColumns=[business_date]. But LiquidClusteringProcessor.from_relation_config read only liquid_clustered_by (→ desired []), so the reconciler saw a phantom removal and issued ALTER TABLE … CLUSTER BY NONE. Same CLUSTER BY NONE mechanism as #805 (Delta, errored loudly); this is managed Iceberg and fails silently.
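The phantom removal described above can be illustrated with a minimal sketch. It is not the adapter's reconciler: `plan_clustering_change` is a hypothetical helper, and the only values taken from this PR are the column name `business_date` and the `CLUSTER BY NONE` clause.

```python
from typing import List, Optional


def plan_clustering_change(desired: List[str], existing: List[str]) -> Optional[str]:
    """Hypothetical diff: return the clause a naive reconciler would emit, or None."""
    if desired == existing:
        return None  # nothing to change
    if not desired:
        # Desired is empty but the table has keys: looks like a removal.
        return "CLUSTER BY NONE"
    return "CLUSTER BY (" + ", ".join(desired) + ")"


# Pre-fix view of a model that sets only partition_by:
# liquid_clustered_by is absent, so the desired clustering is empty...
desired_pre_fix: List[str] = []
# ...while the server has already stored partition_by as clustering keys.
existing_on_server = ["business_date"]

phantom_clause = plan_clustering_change(desired_pre_fix, existing_on_server)
```

Under these assumptions the diff yields `CLUSTER BY NONE` even though the model never asked for clustering to be removed.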

Fix. For managed Iceberg with no liquid_clustered_by, treat partition_by as the desired clustering so it matches what the server stores — no phantom diff, no wipe. Detection mirrors tblproperties.py.
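A rough sketch of the fallback rule, under stated assumptions: `desired_clustering` and `_as_list` are hypothetical names, the model config is simplified to a plain dict, and managed-Iceberg detection is reduced to a boolean argument because the actual detection logic is not shown in this PR description.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[str]:
    """Normalize None, a single column name, or a sequence of names to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def desired_clustering(config: Dict[str, Any], is_managed_iceberg: bool) -> List[str]:
    """Hypothetical resolution of the desired clustering keys for a model."""
    explicit = _as_list(config.get("liquid_clustered_by"))
    if explicit:
        return explicit  # an explicit liquid_clustered_by always wins
    if is_managed_iceberg:
        # Assumption from the PR text: the server stores partition_by as
        # clustering keys, so treat it as the desired clustering.
        return _as_list(config.get("partition_by"))
    return []  # Delta (including UniForm): partition_by is not clustering
```

With this rule, a managed Iceberg model configured with `partition_by="business_date"` resolves to `["business_date"]`, which matches the live table and produces no diff, while a Delta model with the same config still resolves to an empty list.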

Testing

  • Unit (test_liquid_clustering.py): managed-Iceberg partition_by → clustering, plus Delta/UniForm negatives.
  • Functional (TestIcebergIncrementalPartitionClustering): clustering survives the incremental MERGE, no CLUSTER BY NONE. Fails pre-fix, passes post-fix.

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

@sd-db sd-db requested a review from jprakash-db as a code owner June 4, 2026 03:56
sd-db added 2 commits June 4, 2026 09:26
…artition_by

Managed Iceberg stores partition_by as liquid clustering keys server-side,
so the incremental reconciler computed an empty desired clustering (it only
read liquid_clustered_by) while reading the existing clustering from the live
table, producing a phantom diff that issued ALTER TABLE ... CLUSTER BY NONE
after the first incremental MERGE and silently wiped the clustering.

Treat partition_by as the desired clustering for managed Iceberg so desired
matches what the server stores and no spurious wipe occurs.

Closes #1495
@sd-db sd-db force-pushed the sd-db/triage/dbt-iceberg-partition-by-clustering-wipe branch from 7a791e4 to 0b75124 Compare June 4, 2026 04:03
github-actions Bot commented Jun 4, 2026

Coverage report

File with changed coverage: dbt/adapters/databricks/relation_configs/liquid_clustering.py

This report was generated by python-coverage-comment-action

sd-db commented Jun 4, 2026

Collaborator Author

/integration-test

github-actions Bot commented Jun 4, 2026


Integration tests dispatched for PR #1496 by @sd-db. Track progress in the Actions tab.

The create path casefolds table_format (parse_model._get), so a model written
as "Iceberg"/"ICEBERG" was resolved as Iceberg at create time but missed by
the partition_by-as-clustering detection, re-opening the CLUSTER BY NONE wipe.
Casefold before comparing so detection matches the create path.
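
The case-insensitive comparison can be sketched as follows. `is_iceberg_format` is a hypothetical helper, not the adapter's function; it only shows why comparing the raw string misses "Iceberg" and "ICEBERG".

```python
from typing import Optional


def is_iceberg_format(table_format: Optional[str]) -> bool:
    """Hypothetical check: compare table_format case-insensitively."""
    if not isinstance(table_format, str):
        return False
    # casefold() is the aggressive lowercase intended for caseless matching.
    return table_format.casefold() == "iceberg"


# A raw comparison misses mixed-case spellings; the casefolded one does not.
raw_match = "Iceberg" == "iceberg"
folded_match = is_iceberg_format("Iceberg")
```

Here `raw_match` is false and `folded_match` is true, which is the mismatch between detection and the create path that the commit closes.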
github-actions Bot commented Jun 4, 2026


Integration results for PR #1496 — UC cluster ✅ success · SQL warehouse ❌ failure · All-purpose cluster ✅ success · Shard coverage ✅ success

Run details.

github-actions Bot commented Jun 4, 2026


Integration results for PR #1496 — UC cluster ✅ success · SQL warehouse ✅ success · All-purpose cluster ✅ success · Shard coverage ✅ success

Run details.

sd-db added 2 commits June 4, 2026 22:01
…g regression

The regression test already verifies clustering survival via DESCRIBE DETAIL
on the final table state, which is the actual server-side behavior. The
negative log-string assertion checked an implementation detail of the emitted
SQL and is redundant, so replace run_dbt_and_capture with a plain run_dbt.
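
A state-based check of this kind might look like the sketch below. It assumes the test has already fetched the DESCRIBE DETAIL row as a dict-like mapping with a `clusteringColumns` field; `clustering_survived` and the sample rows are hypothetical and stand in for the real functional test.

```python
from typing import Any, Mapping, Sequence


def clustering_survived(detail_row: Mapping[str, Any], expected: Sequence[str]) -> bool:
    """Hypothetical assertion helper: compare stored clustering keys to expected."""
    actual = detail_row.get("clusteringColumns") or []
    return list(actual) == list(expected)


# Illustrative rows only; real values would come from DESCRIBE DETAIL
# after the incremental run.
row_after_fix = {"clusteringColumns": ["business_date"]}
row_after_wipe = {"clusteringColumns": []}

ok_after_fix = clustering_survived(row_after_fix, ["business_date"])
ok_after_wipe = clustering_survived(row_after_wipe, ["business_date"])
```

Checking the final table state this way does not depend on the exact SQL the adapter emits, which is the reason given for dropping the log-string assertion.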
@sd-db sd-db merged commit d2aa910 into main Jun 5, 2026
9 checks passed
@sd-db sd-db deleted the sd-db/triage/dbt-iceberg-partition-by-clustering-wipe branch June 5, 2026 07:13

Development

Successfully merging this pull request may close these issues.

partition_by on managed Iceberg incremental models silently loses clustering after first incremental run

2 participants