Problem
RewriteManifestsAction (added in #2543) intentionally does not plug into SnapshotProducer. The producer path has four cross-cutting assumptions that block a clean integration today:
- update_snapshot_summaries rejects Operation::Replace: crates/iceberg/src/spec/snapshot_summary.rs:337-346 returns ErrorKind::DataInvalid for any operation other than Append / Overwrite / Delete. SnapshotProducer::commit() calls update_snapshot_summaries unconditionally, so the producer cannot emit a Replace snapshot at all today.
- ManifestProcess::process_manifests is synchronous (crates/iceberg/src/transaction/snapshot.rs:119-125). Rewrite-manifests must write new manifests (async I/O via ManifestWriter), then return them as the post-process output. There is no async seam.
- Summary keys are driven by added_data_files only (crates/iceberg/src/transaction/snapshot.rs:373-379). For rewrite-manifests, added_data_files is empty by definition, so no total-* carry-forward happens, and there is no hook for the rewrite-specific summary keys (manifests-created, manifests-replaced, manifests-kept, entries-processed).
- manifest_file() shape mismatch: the producer's manifest_file() assumes "if added_data_files non-empty, write one manifest from those files." Rewrite-manifests produces an N-to-M rewrite of existing manifest entries; the shape is fundamentally different.
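To make the first assumption above concrete, here is a minimal, self-contained model of an allow-list gate that an unconditional commit path runs into. It is a sketch only, not the actual iceberg-rust source: the `Operation` enum is a simplified stand-in, and `check_summary_operation`, `UnsupportedOperation`, and `commit` are hypothetical names introduced for illustration.

```rust
/// Simplified stand-in for the snapshot operation enum (assumption: only
/// these four variants matter for the illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operation {
    Append,
    Overwrite,
    Delete,
    Replace,
}

/// Hypothetical error type standing in for ErrorKind::DataInvalid.
#[derive(Debug, PartialEq, Eq)]
pub struct UnsupportedOperation(pub Operation);

/// Hypothetical model of the allow-list check described above:
/// anything other than Append / Overwrite / Delete is rejected.
pub fn check_summary_operation(op: Operation) -> Result<(), UnsupportedOperation> {
    match op {
        Operation::Append | Operation::Overwrite | Operation::Delete => Ok(()),
        other => Err(UnsupportedOperation(other)),
    }
}

/// Hypothetical model of a commit path that runs the check unconditionally,
/// so a Replace snapshot can never get past it.
pub fn commit(op: Operation) -> Result<&'static str, UnsupportedOperation> {
    check_summary_operation(op)?;
    Ok("committed")
}
```

In this model, `commit(Operation::Append)` succeeds while `commit(Operation::Replace)` returns the error, which is the behavior that currently keeps rewrite-manifests off the producer path.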
Scope
Refactor once the underlying primitives land:
- Add Operation::Replace (and Operation::Delete) to update_snapshot_summaries' allow-list, with summary-key carry-forward semantics that mirror Java BaseRewriteManifests.
- Make ManifestProcess::process_manifests async (or add a parallel async hook) so producers can emit manifests written via ManifestWriter.
- Decouple summary-key emission from added_data_files so rewrite-style ops can declare their own summary additions.
- Migrate RewriteManifestsAction onto the unified path and delete the duplicated commit logic.
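One possible shape for the decoupled summary emission is sketched below. It is not a proposed final API: the `SummaryContributor` trait, the `RewriteManifestsStats` struct, and `build_replace_summary` are hypothetical names, and the sketch assumes that a manifest rewrite leaves every total-* value unchanged because no data files are added or removed.

```rust
use std::collections::BTreeMap;

/// Snapshot summary modeled as a plain string map (simplification).
pub type Summary = BTreeMap<String, String>;

/// Hypothetical hook: each operation declares its own summary keys
/// instead of having them derived from added_data_files.
pub trait SummaryContributor {
    fn summary_additions(&self) -> Summary;
}

/// Hypothetical counters collected while rewriting manifests.
pub struct RewriteManifestsStats {
    pub created: usize,
    pub replaced: usize,
    pub kept: usize,
    pub entries_processed: usize,
}

impl SummaryContributor for RewriteManifestsStats {
    fn summary_additions(&self) -> Summary {
        let mut s = Summary::new();
        s.insert("manifests-created".into(), self.created.to_string());
        s.insert("manifests-replaced".into(), self.replaced.to_string());
        s.insert("manifests-kept".into(), self.kept.to_string());
        s.insert("entries-processed".into(), self.entries_processed.to_string());
        s
    }
}

/// Carries the total-* keys forward from the previous summary unchanged
/// (assumption: a manifest rewrite does not alter table totals), then
/// layers the operation-specific keys on top.
pub fn build_replace_summary(previous: &Summary, op: &dyn SummaryContributor) -> Summary {
    let mut out: Summary = previous
        .iter()
        .filter(|(key, _)| key.starts_with("total-"))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect();
    out.extend(op.summary_additions());
    out
}
```

With a hook along these lines, per-snapshot keys from the previous summary (such as added-data-files) are dropped, while totals and the four rewrite-specific keys appear in the new summary without the producer needing any added data files.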
Out of scope for this issue: the rewrite_if predicate, cluster_by, custom spec_id, custom staging_location, and the iceberg-datafusion SQL-procedure layer. Each of these can be deferred separately from #2543.
References
apache/iceberg BaseRewriteManifests + SnapshotProducer