Skip to content

feat(transaction): Add TransactionAction POC #1400

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

CTTY
Copy link
Contributor

@CTTY CTTY commented Jun 1, 2025

@CTTY CTTY force-pushed the ctty/tx-commit branch from 87ad078 to d481d69 Compare June 1, 2025 05:24
Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some suggestions.

@CTTY
Copy link
Contributor Author

CTTY commented Jun 5, 2025

Hi @liurenjie1024 @ZENOTME ,

Thanks for the feedback! I took another look at the transaction code path implementation in Java today and had some additional thoughts.

  • PendingUpdate#apply:

    • Responsible for building the item T, if it hasn't been built yet.

    • (In the case of SnapshotProducer) It validates table history.

    • Generates updated manifest lists and writes them to disk.

    • Returns the generic item T held by the PendingUpdate.

  • PendingUpdate#commit:

    • Retrieves T, which may come from apply() or be built during commit().

    • Builds new metadata based on the base table, and applies T to generate staging metadata.

    • (For SnapshotProducer) Asks the catalog to commit the updated metadata — i.e., flush it to disk and update the catalog pointer to the new metadata.

Mapping to Rust's TransactionAction:

  • TransactionAction::apply() should:

    • Build the item T, if needed.

    • Run SnapshotProduceAction::apply() to validate history and write updated manifest lists.

    • Return the intermediate result T.

  • TransactionAction::commit() should:

    • Retrieve T.

    • Generate staging metadata by updating Transaction.current_table, Transaction.updates, and Transaction.requirements.

    • If needed, call SnapshotProduceAction::commit() → Catalog::update_table().

I've updated my code to add Transaction::apply to be more aligned with the Java semantics. Please take a look! The current changes are still based on UpdateLocationAction, which is relatively simple. I'm trying to make the API usages as close to the Java version as possible (like this)
Next, I plan to apply this structure to more complex actions like fast append. Please let me know your thoughts!

@liurenjie1024
Copy link
Contributor

Hi, @CTTY I think we should seperate our concerns here. Our design could be inspired by java, but it doesn't mean we should blindly copy java's class hierarchy. There are a lot of differences between these two languages. The T apply() method in java is quite convenient since java's language elements facilities such complex class hierarchies, but this is not the case for rust.

I don't think it's a good idea to add an apply method in the general purpose TransactionAction. TransactionAction is used to be stored in Transaction so that we could do retry when commit exception, so commit is the only method required. In future we may have some type safe trait for actions which may produce new snapshot, but this is not the concern in this pr.

@ZENOTME
Copy link
Contributor

ZENOTME commented Jun 5, 2025

Hi, @CTTY I think we should seperate our concerns here. Our design could be inspired by java, but it doesn't mean we should blindly copy java's class hierarchy. There are a lot of differences between these two languages. The T apply() method in java is quite convenient since java's language elements facilities such complex class hierarchies, but this is not the case for rust.

I don't think it's a good idea to add an apply method in the general purpose TransactionAction. TransactionAction is used to be stored in Transaction so that we could do retry when commit exception, so commit is the only method required. In future we may have some type safe trait for actions which may produce new snapshot, but this is not the concern in this pr.

+1 for we don't need to apply now. In iceberg-java, the apply can be used for code reuse. E.g. SnapshotProducer implements apply and commit, and CherryPickOperation inherits from SnapshotProducer, overwrites the apply but reuses the commit. But we can't do this in Rust.

…t location. removed updates and reqs from tx. Need to push fast append and sort order to tx.actions.
@CTTY CTTY changed the title feat(WIP): Add TransactionAction feat(transaction): Add TransactionAction POC Jun 7, 2025
@CTTY
Copy link
Contributor Author

CTTY commented Jun 7, 2025

Discussed with @liurenjie1024 offline, and we are aligned on the general approach demonstrated here to refactor the transaction commit path and enable retry

Next, I'll break this PR into smaller ones and incrementally merge the code~

.context(tx)
.sleep(tokio::time::sleep)
.when(|e| {
if let Some(err) = e.downcast_ref::<Error>() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to use downcast_ref because do_commit returns an anyhow::Error. If we update the API for do_commit to return an iceberg::Error instead, I believe it would make things clearer.

liurenjie1024 added a commit that referenced this pull request Jun 10, 2025
## Which issue does this PR close?


This is a part of the effort to refactor transaction commit path and
enable retry for write operations.
Please find the POC here:
#1400

Related Issues:
- #1382 [EPIC]
- #1386 
- #1387 
- #1388 
- #1389

## What changes are included in this PR?
- Add `TransactionAction`, `ActionCommit`, and `ApplyTransactionAction`
- Add `actions` field in `Transaction`


## Are these changes tested?
Added unit tests

---------

Co-authored-by: Renjie Liu <[email protected]>
liurenjie1024 added a commit that referenced this pull request Jun 10, 2025
## Which issue does this PR close?


This is a part of the effort to refactor transaction commit path and
enable retry for write operations.
Please find the POC here:
#1400

Related Issues:
- #1382 [EPIC]
- #1386 
- #1387 
- #1388 
- #1389

## What changes are included in this PR?
- Make Transaction own base_table instead of a reference, so we can
refresh base_table using catalog within the Transaction when retry


## Are these changes tested?
Using the existing unit tests


---------

Co-authored-by: Renjie Liu <[email protected]>
liurenjie1024 pushed a commit that referenced this pull request Jun 18, 2025
…1448)

## Which issue does this PR close?

Related Issues:
- #1382 [EPIC]

## What changes are included in this PR?

- Implement `TransactionAction` for `FastAppendAction`
- Changed `SnapshotProduceAction` to `SnapshotProducer`
- The idea is to make `SnapshotProduceAction` an _stateless_ helper
class that helps with fast_append::commit (as discussed
[here](#1400 (comment)))

## Are these changes tested?
Added unit tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants