16 changes: 12 additions & 4 deletions crates/storage/opendal/src/lib.rs
@@ -39,7 +39,7 @@ use iceberg::io::{
 };
 use iceberg::{Error, ErrorKind, Result};
 use opendal::Operator;
-use opendal::layers::RetryLayer;
+use opendal::layers::{RetryLayer, TimeoutLayer};
 use serde::{Deserialize, Serialize};
 use utils::from_opendal_error;

@@ -326,9 +326,17 @@ impl OpenDalStorage {
 }
 };

-// Transient errors are common for object stores; however there's no
-// harm in retrying temporary failures for other storage backends as well.
-let operator = operator.layer(RetryLayer::new());
+// Apply observability/resilience layers. TimeoutLayer must be
+// inside RetryLayer so each retry attempt is independently
+// bounded — without a per-attempt timeout, a future parked on a
+// silently dropped TCP connection never produces an `Err` and
+// RetryLayer cannot retry, leaving the caller hung indefinitely.
+// See: https://opendal.apache.org/docs/rust/opendal/layers/struct.TimeoutLayer.html
+//
+// Transient errors are common for object stores; we retry temporary
+// failures with exponential backoff. The retry behavior also
+// benefits non-object-store backends.
+let operator = operator.layer(TimeoutLayer::new()).layer(RetryLayer::new());
Contributor

The original feature request linked to a table with both min and max timeouts. How does that get encoded here? https://iceberg.apache.org/docs/latest/configuration/#table-behavior-properties

Author

In this PR I haven't made it configurable at all and just used the OpenDAL defaults, but I can definitely update the PR to make it configurable if that's desirable.

I was originally thinking it may be worth just adding the TimeoutLayer in place with the existing defaults, similar to the RetryLayer, and doing configurability changes in a follow-up, but I'm happy to do it now as well.

I didn't see any knobs for configuring the object level retries and timeouts in the table behavior properties link, but looking at other implementations, I could add:

Contributor

I think it's reasonable to follow up with fine-grained configurations, and merging as-is is fine. I would request (and I can handle this if you're good with handing off ownership) that the fine-grained configurations are done at the storage trait layer.

I recommended this PR for merging and inclusion into the 0.10 release being gathered here:

Ok((operator, relative_path))
}
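
The ordering argument in the code comment above (a per-attempt timeout inside the retry loop) can be illustrated without OpenDAL. What follows is a minimal, std-only sketch and not the OpenDAL implementation: `with_timeout`, `retry`, and `flaky_read` are hypothetical helpers invented for this illustration, and the durations and attempt count are arbitrary.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Error surfaced to the retry loop (hypothetical, for illustration only).
#[derive(Debug, PartialEq)]
enum AttemptError {
    /// The attempt produced no result within its own deadline.
    TimedOut,
}

/// Inner layer: run `op` on a helper thread and stop waiting after `timeout`.
/// A hung attempt is turned into an `Err` that the caller can react to.
fn with_timeout<T, F>(timeout: Duration, op: F) -> Result<T, AttemptError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Sending fails if the caller already gave up; that is fine here.
        let _ = tx.send(op());
    });
    // Any receive failure is reported as a timeout in this sketch.
    rx.recv_timeout(timeout).map_err(|_| AttemptError::TimedOut)
}

/// Outer layer: call `attempt` until it succeeds or `max_attempts` is used up.
/// It can only retry when an attempt actually returns an `Err`.
fn retry<T, F>(max_attempts: usize, mut attempt: F) -> Result<T, AttemptError>
where
    F: FnMut() -> Result<T, AttemptError>,
{
    let mut last = AttemptError::TimedOut;
    for _ in 0..max_attempts {
        match attempt() {
            Ok(value) => return Ok(value),
            Err(err) => last = err,
        }
    }
    Err(last)
}

/// Simulated read: the first call hangs (like a silently dropped connection),
/// later calls return immediately. Each call is bounded by its own timeout.
fn flaky_read(calls: &Arc<AtomicUsize>) -> Result<&'static str, AttemptError> {
    let calls = Arc::clone(calls);
    with_timeout(Duration::from_millis(200), move || {
        if calls.fetch_add(1, Ordering::SeqCst) == 0 {
            thread::sleep(Duration::from_secs(5)); // stand-in for a hang
        }
        "data"
    })
}

fn main() {
    let calls = Arc::new(AtomicUsize::new(0));
    let result = retry(3, || flaky_read(&calls));
    println!("result = {:?}, attempts = {}", result, calls.load(Ordering::SeqCst));
}
```

In this sketch the first attempt is abandoned after its own deadline and the second one succeeds. If the timeout instead wrapped the whole retry loop, the hung first attempt would never return an `Err`, so the loop would have nothing to retry before the outer deadline expired.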
