SDK
Rust (affects all language bindings)
Description
The current retry strategy for stream creation and recovery uses `tokio_retry::strategy::FixedInterval` (`lib.rs:990`, `arrow_stream.rs:509`). This means all retry attempts use the same fixed delay (default: 2 seconds), with no jitter.
Fixed-interval retries are problematic in distributed systems:
- Thundering herd: When a server failure disconnects many clients simultaneously, all clients retry at the same interval, concentrating load on the recovering server.
- No backoff under sustained failures: A fixed 2-second interval gives the server no additional recovery time as failures persist.
- Divergence from common practice: exponential backoff with jitter is the approach recommended in AWS, GCP, and gRPC guidance.
Proposed Solution
- Add a `RetryStrategy` enum to `StreamConfiguration`:
      pub enum RetryStrategy {
          Fixed,
          ExponentialBackoffWithJitter,
      }
- Add a `max_recovery_backoff_ms` field (cap for exponential growth, default: 30,000 ms).
- Change the default strategy from `Fixed` to `ExponentialBackoffWithJitter`. `recovery_backoff_ms` serves as the initial backoff for exponential, or the interval for fixed.
- No new dependencies are required: `tokio-retry` already provides `ExponentialBackoff` and `jitter`.
Changes are needed in 2 locations in the Rust core (lib.rs and arrow_stream.rs), plus configuration structs. All language bindings inherit the behavior through their respective FFI/binding layers.
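To make the proposed configuration concrete, here is a dependency-free sketch of how the two strategies could map to per-attempt delays. It assumes a doubling growth factor (the issue does not specify one) and full jitter (the capped bound scaled by a uniform draw in [0, 1)). `RetryConfig`, `retry_strategy`, `max_delay_ms`, `delays`, and the injected `unit_random` closure are hypothetical names, with `RetryConfig` standing in for the retry-related fields of `StreamConfiguration`; a real implementation would use `tokio-retry`'s `ExponentialBackoff` and `jitter` rather than this hand-rolled arithmetic.

```rust
use std::time::Duration;

/// Strategy selector proposed in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RetryStrategy {
    Fixed,
    ExponentialBackoffWithJitter,
}

/// Hypothetical stand-in for the retry-related fields of `StreamConfiguration`.
#[derive(Debug, Clone)]
pub struct RetryConfig {
    pub retry_strategy: RetryStrategy, // hypothetical field name
    pub recovery_backoff_ms: u64,      // initial backoff (exponential) or interval (fixed)
    pub max_recovery_backoff_ms: u64,  // cap on exponential growth
    pub recovery_retries: u32,
}

impl RetryConfig {
    /// Upper bound in ms on the delay before retry `attempt` (0-based), before jitter.
    /// Assumes the exponential strategy doubles the delay on each attempt.
    pub fn max_delay_ms(&self, attempt: u32) -> u64 {
        match self.retry_strategy {
            RetryStrategy::Fixed => self.recovery_backoff_ms,
            RetryStrategy::ExponentialBackoffWithJitter => {
                // 2^attempt; checked_shl returns None for attempt >= 64, treated as u64::MAX.
                let growth = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
                self.recovery_backoff_ms
                    .saturating_mul(growth)
                    .min(self.max_recovery_backoff_ms)
            }
        }
    }

    /// Delays for every retry attempt. `unit_random` must return values in
    /// [0.0, 1.0); it is injected here to keep the sketch dependency-free.
    pub fn delays(&self, mut unit_random: impl FnMut() -> f64) -> Vec<Duration> {
        (0..self.recovery_retries)
            .map(|attempt| {
                let bound = Duration::from_millis(self.max_delay_ms(attempt));
                match self.retry_strategy {
                    // Current behavior: identical delay every time, no jitter.
                    RetryStrategy::Fixed => bound,
                    // Full jitter: scale the capped bound by a uniform draw in [0, 1).
                    RetryStrategy::ExponentialBackoffWithJitter => bound.mul_f64(unit_random()),
                }
            })
            .collect()
    }
}
```

With `recovery_backoff_ms = 2000` and `max_recovery_backoff_ms = 30000`, the pre-jitter bounds are 2 s, 4 s, 8 s, 16 s, and then 30 s for every later attempt. Injecting the random source keeps the schedule deterministic in tests.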
Additional Context
References:
- `tokio-retry` supports `ExponentialBackoff` and `jitter` out of the box
Current code (`lib.rs:990-991`):
let strategy = FixedInterval::from_millis(options.recovery_backoff_ms)
    .take(options.recovery_retries as usize);
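Proposed equivalent. `ExponentialBackoff::from_millis(base)` raises `base` to the power of the attempt count, so `recovery_backoff_ms` is supplied through `factor` rather than as the base:
use std::time::Duration;
use tokio_retry::strategy::{jitter, ExponentialBackoff};
// Base 2 doubles the delay each attempt; factor = recovery_backoff_ms / 2 (min 1)
// puts the first delay at recovery_backoff_ms (2 s, 4 s, 8 s, ... for the 2 s default).
let strategy = ExponentialBackoff::from_millis(2)
    .factor((options.recovery_backoff_ms / 2).max(1))
    .max_delay(Duration::from_millis(options.max_recovery_backoff_ms))
    .map(jitter)
    .take(options.recovery_retries as usize);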
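`tokio-retry` documents `ExponentialBackoff::from_millis(base)` as taking the base to the n-th power, so `from_millis(2000)` yields 2 s for the first retry and 2000² ms (about 67 minutes) for the second, which `max_delay` then clamps to the cap. The std-only model below reproduces that documented arithmetic (k-th delay = base^k × factor ms, capped at `max_delay`) so the two ways of building the strategy can be compared. `model_exponential` is a hypothetical helper, not part of the crate, and it omits jitter, which only scales each value down.

```rust
use std::time::Duration;

/// Model of tokio-retry's documented `ExponentialBackoff` arithmetic:
/// the k-th delay (k = 1, 2, ...) is base^k * factor ms, capped at `max_delay`.
/// Overflow saturates to u64::MAX, as the crate does.
pub fn model_exponential(base: u64, factor: u64, max_delay: Duration, attempts: usize) -> Vec<Duration> {
    let mut current = base; // base^k for the current attempt
    let mut delays = Vec::with_capacity(attempts);
    for _ in 0..attempts {
        let delay = Duration::from_millis(current.saturating_mul(factor));
        delays.push(delay.min(max_delay));
        current = current.saturating_mul(base);
    }
    delays
}
```

With a 30 s cap, `from_millis(2000)` jumps from 2 s straight to 30 s, while `from_millis(2).factor(1000)` produces the intended 2 s, 4 s, 8 s, 16 s progression before capping.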