Retry Logic Overview (WIP)
This document is intended as the authoritative reference on retrying requests over a network, which applies in quite a few places during the replication process. It covers what should happen in the event of both transient and permanent errors. A transient error is one that is expected to pass within a relatively short period of time (such as a connection timeout or a 503). A permanent error is the opposite (such as a 401 or 404) and is not likely to recover without intervention. This document does not cover other replication logic such as "going offline."
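The distinction between the two error kinds can be sketched as a small classifier. This is only an illustration: the names `ErrorKind`, `classify`, `TRANSIENT_STATUSES` and `PERMANENT_STATUSES` are hypothetical, and the tables contain only the status codes mentioned above, not a complete list.

```python
from enum import Enum


class ErrorKind(Enum):
    TRANSIENT = "transient"   # expected to pass on its own after a short time
    PERMANENT = "permanent"   # unlikely to recover without intervention


# Assumption: only the status codes named in this document are listed.
TRANSIENT_STATUSES = {503}
PERMANENT_STATUSES = {401, 404}


def classify(status=None, timed_out=False):
    """Classify a failed request as transient or permanent."""
    if timed_out or status in TRANSIENT_STATUSES:
        return ErrorKind.TRANSIENT
    if status in PERMANENT_STATUSES:
        return ErrorKind.PERMANENT
    # This document does not say how other failures are classified,
    # so the sketch refuses to guess.
    raise ValueError(f"unclassified failure: status={status!r}")
```

For example, `classify(status=503)` and `classify(timed_out=True)` both yield `ErrorKind.TRANSIENT`, while `classify(status=401)` yields `ErrorKind.PERMANENT`.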
The replication retry flow is as follows:
- 1 Replication attempts to run
- 2 A connection error occurs
- 2a The connection error is transient, go to 3
- 2b The connection error is permanent, go to 4
- 3 Retry according to the applied retry strategy (not customizable on all platforms)
- 3a The retry strategy fails, go to 4
- 3b The retry strategy succeeds, go to 1
- 4 At this point the error is considered a permanent one
- 4a The replication is continuous. Switch to idle, set last error, enter long delay (~60 sec) and go to 1
- 4b The replication is non-continuous. Set last error, give up and stop the replication
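The flow above can be sketched as a single loop. This is a minimal illustration under stated assumptions, not the implementation used on any platform: `run_replication` and all of its parameters are hypothetical names, the retry strategy is reduced to a callback that reports success or failure, and `max_restarts` exists only so the sketch can terminate for a continuous replication.

```python
import time


def run_replication(attempt, is_transient, retry, continuous,
                    long_delay=60.0, sleep=time.sleep, max_restarts=None):
    """Drive a replication through steps 1-4 of the flow.

    attempt()        returns None on success, or an error value on failure
    is_transient(e)  returns True if the error is transient (step 2a)
    retry(e)         returns True if the retry strategy succeeded (step 3b)
    Returns a (status, last_error) tuple.
    """
    restarts = 0
    while True:
        error = attempt()                          # step 1
        if error is None:
            return ("completed", None)
        # Step 2: a connection error occurred.
        if is_transient(error) and retry(error):   # 2a -> 3, then 3b
            continue                               # go to 1
        # Step 4: reached via 2b or 3a; the error is now considered permanent.
        last_error = error
        if not continuous:
            return ("stopped", last_error)         # 4b: give up and stop
        # 4a: idle, keep the last error, wait, then go to 1.
        restarts += 1
        if max_restarts is not None and restarts > max_restarts:
            return ("idle", last_error)            # hypothetical escape hatch
        sleep(long_delay)
```

Passing `sleep` as a parameter lets the roughly 60 second delay of step 4a be replaced with a stub when exercising the loop.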
Examples:
Start a non-continuous replication
Initial connection reports 401 (Unauthorized)
Stop replication, callback for error and stopped status (two notifications)

Start a non-continuous replication
Halfway through, a 503 error is encountered (Service Unavailable)
Error is transient, so retry
Retry succeeds, replication continues

Start a non-continuous replication
Halfway through, a connection timeout happens
Error is transient, so retry
Retry fails, replication stops

Start a continuous replication
A 404 error is encountered on the endpoint
Permanent error, so don't retry the request
Enter master retry loop (wait ~60 sec and restart replication)