Retry on lock_timeout errors #353

Merged on May 8, 2024 (13 commits)

andrew-farries (Collaborator)

Retry statements and transactions that fail due to lock_timeout errors.

DDL operations and backfills run in a session in which SET lock_timeout TO 'xms' has been set (x defaults to 500 and can be overridden with the --lock-timeout parameter). This ensures that a long-running query can't cause other queries to queue up behind a DDL operation while it waits to acquire its lock.
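As a rough illustration of the session setup described above, the following sketch builds the SET statement from a millisecond value and applies it to a single connection. The names lockTimeoutStmt, setLockTimeout, and defaultLockTimeoutMs are hypothetical and are not taken from this PR. The value is formatted into the statement on the assumption that SET does not accept bind parameters.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// defaultLockTimeoutMs mirrors the default of 500 mentioned in the description.
// The constant name is an assumption made for this sketch.
const defaultLockTimeoutMs = 500

// lockTimeoutStmt builds the statement that limits how long later statements
// in the session wait for a lock. ms is an integer, so formatting it into the
// statement text cannot inject SQL.
func lockTimeoutStmt(ms int) string {
	return fmt.Sprintf("SET lock_timeout TO '%dms'", ms)
}

// setLockTimeout applies the setting to one dedicated connection, so that the
// DDL statements executed on that same connection inherit it.
func setLockTimeout(ctx context.Context, conn *sql.Conn, ms int) error {
	_, err := conn.ExecContext(ctx, lockTimeoutStmt(ms))
	return err
}

func main() {
	// Only the statement is printed here; running it needs a live database.
	fmt.Println(lockTimeoutStmt(defaultLockTimeoutMs))
}
```

A dedicated *sql.Conn is used rather than the *sql.DB pool because SET applies per session, and a pool may hand out a different connection for each call.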

Currently, if a DDL operation or backfill batch times out while requesting a lock, it fails, forcing the user to retry the whole migration operation (start, rollback, or complete).

This PR retries individual statements (like the DDL operations run by migration operations) and transactions (used by backfills) if they fail due to a lock_timeout error. The retry uses an exponential backoff with jitter.

Fixes #171

@andrew-farries andrew-farries merged commit 5c1aef2 into main May 8, 2024
44 checks passed
@andrew-farries andrew-farries deleted the retry-on-ddl-lock-timeouts branch May 8, 2024 14:54
Successfully merging this pull request may close these issues.

Retry on DDL lock acquisition failures