Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JUJU-4504] Implement timeout-based retry method for txns #69

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nvinuesa
Copy link
Member

@nvinuesa nvinuesa commented Sep 1, 2023

This patch is part of the set of patches aiming to fix https://bugs.launchpad.net/juju/+bug/2031631.

The idea here is to implement a transaction retry mechanism similar to the one being used in the official mongodb driver, i.e. a 120 seconds timeout for transactions to finish or waiting while being retried.

The context passed to the mgo Run method is necessary in order to cancel the actual request that's being executed by mgo either after timeout, either because the user (juju) cancels the passed context.

Tests have been fixed and some have been deleted since they are no longer relevant (we don't retry for a fixed number of times, instead we retry until timeout).

For some more context, here is the high-level picture of how we handle the retry (from juju to mgo via txn) by @hpidcock :

1) timer starts in juju/txn
2) juju/txn passes cancellation context/chan to juju/mgo/sstxn runner.Run
3) juju/mgo retries until cancellation or it gets a non-writeConflict error
4) the juju/txn layer does its retry, instead of for i := 0; i < tr.nrRetries; i++ { it just goes until it is timed out.

Note: The original PR on juju is juju/juju#16159 and on juju/mgo is juju/mgo#27.

This patch is part of the set of patchs aiming to fix
https://bugs.launchpad.net/juju/+bug/2031631.

The idea here is to implement a transaction retry mechanism similar to
the one being used in the official mongodb driver, i.e. a 120 seconds
timeout for transactions to finish or waiting while being retried.

The context passed to the mgo Run method is necessary in order to cancel
the actual request that's being executed by mgo either after timeout,
either because the user (juju) cancels the passed context.

Tests have been fixed and some have been deleted since they are no
longer relevant (we don't retry for a fixed number of times, instead we
retry until timeout).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant