-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
retry
keyword does not check for stale objects before retrying.
#34
Comments
In theory we are supposed to have a transaction timeout capability, so that a long-running transaction like this gets gunned down and at that point the state would be checked. |
I guess what I'm saying is that this problem doesn't seem specific to retry. |
It's not specific to retry, but the code for performing an explicit |
For example, in nearly all other exception scenarios we check for stale objects in the loop, like here: fabric/src/system/fabric/worker/Worker.java Line 822 in 099e3b0
However, the case for RetryException doesn't do this: fabric/src/system/fabric/worker/Worker.java Lines 798 to 800 in 099e3b0
It's honestly a really simple fix. However, I think the logic in this loop has gotten a bit hairy and complicated with small changes over time in a way that suggests that there's a cleaner, harder-to-get-wrong rewrite of the transaction loop logic that should be considered. I'm largely documenting the issue here to come back to later (tomorrow/this weekend) when I'm not debugging something else. 😃 |
Checking for stale objects is expensive, of course. But I guess retry happens rarely enough that it's not an issue? |
I'm actually thinking we could do the check asynchronously here and potentially other places where we're going to retry regardless. |
So yeah, this is like the timer idea we've tossed around for long-running transactions. |
The timer thing is actually in the original SOSP paper but I guess it never got implemented. |
The implementation of the
retry
keyword can cause problems when working off of stale worker cache data.For example
Will infinitely loop if
val.condition()
returns true only when state is inconsistent (the collection of cached objects read never existed at the same time on their respective stores in a consistent state).We should update the transaction loop to check for stale objects before running a user
retry
or userabort
.The text was updated successfully, but these errors were encountered: