-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SeiDB] Fix MemIAVL Race Condition during snapshot reload #56
Conversation
@@ -195,12 +202,16 @@ func (t *Tree) Has(key []byte) bool { | |||
} | |||
|
|||
func (t *Tree) Iterator(start, end []byte, ascending bool) dbm.Iterator { | |||
t.mtx.RLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterator is tricker, it supposed the hold the RLock before the iterator is closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah that's a good point, iterator is a bit tricky, this lock only covers the creation of the iterator, might not be enough. I think for now it's probably fine since we don't have a lot of use cases using iterators in EVM, we definitely can add more locking coverage in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM.
Describe your changes and provide context
Problem:
In EVM, serving EVM RPC endpoints will have to query and access the latest state or historical state, when accessing latest state from MemIAVL, there's a race condition where the key we are trying to access doesn't exist because in the background, we have already switched to a difference snapshot after snapshot rewrite completes. Here's how it could happen:
And here's the error when this race condition is triggered:
Solution:
This PR address the issue by doing an internal replacement, instead of closing the old tree and create a new tree, we replace all the variables within each tree in a thread-safe manner as well as clean up and closing the old snapshots safely, this guarantees that all other operations will be block while we are replacing the whole tree.
Testing performed to validate your change
Add Unit Test to repro and catch race conditions