Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SeiDB] Fix MemIAVL Race Condition during snapshot reload #56

Merged
merged 4 commits into from
Jan 31, 2024

Conversation

yzang2019
Copy link
Collaborator

@yzang2019 yzang2019 commented Jan 30, 2024

Describe your changes and provide context

Problem:
In EVM, serving EVM RPC endpoints will have to query and access the latest state or historical state, when accessing latest state from MemIAVL, there's a race condition where the key we are trying to access doesn't exist because in the background, we have already switched to a difference snapshot after snapshot rewrite completes. Here's how it could happen:

  1. Goroutine A: get a Tree pointer, preparing to call tree.Get
  2. Goroutine B: reload snapshot and switched to a new tree, closed old tree
  3. Goroutine A: Still use the old tree pointer to call get, got segmentation fault

And here's the error when this race condition is triggered:
image

Solution:
This PR address the issue by doing an internal replacement, instead of closing the old tree and create a new tree, we replace all the variables within each tree in a thread-safe manner as well as clean up and closing the old snapshots safely, this guarantees that all other operations will be block while we are replacing the whole tree.

Testing performed to validate your change

Add Unit Test to repro and catch race conditions

@yzang2019 yzang2019 changed the title [SeiDB] Fix MemIAVL Race Condition between state access and snapshot reload [SeiDB] Fix MemIAVL Race Condition during snapshot reload Jan 30, 2024
sc/memiavl/tree.go Show resolved Hide resolved
sc/memiavl/tree.go Outdated Show resolved Hide resolved
sc/memiavl/tree.go Outdated Show resolved Hide resolved
@@ -195,12 +202,16 @@ func (t *Tree) Has(key []byte) bool {
}

func (t *Tree) Iterator(start, end []byte, ascending bool) dbm.Iterator {
t.mtx.RLock()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterator is tricker, it supposed the hold the RLock before the iterator is closed

Copy link
Collaborator Author

@yzang2019 yzang2019 Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah that's a good point, iterator is a bit tricky, this lock only covers the creation of the iterator, might not be enough. I think for now it's probably fine since we don't have a lot of use cases using iterators in EVM, we definitely can add more locking coverage in the future

@yzang2019 yzang2019 merged commit 356344a into main Jan 31, 2024
8 checks passed
Copy link
Contributor

@alexanderbez alexanderbez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants