Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create block filters before poll interval #152

Merged
merged 13 commits into from
Sep 6, 2024

Conversation

rodion-lim-partior
Copy link
Contributor

Fixes: #151

Raising this PR to allow for block filters to be created on init instead of after the block polling interval.

@rodion-lim-partior rodion-lim-partior requested a review from a team as a code owner August 27, 2024 09:12
@EnriqueL8
Copy link
Contributor

Thanks for the PR @rodion-lim-partior - @Chengxuan could you take a look at this one?

Copy link
Contributor

@Chengxuan Chengxuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rodion-lim-partior Thanks for spotting #151 !

The solution proposed is likely to make the problem happen less frequently as long as no transactions are submitted before the new block filter is created on the node.

However, there is still a race condition. I think a better fix here is to ensure the checkStartedLocked function doesn't return until a block filter is established.

Please also add a test to catch future regressions.

@rodion-lim-partior
Copy link
Contributor Author

rodion-lim-partior commented Aug 29, 2024

@Chengxuan, noticed the following when I tried to fix the race condition:

  1. When we add a consumer to the block listener, we acquire a lock
  2. Call checkStartedLocked, which starts the block listener loop in a separate goroutine
  3. Establish block height next, which requires acquiring a lock to write to block listener, this is blocked till checkStartedLocked has finished
  4. We then create a block filter. If I block checkStartedLocked until the filter is created, that ends up in a deadlock, since checkStartedLocked is required to finish to release the lock before establishing block height can proceed

Is there a reason why we acquire a lock before calling checkStartedLocked? Would there be any issues if we shift it downwards?

func (bl *blockListener) addConsumer(c *blockUpdateConsumer) {
	bl.mux.Lock()
	defer bl.mux.Unlock()
	bl.checkStartedLocked()
	bl.consumers[*c.id] = c
}

@Chengxuan
Copy link
Contributor

Chengxuan commented Aug 29, 2024

@rodion-lim-partior It feels the problem is within the scope of checkStartedLocked function, so the solution shouldn't need to touch the global lock that is used to protect shared resources for concurrency.

The key problem to solve is synchronization between the block listener loop separate go routine and the main go routine that created that separate go routine.

Using a channel instead of a lock for go routine synchronization would be more idiomatic.

@rodion-lim-partior
Copy link
Contributor Author

@Chengxuan, can you help to take a look at the current commit if this was what you referred to?

@Chengxuan
Copy link
Contributor

@rodion-lim-partior Changes look great! I pointed out one possible issue in the code, you should be able to produce that error case in a test.

Once tests are added, I'm happy to take another look.

Thank you for the work!

@EnriqueL8
Copy link
Contributor

Thank @Chengxuan and @rodion-lim-partior for the collaboration on this one! Looking forward to seeing some tests

@rodionlim
Copy link

@Chengxuan, the commit I pushed causes a deadlock in my local environment due to the reasons in my previous comments. I will have to shift the global locks downwards to resolve the deadlocks. From local testing, shifting the locks down does not seem to break the existing implementation. Let me know if that works, I will implement the tests and refactor based on your comments after.

@Chengxuan
Copy link
Contributor

Chengxuan commented Aug 30, 2024

Hi @rodion-lim-partior I think you have a good understanding of the code now. So please feel free to try out different options and write tests. Please bear in mind the logic must ensure only 1 listener loop go routine can be created. Also, feel free to split global mutex to be resource specific if needed.

@rodionlim
Copy link

@Chengxuan, got it. Will do some refactoring of the code and come back with the tests for another round of review. Thanks for the pointers!

@rodion-lim-partior
Copy link
Contributor Author

rodion-lim-partior commented Sep 3, 2024

@Chengxuan @EnriqueL8, apologies for the delay. I refactored the code and fixed some of the race conditions along the way. Added the tests as well for the edge case and the happy case.

Some observations (correct me if I'm wrong):

  1. There are two functions addConsumer and getHighestBlock which used to lock checkStartedLocked itself, leading to behavior which is difficult to reason about and unintentionally cause certain logic to occur in a particular sequence (described in a previous comment)
  2. There is a small chance that a consumer is not added to the block listener (because the thread is slow etc.), causing mock getFilterChanges calls in the tests to skip the dispatch call, resulting in failed tests. This has now been fixed as well
  3. Now that we wait for block filters to be established, many of the tests in event_actions.go and event_listener.go need to mock the RPC calls to create the block filters and listen on changes, this has also been corrected.

Copy link
Contributor

@EnriqueL8 EnriqueL8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good - a few thoughts

internal/ethereum/blocklistener.go Outdated Show resolved Hide resolved
internal/ethereum/blocklistener.go Outdated Show resolved Hide resolved
internal/ethereum/blocklistener.go Show resolved Hide resolved
internal/ethereum/blocklistener.go Outdated Show resolved Hide resolved
@Chengxuan
Copy link
Contributor

@rodionlim logic looks good to me 👍 , some minor comments to clean up bits that are not necessary.

@rodion-lim-partior
Copy link
Contributor Author

@Chengxuan, pushed a commit to clean up the code, left with one pending comment that needs a review

@Chengxuan
Copy link
Contributor

@rodion-lim-partior thanks. For one pending comment that needs a review I assume that's referring to the removal of ctx parameter in blockTillBlockFilterEstablished that you are reviewing and addressing.

If there are something you'd like me to review, please let me know.

@rodion-lim-partior
Copy link
Contributor Author

@EnriqueL8, yep that's right. Rest of your review comments have been addressed. The pending one that needs a review, I need @Chengxuan to take a look. Removing the context will cause test failures.

Signed-off-by: Chengxuan Xing <[email protected]>
@Chengxuan
Copy link
Contributor

Chengxuan commented Sep 5, 2024

@rodion-lim-partior I looked at the code closer and understood the need to have the context, here is the proposed change: partior-3p#1

6c15c1a has conflict with the proposal, so you'll need to take a look and resolve that.

Otherwise, that PR is ready to go in.

@Chengxuan
Copy link
Contributor

Hi @rodion-lim-partior I'm planning to do a release to pick up another bug fix tomorrow. I'd like to get this PR in as well, please take a look at partior-3p#1 and merge it if you are happy with it. I've sorted out the merge conflicts in that PR as well

Signed-off-by: rodion <[email protected]>
@rodion-lim-partior
Copy link
Contributor Author

rodion-lim-partior commented Sep 6, 2024

@Chengxuan, merged those changes in. Corrected some of the lint failures as well when you run make test. Seems to be working fine on my local environment after rebasing your changes in

Copy link
Contributor

@Chengxuan Chengxuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. Thanks for your contribution @rodion-lim-partior 👍

@Chengxuan Chengxuan merged commit 3c87e67 into hyperledger:main Sep 6, 2024
3 checks passed
@EnriqueL8
Copy link
Contributor

Thanks both! Loved the discussion on this PR

@rodion-lim-partior
Copy link
Contributor Author

rodion-lim-partior commented Sep 6, 2024

Cheers guys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

evmconnect is unable to retrieve transaction receipts when block polling interval is set too long
4 participants