Stream sync: context deadline exceeded #4749

Open
Frozen opened this issue Sep 3, 2024 · 4 comments

Frozen commented Sep 3, 2024

Describe the bug
{"level":"error","port":"9000","ip":"127.0.0.1","module":"staged stream sync","ShardID":0,"error":"context deadline exceeded","streamID":"QmSAaTjhHWNF42A4xWLkmcwnmEJhatjbqMpv84HBxeNzkt","caller":"/Users/frozen/go/src/github.com/harmony-one/harmony/api/service/stagedstreamsync/syncing.go:520","time":"2024-09-03T13:45:10.258677-04:00","message":"[STAGED_STREAM_SYNC]: getCurrentNumber request failed"}

The libp2p stream is being replaced with a new one, but the response is still expected on the old stream. We need to understand why new streams are created while a connection already exists.

To Reproduce
Steps to reproduce the behavior:

  1. Check out the dev branch
  2. Build
  3. Run make debug

Expected behavior
This error should never occur: everything runs on localhost, so network issues are not possible.


Frozen commented Sep 4, 2024

{"level":"warn","port":"9002","ip":"127.0.0.1","mode":"epoch chain short range","error":"stream removed when doing request","stream":"QmTXwseacFvVFeHfSBSwL3xdAHMyTeesNUu2DJaGuYWZt5","caller":"/Users/frozen/go/src/github.com/harmony-one/harmony/api/service/stagedstreamsync/short_range_helper.go:185","time":"2024-09-03T22:43:13.597391-04:00","message":"[STAGED_STREAM_SYNC]: failed to doGetBlockHashesRequest"}

GheisMohammadi self-assigned this Sep 9, 2024
@GheisMohammadi
Contributor

Removing the P2P stream is expected in this case: the getCurrentNumber request failed multiple times, and when the sync process cannot retrieve data from a stream after several attempts, it replaces that stream. The open question is why the getCurrentNumber request fails in the first place. That needs further investigation to find the root cause.

@GheisMohammadi
Contributor

PR #4762 attempts to address the issue. It is still in progress.


GheisMohammadi commented Oct 1, 2024

Investigation showed that the issue was introduced by the latest update to the muxers for the P2P host. Switching localnet to use Yamux exclusively (PR #4764) resolved the stream instability problems.
