
Transition block lookup sync to range sync #6122

Merged
merged 5 commits into sigp:unstable from dapplion:lookup-to-range on Oct 8, 2024

Conversation

dapplion
Collaborator

Issue Addressed

For context of the issue 👇

Closes #6099

Proposed Changes

This solution links range sync and lookup sync by:

  • Trigger lookup sync from status messages that include an unknown root. Currently range sync "swallows" peer statuses that are too close to head but on a potentially unknown fork. Range sync is not suitable for handling short forks, so instead defer to lookup sync to retrieve that unknown root.
  • If lookup sync reaches the max chain length, trigger range sync on that set of peers. The idea is that most peers in that group should have this chain as canonical, so we can request blocks_by_range from them. In the future we may use Add BeaconBlocksByRange v3 ethereum/consensus-specs#3845 to be sure which fork we are syncing from.
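The two triggers above can be sketched roughly as follows. This is a hypothetical, heavily simplified illustration, not Lighthouse's actual sync types: `SyncAction`, `MAX_LOOKUP_CHAIN_LEN`, and the use of `u64` as a peer id are all made up for the sketch.

```rust
// Hypothetical sketch of the dispatch logic described above. Names are
// illustrative, not Lighthouse's real types.

/// Made-up cap on how long a parent-lookup chain may grow.
const MAX_LOOKUP_CHAIN_LEN: usize = 8;

#[derive(Debug, PartialEq)]
enum SyncAction {
    /// Short unknown fork: fetch the missing root via lookup sync.
    LookupUnknownRoot([u8; 32]),
    /// Lookup chain grew too long: hand its peers to range sync.
    RangeSyncWithPeers(Vec<u64>),
}

/// A peer's status advertises a head root we don't know: instead of
/// "swallowing" the status, defer to lookup sync for that root.
fn on_unknown_status_root(head_root: [u8; 32]) -> SyncAction {
    SyncAction::LookupUnknownRoot(head_root)
}

/// Called as a parent-lookup chain grows. Once it exceeds the cap, the
/// fork is long enough that blocks_by_range from the chain's peers is
/// the cheaper way to retrieve it.
fn on_lookup_chain_grew(chain_len: usize, peers: Vec<u64>) -> Option<SyncAction> {
    (chain_len > MAX_LOOKUP_CHAIN_LEN).then(|| SyncAction::RangeSyncWithPeers(peers))
}

fn main() {
    // A short chain stays in lookup sync; a long one is handed to range sync.
    assert_eq!(on_lookup_chain_grew(3, vec![1, 2]), None);
    assert_eq!(
        on_lookup_chain_grew(9, vec![1, 2]),
        Some(SyncAction::RangeSyncWithPeers(vec![1, 2]))
    );
    assert_eq!(
        on_unknown_status_root([0u8; 32]),
        SyncAction::LookupUnknownRoot([0u8; 32])
    );
}
```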

I believe this change is the minimal solution to cover the corner case of #6099. It's not maximally efficient, as it will download a section of blocks twice. However, this situation is rare, and we just need some way to recover, even if it's at the expense of some extra bandwidth.

Testing

There are plans with EF devops to test nasty long non-finality situations, but not now. For now I can try to write representative unit sync tests covering some sample scenarios. This change should not impact healthy network codepaths, so it should have no downsides. Range sync will only be triggered when we need to discover a long unknown fork, which has only happened in LH on testnets deliberately crafted to trigger lookup sync bugs.

@dapplion
Collaborator Author

dapplion commented Aug 1, 2024

Updated tests to assert that a syncing chain is created, and removed down-scoring of the peers for a long chain:

// Do not downscore peers here. Because we can't distinguish a valid chain from
// a malicious one, we may penalize honest peers for attempting to discover a
// valid chain for us. Until blocks_by_range allows specifying a tip, for example
// with https://github.com/ethereum/consensus-specs/pull/3845, we will have poor
// attributability. A peer can send us garbage blocks over blocks_by_root, and
// then correct blocks via blocks_by_range.

As a final item, we could check that when a syncing chain reaches its end, the final block root matches the root the chain was syncing to.
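That final check could look something like this minimal sketch. The function name and error shape are made up; the real check would live wherever the syncing chain completes.

```rust
// Hypothetical sketch of the suggested assertion: when a syncing chain
// completes, the root of the last downloaded block should equal the root
// the chain was created to sync to. Names are illustrative.

fn check_chain_tip(last_block_root: [u8; 32], target_root: [u8; 32]) -> Result<(), String> {
    if last_block_root == target_root {
        Ok(())
    } else {
        // Print only a root prefix to keep the message short.
        Err(format!(
            "syncing chain tip mismatch: got {:02x?}, expected {:02x?}",
            &last_block_root[..4],
            &target_root[..4]
        ))
    }
}

fn main() {
    assert!(check_chain_tip([1; 32], [1; 32]).is_ok());
    assert!(check_chain_tip([1; 32], [2; 32]).is_err());
}
```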

@dapplion dapplion added the ready-for-review The code is ready for review label Sep 2, 2024
}
self.drop_lookup_and_children(*lookup_id);
}
None => (lookup.block_root(), lookup.peek_downloaded_block_slot()),
Member


I ran out of time looking through to find out the scenario when this occurs.

It's when we don't have any single block lookups that match the current parent tip. In this case we start a range sync based on the parent's ancestor, rather than the parent_chain_tip?

I didn't fully parse this.

Overall I like the idea!

Collaborator Author


This should never happen. I can log an error and return, but I used this fallback value for simplicity. Happy to change it if it confuses more than it helps.

Member


Oh yeah, I think that might be helpful. This way we can catch any wild states rather than doing a range sync. Also (at least for me) it will make it easier to follow.
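The shape the reviewers agreed on can be sketched as below: surface the impossible state with a logged error and bail, rather than silently range-syncing from a fallback value. `Tip` and `resolve_tip` are illustrative stand-ins for the `(block_root, slot)` pair in the diff above, not the real code.

```rust
// Illustrative sketch of preferring an explicit error over a silent
// fallback. `Tip` stands in for the (block_root, slot) pair under review.

#[derive(Debug, PartialEq)]
struct Tip {
    block_root: u64,
    slot: Option<u64>,
}

/// If no lookup matches the parent chain tip (a state that should never
/// happen), log an error and return None so the caller aborts, instead of
/// starting a range sync from a guessed fallback.
fn resolve_tip(lookup: Option<Tip>) -> Option<Tip> {
    match lookup {
        Some(tip) => Some(tip),
        None => {
            eprintln!("error: no lookup matches the parent chain tip; aborting");
            None
        }
    }
}

fn main() {
    let expected = Tip { block_root: 42, slot: Some(7) };
    assert_eq!(resolve_tip(Some(Tip { block_root: 42, slot: Some(7) })), Some(expected));
    assert_eq!(resolve_tip(None), None);
}
```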

Member

@AgeManning left a comment


Yeah I think we should test this out. Logic seems reasonable to me.

Member

@realbigsean left a comment


LGTM

@realbigsean
Member

@mergify queue


mergify bot commented Oct 8, 2024

queue

🛑 The pull request has been removed from the queue default

The merge conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

@realbigsean
Member

@mergify refresh


mergify bot commented Oct 8, 2024

refresh

✅ Pull request refreshed

@realbigsean
Member

@mergify queue


mergify bot commented Oct 8, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at 71c5388

@mergify mergify bot merged commit 71c5388 into sigp:unstable Oct 8, 2024
28 checks passed
@dapplion dapplion deleted the lookup-to-range branch October 8, 2024 22:24
Labels
ready-for-review The code is ready for review

4 participants