Allow second sync committee message to propagate if first one is stale #4015

etan-status · 2024-11-20T10:59:58Z

The sync committee message signature does not sign over the slot, making it possible for stale messages to be adjusted and replayed, possibly preventing new sync committee messages to propagate. If such a message for an older head is received, allow a second message to propagate that refers to the current head as selected by fork choice, overriding the previous one.

Also add a couple sanity checks, as in, disallowing messages referring to unknown or invalid blocks to spread, in line with existing rules for blob sidecars, parent block roots and attestations.

The sync committee message signature does not sign over the slot, making it possible for stale messages to be adjusted and replayed, possibly preventing new sync committee messages to propagate. If such a message for an older head is received, allow a second message to propagate that refers to the current head as selected by fork choice, overriding the previous one. Also add a couple sanity checks, as in, disallowing messages referring to unknown or invalid blocks to spread, in line with existing rules for blob sidecars, parent block roots and attestations.

etan-status · 2024-11-20T11:00:21Z

Implemented in Nimbus: status-im/nimbus-eth2#4953

mkalinin · 2024-11-27T10:28:30Z

specs/altair/p2p-interface.md

- _[IGNORE]_ There has been no other valid sync committee message for the declared `slot` for the validator referenced by `sync_committee_message.validator_index`
+- _[IGNORE]_ The block being signed (`sync_committee_message.beacon_block_root`) has been seen (via both gossip and non-gossip sources) (a client MAY queue sync committee messages for processing once block is retrieved).
+- _[REJECT]_ The block being signed (`sync_committee_message.beacon_block_root`) passes validation.
+- _[IGNORE]_ There has been no other valid sync committee message for the declared `slot` for the validator referenced by `sync_committee_message.validator_index`, unless the block being signed (`beacon_block_root`) matches the local head as selected by fork choice, and the earlier valid sync committee message does not match


What do you think about slightly different rule? Unless the beacon_block_root refers to a block with a slot greater than the one that has already been seen. This could remove the dependency on the local view to the head of the chain and make it more resilient

Wouldn't that allow spamming, i.e., by sending a message with slot 5, then 6, then 7, then 8.

And in some reorgs, the parent block of the next block may have a lower slot than the highest one observed across all branches.

Wouldn't that allow spamming

There is the block.slot that should increase to avoid discarding the message, not the message.slot. I don’t think it has any room for spamming as the beacon_block_root is signed.

And in some reorgs, the parent block of the next block may have a lower slot than the highest one observed across all branches.

Right, but in this case the next block should have a greater slot. There are edge cases when re-org happens to a block with a lower slot than the slot of the previous head, but this should be a rare case on mainnet.

Ah, yeah, block.slot would work there as the spam protection as well. The edge cases with reorg to a lower parent slot would get worse, though, compared to proposed solution. It's also still depending on the local view of the chain as a whole, as in, one still needs to have the block available for the second message to propagate. From a reward perspective, the only message that has any value is one that signs the correct parent root of the next slot's block.

I guess to decide which strategy is best, maybe @potuz ? Maybe there's an even better way :)

Actually, no, the spam problem is still there. Someone could go extreme and sign beacon block from 2 weeks ago with message.slot set to the current slot. and then send another message that signs the beacon block 1 after that. and so on, until the present. and all of those messages would propagate, no?

The split view is not really prevented though, as long as the fix incorporates that the block header being signed has to be known. The honest signatures should only come ~4s into the slot (or at the very least after the block, similar to attestations, there are incentives to vote for the most likely head as selected by forkchoice); so at that stage fork choice is already done.

btw, as your suggestion doesn't support the reorg to lower slot case, couldn't it be simplified to, instead of requiring monotonical increase, just allow a second message to sign a block header for the current slot? that would fix the spam potential but keep the edge case problem open.

If one wants to get rid of fork choice dependency, the rules could also be changed to allow a vote for a single additional unknown/invalid block root to propagate. i.e., allow an initial message to spread (whatever), potentially allow a second message to spread if it refers to an unknown/invalid block root (to cover the split view case), and allow a final message to spread if it pertains to the head root (as selected by forkchoice) as proposed (to cover reorg to lower slot etc).

Suppose, there is a block A with two children B and C, C.slot = B.slot + 1. C is timely only for a half of nodes and weights are distributed in a way that C is the winner for those who has received it in time (proposer_boost_weight > get_weight(B)), so a half of sync committee participants sends committee message with C as the head and another half with B. Both block are eventually seen and correct but nodes with head=B can’t propagate the messages with beacon_block_root = htr(C) and vice versa.

doesn't support the reorg to lower slot case, couldn't it be simplified to, instead of requiring monotonical increase, just allow a second message to sign a block header for the current slot

It’s not strictly the same as validator could send a previous message for a block in slot N-2 and now the head is the block from slot N-1. In this case monotonically increasing constraint would allow the second message while same slot rule would not.

I think ideally there should be a SignedSyncCommitteeMessage to prevent slot tampering. So the original problem is resolved without a workaround, as it seems any workaround has its own corner cases.

Yeah. btw the workaround only triggers if there is active slot tampering going on. In any case, the initial message per validator propagates regardless of restrictions.

Your example could be mitigated by allowing in each slot for (1) the first message to spread, (2) a message pertaining to an unknown root to spread, (3) a message pertaining to the local head to spread.

Your case is essentially triggering (2), as the local node does not know the slot of the signed root at the time it receives the sync message in that scenario. So at that stage, it would result in IGNORE and the message is already stopped before the blocks end up being seen. And IGNORE is cached on the libp2p message-id, so the message still won't propagate lateron once the block is seen.

The problem with (2) is, that one has to also cache a list of all unknown roots that have been signed. Otherwise, if there is an old orphaned block somewhere that is not available in local DB and also is no longer available on the network, an attacker could just replay that outdated message with tampered slot. So, the attacker could waste both extra messages with slot tampering (the initial one for free, and the second one by tampering a sig for an unknown orphaned root) and then the correct message may still not propagate (as it does not pertain to the local head).

Another interesting case is checkpoint sync, where the old data may be locally unavailable even if canonical; but that's at least not affecting a large count of validators.

I see. There seem to be edge cases in whatever solution, but the proposed solution is a strict improvement regardless. So if we can’t have validator to sign an envelope to prevent slot tampering then verifying a block against the local head is fine.

Yeah. I mean, we can fix it in a future fork, but it still needs to be fixed / documented for current mainnet forks regardless, and it's probably good enough with one of the proposed fixes as it sufficiently disincentivizes the slot tampering attack, at least until sync committees are removed with beam anyway.

mkalinin

Looks good to me! But I’d like to see approves from other CL devs before merging

rolfyone · 2024-12-10T22:30:24Z

specs/altair/p2p-interface.md

@@ -161,7 +163,9 @@ The following validations MUST pass before forwarding the `sync_committee_messag
 - _[IGNORE]_ The message's slot is for the current slot (with a `MAXIMUM_GOSSIP_CLOCK_DISPARITY` allowance), i.e. `sync_committee_message.slot == current_slot`.
 - _[REJECT]_ The `subnet_id` is valid for the given validator, i.e. `subnet_id in compute_subnets_for_sync_committee(state, sync_committee_message.validator_index)`.
  Note this validation implies the validator is part of the broader current sync committee along with the correct subcommittee.
- _[IGNORE]_ There has been no other valid sync committee message for the declared `slot` for the validator referenced by `sync_committee_message.validator_index`
+- _[IGNORE]_ The block being signed (`sync_committee_message.beacon_block_root`) has been seen (via both gossip and non-gossip sources) (a client MAY queue sync committee messages for processing once block is retrieved).


This is probably pedantic, but via both gossip and non-gossip i think more accurately would be reflected as via gossip or non-gossip sources
I can't imagine a case where we'd ever want fetching every block that we receive over gossip also over RPC which this currently kind of implies...
This is also repeated above, I think we just change it...

This is not new wording from this PR, but instead consistent with existing wording from everywhere else. The semantics here are meant to be identical to the existing one, and wording changes should probably be synchronized across all mentions.

Ok, i'll raise a separate PR for that, I still think it's misleading.

I've fixed the other instances in #4049

mcdee · 2024-12-16T07:55:39Z

specs/altair/p2p-interface.md

@@ -140,6 +140,8 @@ def get_sync_subcommittee_pubkeys(state: BeaconState, subcommittee_index: uint64
 - _[REJECT]_ `contribution_and_proof.selection_proof` selects the validator as an aggregator for the slot -- i.e. `is_sync_committee_aggregator(contribution_and_proof.selection_proof)` returns `True`.
 - _[REJECT]_ The aggregator's validator index is in the declared subcommittee of the current sync committee --
  i.e. `state.validators[contribution_and_proof.aggregator_index].pubkey in get_sync_subcommittee_pubkeys(state, contribution.subcommittee_index)`.
+- _[IGNORE]_ The block being signed (`contribution_and_proof.contribution.beacon_block_root`) has been seen (via both gossip and non-gossip sources) (a client MAY queue sync committee contributions for processing once block is received)


I'm not sure about adding this in as a condition. It's not uncommon to see a fair bit of spread around the time that a block is received across different beacon nodes on the same server, let alone across multiple servers. Generating and signing sync committee messages is a pretty quick process, so if a beacon node discards these messages because it has yet to see the relevant block it could well be failing to hold on to a correct message.

How about if the beacon node will only ignore the message only if it has seen a block for the current slot? That gives protection against this common situation. If there is a concern about flooding the network with bogus messages by slower nodes then limit it to a single unknown beacon block root per beacon node.

It's the same condition as for attestations, aggregates, and EIP-7732 payloads / attestations.

consensus-specs/specs/phase0/p2p-interface.md

Lines 464 to 465 in 001caab

- _[IGNORE]_ The block being voted for (`attestation.data.beacon_block_root`) has been seen

(via both gossip and non-gossip sources)

In practice, sync messages and attestations are sent around ~4s into the slot, and sync contributions and aggregates are sent around ~8s into the slot.

If the condition is a problem here, it is also a problem for attestations.

The point is not the specific time that sync messages show up, it's that there can be a fair spread of time over which some nodes have the current slot's block and others do not.

From my understanding, at least some of the beacon nodes queue attestations when they do not have the block for the current slot rather than discard them (not sure if they forward them or not, that's another question). But having nodes ignore sync committee messages for the current slot when the node doesn't yet have the current slot's block doesn't seem to be an optimal solution to me.

jtraglia approved these changes Nov 22, 2024

View reviewed changes

mkalinin reviewed Nov 27, 2024

View reviewed changes

mkalinin approved these changes Dec 4, 2024

View reviewed changes

tbenr mentioned this pull request Dec 5, 2024

Changes in sync committee gossip rules Consensys/teku#8886

Open

rolfyone reviewed Dec 10, 2024

View reviewed changes

rolfyone mentioned this pull request Dec 14, 2024

clarify gossip sources wording #4049

Merged

mcdee suggested changes Dec 16, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow second sync committee message to propagate if first one is stale #4015

Allow second sync committee message to propagate if first one is stale #4015

etan-status commented Nov 20, 2024

etan-status commented Nov 20, 2024

mkalinin Nov 27, 2024

etan-status Nov 27, 2024

mkalinin Nov 27, 2024

etan-status Nov 27, 2024

etan-status Nov 27, 2024

etan-status Nov 29, 2024

mkalinin Dec 2, 2024

etan-status Dec 2, 2024

mkalinin Dec 3, 2024

etan-status Dec 3, 2024

mkalinin left a comment

rolfyone Dec 10, 2024

etan-status Dec 13, 2024

rolfyone Dec 14, 2024

rolfyone Dec 19, 2024

mcdee Dec 16, 2024

etan-status Dec 16, 2024

mcdee Dec 19, 2024

	- _[IGNORE]_ The block being voted for (`attestation.data.beacon_block_root`) has been seen
	(via both gossip and non-gossip sources)

Allow second sync committee message to propagate if first one is stale #4015

Are you sure you want to change the base?

Allow second sync committee message to propagate if first one is stale #4015

Conversation

etan-status commented Nov 20, 2024

etan-status commented Nov 20, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mkalinin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment