29 changes: 29 additions & 0 deletions polkadot/node/network/collator-protocol/src/collator_side/mod.rs
@@ -1375,6 +1375,35 @@ async fn handle_network_msg<Context>(
unknown_heads: LruMap::new(ByLength::new(10)),
});

// Advertise collations for the current peer in case this is a reconnect.
//
// We might try to advertise collation T0 to the peer, then the peer disconnects
// before receiving the message. Later on, we generate a new collation T1
// and the peer reconnects. We need to make sure the peer gets T0 advertised as well.
//
// The `advertise_collation` ensures we are not readvertising the same collation
// multiple times.
if let Some(para_id) = state.collating_on {
Member:

This should be handled by

?

Aka, when the peer view is announced, it should be informed about the collations.

@sandreim (Contributor), Dec 1, 2025:

I don't think this will work; we already set the collation as advertised here, so it will not advertise again when the peer reconnects.

Maybe we'd want to reset this bit to 0 when the validator disconnects, if the status is not Requested. But because it is a race, the validator may have already seen the advertisement; we just don't know from the collator side. In that case, we'd have to check whether the collator is punished in any way for advertising twice.
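The reset-on-disconnect idea can be sketched as a small state machine. This is a hypothetical model, not the actual collator-protocol types: `AdvertiseStatus`, `AdvertisementState`, and the `u32` validator ids are all illustrative, and the real status bit lives inside the collation tracking rather than a standalone map.

```rust
use std::collections::HashMap;

// Hypothetical model: track per-validator advertisement status and clear
// the "advertised" bit on disconnect, unless the validator already
// requested the collation (in which case it was definitely seen).
#[derive(Clone, Copy, PartialEq, Debug)]
enum AdvertiseStatus {
    NotAdvertised,
    Advertised,
    Requested,
}

#[derive(Default)]
struct AdvertisementState {
    per_validator: HashMap<u32, AdvertiseStatus>,
}

impl AdvertisementState {
    // Returns true if an advertisement should actually be sent.
    fn advertise(&mut self, validator: u32) -> bool {
        let status = self
            .per_validator
            .entry(validator)
            .or_insert(AdvertiseStatus::NotAdvertised);
        if *status == AdvertiseStatus::NotAdvertised {
            *status = AdvertiseStatus::Advertised;
            true
        } else {
            false
        }
    }

    fn on_disconnect(&mut self, validator: u32) {
        // Reset the bit so a reconnecting validator is re-advertised,
        // but keep `Requested`: that collation was definitely delivered.
        if let Some(status) = self.per_validator.get_mut(&validator) {
            if *status == AdvertiseStatus::Advertised {
                *status = AdvertiseStatus::NotAdvertised;
            }
        }
    }
}

fn main() {
    let mut state = AdvertisementState::default();
    assert!(state.advertise(7)); // first advertisement is sent
    assert!(!state.advertise(7)); // duplicate suppressed
    state.on_disconnect(7); // validator drops before (maybe) seeing it
    assert!(state.advertise(7)); // re-advertised on reconnect

    // A `Requested` collation was definitely seen; disconnect keeps it.
    state.per_validator.insert(9, AdvertiseStatus::Requested);
    state.on_disconnect(9);
    assert_eq!(state.per_validator[&9], AdvertiseStatus::Requested);
    println!("ok");
}
```

As the comment notes, the open question with this approach is the race itself: the collator cannot tell whether the lost advertisement was delivered, so re-advertising must be safe (unpunished) on the validator side.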

Contributor:

Also, I think this should be an exceptional case, likely a side-effect of some other underlying networking issue. Why was the validator disconnecting in the first place?

Contributor (Author):

Indeed, this is a race case I've only encountered once.

It might happen due to network congestion or the following:

This debugging rabbit hole might improve the stability of litep2p even more 🙏 If the previous issue turns out to be correct, we are terminating connections on fragmented socket reads due to a tiny offset mismatch in the poll_next implementation.
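The "tiny offset mismatch" failure mode can be illustrated with a plain buffer-copy loop. This is purely a sketch of the bug class, not litep2p's actual poll_next code: resuming a fragmented read at offset 0 instead of at the number of bytes already filled silently corrupts the partial frame.

```rust
// Illustrative only: a fragmented read must resume at `filled`, not 0.
// Resuming at 0 overwrites the partial frame already in the buffer.
fn copy_fragmented(fragments: &[&[u8]], buf: &mut [u8], buggy: bool) -> usize {
    let mut filled = 0;
    for frag in fragments {
        // The "offset mismatch": the buggy variant restarts at 0.
        let offset = if buggy { 0 } else { filled };
        buf[offset..offset + frag.len()].copy_from_slice(frag);
        filled = offset + frag.len();
    }
    filled
}

fn main() {
    // One logical message, delivered in two socket reads.
    let fragments: [&[u8]; 2] = [&b"hel"[..], &b"lo"[..]];
    let mut buf = [0u8; 8];

    let n = copy_fragmented(&fragments, &mut buf, false);
    assert_eq!(&buf[..n], &b"hello"[..]); // correct reassembly

    let n = copy_fragmented(&fragments, &mut buf, true);
    assert_eq!(&buf[..n], &b"lo"[..]); // partial data lost with the buggy offset
    println!("ok");
}
```

A corrupted frame like this would fail decoding and plausibly lead to the connection being torn down, matching the disconnects described above.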

	if let Some(implicit_view) = &state.implicit_view {
		gum::trace!(target: LOG_TARGET, ?peer_id, ?para_id, "Checking collations on possible reconnect");

		for leaf in implicit_view.leaves() {
			if let Some(per_relay_parent) = state.per_relay_parent.get_mut(leaf) {
				advertise_collation(
					ctx,
					*leaf,
					per_relay_parent,
					&peer_id,
					&state.peer_ids,
					&mut state.advertisement_timeouts,
					&state.metrics,
				)
				.await;
			}
		}
	}
}

if let Some(authority_ids) = maybe_authority {
gum::trace!(
target: LOG_TARGET,
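The reconnect path in the diff can be modeled as a per-connection deduplication set, assuming (as the diff comment states) that advertise_collation suppresses duplicates within a connection while a fresh connection starts with empty advertisement state. All names here are illustrative, not the real collator-protocol types.

```rust
use std::collections::HashSet;

// Minimal model of the reconnect path: on reconnect we walk all active
// leaves and re-advertise; deduplication is per connection, so nothing
// is sent twice on the same connection, but a reconnect starts fresh.
struct Connection {
    advertised: HashSet<&'static str>,
}

impl Connection {
    fn new() -> Self {
        Connection { advertised: HashSet::new() }
    }

    // Advertise every known collation at most once; returns what was sent.
    fn advertise_collations(&mut self, collations: &[&'static str]) -> Vec<&'static str> {
        collations
            .iter()
            .copied()
            .filter(|c| self.advertised.insert(*c))
            .collect()
    }
}

fn main() {
    // Collation T0 exists; the peer connects and T0 is advertised,
    // but the peer disconnects before receiving the message.
    let mut conn = Connection::new();
    assert_eq!(conn.advertise_collations(&["T0"]), vec!["T0"]);
    // Within the same connection, re-advertising is a no-op.
    assert!(conn.advertise_collations(&["T0"]).is_empty());

    // A new collation T1 is produced, then the peer reconnects: the
    // loop over leaves advertises both T0 and T1 exactly once.
    let mut conn = Connection::new();
    assert_eq!(conn.advertise_collations(&["T0", "T1"]), vec!["T0", "T1"]);
    println!("ok");
}
```

This captures why the new hunk loops over all implicit-view leaves on peer connect: the lost T0 advertisement is replayed alongside T1, while the dedup inside advertise_collation keeps repeated calls harmless.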