What is the Problem Being Solved?
A node that starts behind consensus (because it was restored from a snapshot, state-sync, or community cosmos data, or because it was offline for some time) currently has to replay the chain activity as it happened. While cometbft has a fast-sync mechanism to fetch block info, from the cosmos point of view these are just new blocks that require execution. Fast-sync reduces the time spent idle "between blocks", but does not reduce the time spent executing blocks at all.
State-sync is a mechanism for sharing a historical state that a node can restore, but from the height of that restored state the node still needs to catch up (usually through fast-sync), which involves executing "historical" blocks.
We currently attempt to limit Swingset execution to about 5s during end block time, which means the chain software should spend at most 50% of its time doing swingset execution, the rest being idle during voting time. However, we want to reduce the impact on the p2p network and make better use of the idle voting time by shifting some of the execution into that time (see #6741). This would raise the chain utilization ratio, making it harder for a node joining from a historical height to catch up if the chain is busy.
During the recent mainnet performance issues, block times averaged around 12-14s, and the higher utilization ratio prevented a significant portion of validators from catching up once they fell out of consensus.
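To make the effect of the utilization ratio concrete, here is a rough back-of-the-envelope model. It rests on simplifying assumptions that are not measurements from the chain: a catching-up node executes blocks exactly as fast as the validators did, it spends no time idle between blocks, and the chain keeps a constant utilization ratio $u$ (the fraction of wall-clock time spent executing). If the node starts $\Delta$ seconds of chain time behind, it must execute the backlog plus everything the chain produces while it catches up, so the catch-up time $t$ satisfies:

$$
t = u\,(\Delta + t) \quad\Longrightarrow\quad t = \frac{u}{1-u}\,\Delta
$$

Under this model, at $u = 0.5$ a node needs as long to catch up as it was behind ($t = \Delta$), at $u = 0.8$ it needs $4\Delta$, and as $u \to 1$ the catch-up time grows without bound.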
Description of the Design
Work with the cometbft and cosmos teams to design a mechanism, similar to state-sync, for fetching the state changes from a block or set of blocks, so that instead of re-executing blocks, a node can just apply the outcome of the execution (the state changes), which is much cheaper in our case.
We likely want something akin to changelogs: add / update / delete operations on keyed entries, so that the changelogs of multiple blocks can be merged together, eliminating intermediate values. This mechanism needs to support extension payloads too, so that modules like swingset which maintain external state can synchronize through it as well. As such, these keyed changelog entries need to support large values for artifacts that cannot be decomposed into smaller changesets.
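As a minimal sketch of how per-block changelogs could be merged so that intermediate values disappear, consider the following. The `Change` type, the `combine` rules, and `merge_changelogs` are all hypothetical names chosen for illustration; no such API exists in cometbft or cosmos today, and the merge rules are one plausible reading of add / update / delete semantics.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Change:
    """One keyed changelog entry (hypothetical shape)."""
    op: str  # "add", "update" or "delete"
    value: Optional[bytes] = None


Changelog = Dict[str, Change]


def combine(earlier: Optional[Change], later: Change) -> Optional[Change]:
    """Collapse two successive changes to the same key into one (or none)."""
    if earlier is None:
        return later
    pair = (earlier.op, later.op)
    if pair == ("add", "update"):
        # The key did not exist before the merged range: still an add.
        return Change("add", later.value)
    if pair == ("add", "delete"):
        # Created and removed within the range: nothing to sync.
        return None
    if pair in (("update", "update"), ("update", "delete")):
        return later
    if pair == ("delete", "add"):
        # The key existed before the range and exists after it: an update.
        return Change("update", later.value)
    raise ValueError(f"inconsistent changelog sequence: {pair}")


def merge_changelogs(logs: Iterable[Changelog]) -> Changelog:
    """Merge block changelogs, given in block order, into a single changelog."""
    merged: Changelog = {}
    for log in logs:
        for key, change in log.items():
            result = combine(merged.get(key), change)
            if result is None:
                merged.pop(key, None)
            else:
                merged[key] = result
    return merged
```

With rules like these, an entry that is added in one block and deleted in a later one never has to be transferred, which is where the savings over replaying each block would come from.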
Swingset could use the above mechanism as follows:
- For every export data entry change, issue an add / update / delete record as appropriate (the current export data does not differentiate between add and update, so it would need to be extended).
- Deliveries for the current transcript span would be added individually until the span rolls over, at which point all entries of the transcript span are deleted.
  - If we want to restore inclusion of historical spans in the state that validators should keep, the historical span can be added whole as a single added change entry in the changelog on rollover.
- When rolling over a span, the previous snapshot entry is deleted, and the new snapshot entry is added in the changelog.
- When a bundle is added (or removed in the future), add or delete the entry in the changelog.
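The transcript and snapshot items in the list above could translate into changelog records roughly as follows. This is only an illustrative sketch: the key layout (`transcript/...`, `snapshot/...`, `historical-span/...`), the record tuple shape, and both function names are assumptions made up for the example, not the actual Swingset export data format.

```python
from typing import List, Optional, Tuple

# (op, key, value); value is None for deletes. Hypothetical record shape.
Record = Tuple[str, str, Optional[bytes]]


def delivery_record(vat: str, span_start: int, pos: int, delivery: bytes) -> Record:
    """Each delivery in the current span is added individually."""
    return ("add", f"transcript/{vat}/{span_start}/{pos}", delivery)


def rollover_records(
    vat: str,
    span_start: int,
    end_pos: int,
    old_snapshot_pos: Optional[int],
    new_snapshot: bytes,
    historical_span: Optional[bytes] = None,
) -> List[Record]:
    """Records emitted when the span [span_start, end_pos) rolls over."""
    # Delete every individual delivery entry of the span that just ended.
    records: List[Record] = [
        ("delete", f"transcript/{vat}/{span_start}/{pos}", None)
        for pos in range(span_start, end_pos)
    ]
    # Optionally keep the whole span as one large historical entry.
    if historical_span is not None:
        records.append(("add", f"historical-span/{vat}/{span_start}", historical_span))
    # Replace the previous snapshot entry with the new one.
    if old_snapshot_pos is not None:
        records.append(("delete", f"snapshot/{vat}/{old_snapshot_pos}", None))
    records.append(("add", f"snapshot/{vat}/{end_pos}", new_snapshot))
    return records
```

Because the per-delivery entries are added and later deleted, a changelog merged across a rollover would carry none of them, only the snapshot swap and (optionally) the single historical span entry.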
It is possible that the state-sync mechanism could be rewritten as the equivalent of applying a set of changes onto an exported base state, effectively collapsing the changes into a new base state.
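Collapsing a set of changes into a new base state might look like the sketch below. The flat key/value dictionary and the `apply_changelog` function are assumptions for illustration; a real implementation would operate on the cosmos stores and on extension payloads rather than on an in-memory dict.

```python
from typing import Dict, Optional, Tuple

# key -> (op, value); value is None for deletes. Hypothetical shape.
Changes = Dict[str, Tuple[str, Optional[bytes]]]


def apply_changelog(base: Dict[str, bytes], changes: Changes) -> Dict[str, bytes]:
    """Return a new base state with the changes applied; the input is not mutated."""
    state = dict(base)
    for key, (op, value) in changes.items():
        if op == "add":
            if key in state:
                raise KeyError(f"add of existing key: {key}")
            if value is None:
                raise ValueError(f"add without a value: {key}")
            state[key] = value
        elif op == "update":
            if key not in state:
                raise KeyError(f"update of missing key: {key}")
            if value is None:
                raise ValueError(f"update without a value: {key}")
            state[key] = value
        elif op == "delete":
            if key not in state:
                raise KeyError(f"delete of missing key: {key}")
            del state[key]
        else:
            raise ValueError(f"unknown op: {op}")
    return state
```

Rejecting an add of an existing key or a delete of a missing one is a design choice in this sketch: it surfaces a mismatch between the base state and the changelog early instead of silently producing a divergent state.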
Security Considerations
The result of applying synced changes to the cosmos app state and extensions must be validated, the same way state-sync validates the resulting app state using a trusted RPC server today.
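As a simplified illustration of that validation step, the sketch below hashes the resulting key/value state and compares it against a digest obtained from a trusted source. The SHA-256 digest over sorted, length-prefixed entries is a stand-in chosen for the example; it is not the app hash that cosmos actually computes, and `state_digest` and `verify_synced_state` are hypothetical names.

```python
import hashlib
import hmac
from typing import Dict


def state_digest(state: Dict[str, bytes]) -> bytes:
    """Deterministic digest of a key/value state (stand-in for a real app hash)."""
    h = hashlib.sha256()
    for key in sorted(state):
        k = key.encode("utf-8")
        v = state[key]
        # Length prefixes keep distinct (key, value) pairs from colliding.
        h.update(len(k).to_bytes(8, "big"))
        h.update(k)
        h.update(len(v).to_bytes(8, "big"))
        h.update(v)
    return h.digest()


def verify_synced_state(state: Dict[str, bytes], trusted_digest: bytes) -> bool:
    """Accept the synced state only if it matches the trusted digest."""
    return hmac.compare_digest(state_digest(state), trusted_digest)
```

A node would refuse to start from the synced state when `verify_synced_state` returns `False`, falling back to another peer or to regular block replay.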
Scaling Considerations
Generating, storing, and sharing historical state changes like this may increase the resource utilization of the nodes providing them; however, like state-sync, this is for the benefit of the network.
Test Plan
TBD
Upgrade Considerations
This requires upgrading software through the interchain stack.