Add Reorg Detection Infrastructure for Verifiers #205

Conversation
@@ -0,0 +1,204 @@
package verifier
This is just a stub service with comments describing what the algorithm will look like. The actual implementation will be in a subsequent PR.
reorgInProgress atomic.Bool // Set during reorg handling to prevent new tasks from being added

// Per-chain pending task queue
pendingTasks []VerificationTask
Moved it to be per chain instead of a global queue
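For illustration, a minimal sketch of what a per-chain state bundle could look like; VerificationTask and the two fields above come from this diff, while the pendingMu mutex and the addTask helper are assumptions.

// sourceState sketch: per-chain verifier state. Field names beyond the diff
// above (pendingMu) and the addTask helper are assumptions for illustration.
type sourceState struct {
	reorgInProgress atomic.Bool        // set while a reorg is being handled; blocks new tasks
	pendingMu       sync.Mutex         // guards pendingTasks
	pendingTasks    []VerificationTask // per-chain pending task queue
}

// addTask appends a task unless this chain is currently recovering from a reorg.
func (s *sourceState) addTask(task VerificationTask) bool {
	if s.reorgInProgress.Load() {
		return false // caller should retry once reorg handling finishes
	}
	s.pendingMu.Lock()
	defer s.pendingMu.Unlock()
	s.pendingTasks = append(s.pendingTasks, task)
	return true
}

Keeping the queue and flag inside one per-chain struct is what allows a reorg on one chain to leave the other chains' queues untouched.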
	continue
}

sourceCfg, ok := vc.config.SourceConfigs[chainSelector]
Will probably extract this into smaller functions.
// The chain tail is automatically sized to 2 * FinalityDepth to provide
// sufficient buffer for reorg detection before finality violations.
// Default: 64 blocks
FinalityDepth uint64
Do we want to support only a finality depth? Most chains support finality tags that tell us which blocks are final; using them would save us from having to work out the right depth each time we onboard a new chain.
What do we do for the chains that don't have a finality tag, then?
CC @AndresJulia
This is where the configuration lives for 1.6: if FinalityTagEnabled is false we specify FinalityDepth (though please confirm this with @KodeyThomas or @simsonraj).
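A hedged sketch of how a config mirroring that 1.6-style convention could look; the SourceFinalityConfig type, its field names, and the helper are assumptions, not the repo's actual config shape.

// SourceFinalityConfig is a hypothetical per-chain finality setting: when
// FinalityTagEnabled is true the chain's finalized tag is used, otherwise
// FinalityDepth applies.
type SourceFinalityConfig struct {
	FinalityTagEnabled bool   // use the chain's finalized tag when supported
	FinalityDepth      uint64 // fallback depth for chains without a finality tag
}

// effectiveFinalityDepth returns the depth to use when no finality tag is available.
func (c SourceFinalityConfig) effectiveFinalityDepth(defaultDepth uint64) uint64 {
	if c.FinalityTagEnabled {
		return 0 // unused: finalized blocks come from the tag
	}
	if c.FinalityDepth == 0 {
		return defaultDepth // e.g. the documented default of 64 blocks
	}
	return c.FinalityDepth
}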
//
// The status channel will receive:
//   - ChainStatusReorg: When a reorg is detected (includes reorg depth and common ancestor)
//   - ChainStatusFinalityViolated: When a block deeper than FinalityDepth is reorged
With RMN we saw multiple cases where there was a finality violation but no messages in either the new or the old tail. I'm wondering if it's worth checking for this here.
For instance, we can detect a finality violation, but it's not a big deal if no message was actually re-orged.
That would play well with inconsistent chains that have low to non-existent traffic.
I do love the idea, but I think it would complicate the logic and make the reorg detector, which should be chain-agnostic and even CCIP-agnostic, tightly coupled to CCIP. I don't know whether we'll need to use it with the executor as well, but if we do, it would complicate things even further.
@0xAustinWang @winder Do you think we'll need the reorg component with executors at all?
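If that check ever becomes necessary, it could live on the coordinator side so the detector itself stays chain- and CCIP-agnostic. A rough sketch, where the hasMessagesInRange lookup and the BlockHeader.Number field are assumptions:

// finalityViolationAffectsMessages sketches a coordinator-side filter that keeps
// the reorg detector CCIP-agnostic. hasMessagesInRange is a hypothetical lookup
// against onramp events; BlockHeader.Number is an assumed field name.
func finalityViolationAffectsMessages(
	status ChainStatusFinalityViolated,
	hasMessagesInRange func(fromBlock, toBlock uint64) bool,
) bool {
	from := status.SafeRestartBlock
	to := status.ViolatedBlock.Number
	// If no messages were emitted in the reorged range, the violation can be
	// logged and alerted on without halting message processing.
	return hasMessagesInRange(from, to)
}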
// SubscribeNewHeads subscribes to new block headers.
// Returns a channel that receives new headers as they arrive.
// Implementation may poll internally and push to channel for chains without native subscriptions.
// The returned channel is closed when subscription ends or context is cancelled.
// Returns error if subscription cannot be established.
SubscribeNewHeads(ctx context.Context) (<-chan *protocol.BlockHeader, error)
I don't think this function is available in other clients, and it leads to boilerplate code wrapping LatestBlock-style calls (e.g. Solana's SubscribeToHeads).
I know. It still abstracts this away from us: we want one API to subscribe instead of rolling our own latestBlock polling. Also, when the underlying chain supports native subscriptions, it gives us more up-to-date heads.
// Implementation may poll internally and push to channel for chains without native subscriptions.
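For the poll-and-push fallback that comment describes, a hedged sketch of one possible adapter; latestHead stands in for whatever "latest block header" call the chain client exposes, and protocol.BlockHeader having a Number field is an assumption.

// pollNewHeads polls for the latest head on a ticker and pushes new heads into
// a channel, for chains without native head subscriptions.
func pollNewHeads(
	ctx context.Context,
	interval time.Duration,
	latestHead func(context.Context) (*protocol.BlockHeader, error),
) (<-chan *protocol.BlockHeader, error) {
	out := make(chan *protocol.BlockHeader)
	go func() {
		defer close(out) // closed when the context is cancelled
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		var lastSeen uint64
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				head, err := latestHead(ctx)
				if err != nil || head == nil {
					continue // transient RPC error; retry on next tick
				}
				if head.Number <= lastSeen {
					continue // no new head yet
				}
				lastSeen = head.Number
				select {
				case out <- head:
				case <-ctx.Done():
					return
				}
			}
		}
	}()
	return out, nil
}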
verifier/reorg_detector_service.go (outdated)
// - Sends notifications via channel only when reorgs or finality violations are detected
//
// Tail Sizing:
//   - Tail length = 2 * FinalityDepth (automatic, not configurable)
Not sure if that would happen in practice, but is it possible that some RPCs prune history in a way that prevents us from rebuilding the tail? Especially if we support 2x finality depth?
Updated it to only finalityDepth.
// ChainTail stores an ordered slice of block headers from stable tip to latest tip.
type ChainTail struct {
	blocks []BlockHeader // ordered from oldest (stable) to newest (tip)
How many blocks will you hold in memory?
The plan was to hold finalityDepth*2 blocks. Given @carte7000's comment, maybe I'll use only finalityDepth.
You'd be able to detect a finality violation because the previous hashes would change, but you might not be able to find the common ancestor.
I'd like to know the common ancestor so I can update, via the checkpoint manager, the latest safe block to start from before stopping this reader. We can probably do this retroactively by re-checking blocks instead of holding them all in memory, though.
If there's a finality violation beyond what you have in memory, how do you get the now-pruned blocks?
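A rough sketch of the retroactive approach mentioned two comments up: rather than holding a long tail in memory, re-fetch canonical headers by number and compare them against whatever hashes were recorded locally. headerByNumber is a stand-in for the reader's header-by-number call, and the Number/Hash fields on BlockHeader are assumed names.

// findCommonAncestor walks the locally recorded headers from newest to oldest,
// re-fetching the canonical header at each height, and returns the deepest
// block where the hashes still agree.
func findCommonAncestor(
	ctx context.Context,
	localTail []BlockHeader, // ordered oldest -> newest
	headerByNumber func(context.Context, uint64) (BlockHeader, error),
) (uint64, error) {
	for i := len(localTail) - 1; i >= 0; i-- {
		canonical, err := headerByNumber(ctx, localTail[i].Number)
		if err != nil {
			return 0, err
		}
		if canonical.Hash == localTail[i].Hash {
			return localTail[i].Number, nil // deepest block both chains agree on
		}
	}
	// The fork is older than anything held locally; the caller has to fall back
	// to checkpoints (or treat it as a finality violation beyond recovery).
	return 0, errors.New("common ancestor not found within local tail")
}

This sidesteps the memory question, but if the fork is deeper than the locally recorded hashes, some persisted record (e.g. checkpoints) is still needed to locate it.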
protocol/reorg_detector.go (outdated)
// ChainStatus is a marker interface for different chain status types.
// Implementations: ChainStatusReorg, ChainStatusFinalityViolated
Why do this?
If you want guarantees on the enum type, use something like go-enum.
For this, something simple like type ChainStatus int seems totally sufficient.
It's because each status carries a different struct, as below. I thought that would be more intuitive than a single struct with a Type ChainStatus field plus both CommonAncestorBlock and ViolatedBlock, which don't make sense together. WDYT?
// ChainStatusReorg indicates a regular reorg was detected.
type ChainStatusReorg struct {
	NewTail             ChainTail
	CommonAncestorBlock uint64 // Block number of common ancestor for recovery
}

// ChainStatusFinalityViolated indicates a finality violation was detected (critical error).
type ChainStatusFinalityViolated struct {
	ViolatedBlock    BlockHeader // The finalized block that was reorged
	NewTail          ChainTail   // The new chain tail showing correct state
	SafeRestartBlock uint64      // Last known good block to restart from
}
If I understand correctly, the coordinator needs type casting on the status result in order to use the response:
select {
case <-newMessage: /* normal processing things here */
case <-ctx.Done(): /* normal cancel stuff here */
case status := <-reorgStatus:
	switch status.(type) {
	case ChainStatusReorg:
		/* use ChainStatusReorg struct */
	case ChainStatusFinalityViolated:
		/* use ChainStatusFinalityViolated struct */
	default:
		panic("unknown status")
	}
}
ChainStatusReorg and ChainStatusFinalityViolated are really the same error; the only difference is whether finality has been violated, and CommonAncestorBlock and SafeRestartBlock are the same thing. What about a single status struct and no casting:
type ReorgStatus struct {
	NewTail             ChainTail
	CommonAncestorBlock uint64 // Block number of common ancestor for recovery
	FinalityViolated    bool
}
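For comparison, a sketch of how the coordinator loop reads with that single-struct suggestion: no type assertion, just a boolean check. The channel names are placeholders.

// consumeStatus sketches the coordinator loop against a single ReorgStatus channel.
func consumeStatus(ctx context.Context, newMessage <-chan struct{}, reorgStatus <-chan ReorgStatus) {
	for {
		select {
		case <-newMessage:
			// normal processing things here
		case <-ctx.Done():
			return
		case status := <-reorgStatus:
			if status.FinalityViolated {
				// critical path: stop the reader and checkpoint status.CommonAncestorBlock
			} else {
				// regular reorg: flush tasks above status.CommonAncestorBlock and reset the reader
			}
		}
	}
}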
Force-pushed from 2d5b87c to 34b1b56.
In the interest of making this PR just about re-org detection, does it make sense to do this isolation in another PR and then implement this re-org logic?
Good point. I wanted to do it like you're saying initially but got carried away while doing it 😁. Basically the re-org logic was blocking all chains if one chain was being reorged, and I couldn't leave it like that 😅
E2E Smoke Test Results
Full logs are available in the workflow artifacts.

Code coverage report:
Add Reorg Detection Infrastructure for CCIP v1.7 Verifiers
Summary
This PR introduces blockchain reorganization detection capabilities for CCIP v1.7 verifiers to ensure safe message processing in the presence of chain reorgs and finality violations.
The LCA (lowest common ancestor) algorithm for reorg detection will come in a separate PR.
Changes
Core Infrastructure
Chain Status Tracking (protocol/chain_status.go):
- ChainTail: data structure for tracking contiguous block headers, with validation and accessors StableTip(), Tip(), Contains(), BlockByNumber()
- ChainStatusReorg: event type for regular reorgs with common ancestor information
- ChainStatusFinalityViolated: critical event when finalized blocks are reorged

Reorg Detector (protocol/reorg_detector.go, verifier/reorg_detector_service.go):
- ReorgDetector: interface for chain-agnostic reorg monitoring
- ReorgDetectorService: implementation that subscribes to new heads via SourceReader.SubscribeNewHeads()

Per-Chain State Management (verifier/verification_coordinator.go):
- sourceState struct: encapsulates all per-chain state, including:
  - SourceReaderService instance
  - ReorgDetector instance
  - per-chain pending task queue (pendingTasks []VerificationTask)
  - reorgInProgress atomic flag

Coordinator Integration (verifier/verification_coordinator.go):
- handleReorg(): responds to regular reorgs by:
  - setting the reorgInProgress flag immediately (blocks new task additions)
  - resetting the SourceReaderService to the common ancestor block with a 30s timeout
  - clearing the reorgInProgress flag only after the reset completes
- handleFinalityViolation(): responds to finality violations, additionally writing a checkpoint at the common ancestor via the CheckpointManager (see the sequence diagram below)

Architecture
sequenceDiagram
    participant RDS as ReorgDetectorService<br/>(Chain A)
    participant VC as VerificationCoordinator
    participant SS as sourceState<br/>(Chain A)
    participant SRS as SourceReaderService<br/>(Chain A)
    participant CM as CheckpointManager
    RDS->>VC: ChainStatus event<br/>(reorg detected)
    VC->>SS: Set reorgInProgress = true
    VC->>SS: Lock pendingMu
    VC->>SS: Flush reorged tasks<br/>(block > common ancestor)
    VC->>SS: Unlock pendingMu
    VC->>SRS: ResetToBlock(commonAncestor)<br/>[BLOCKING]
    SRS->>CM: WriteCheckpoint(commonAncestor) (only if finality violated)
    Note over VC,SRS: Coordinator waits here until<br/>reader confirms reset
    SRS-->>VC: Reset complete
    VC->>SS: Set reorgInProgress = false
    Note over VC: Chain A ready for new tasks
    Note over VC: Chains B, C, D unaffected
Flow:
1. ReorgDetectorService subscribes to block headers and maintains the chain tail
2. On detecting a reorg or finality violation, it sends a ChainStatus event
3. VerificationCoordinator receives the event and invokes the appropriate handler, which:
   - sets the reorgInProgress=true flag (prevents new tasks)
   - waits for SourceReaderService.ResetToBlock() to complete (30s timeout)
   - clears the reorgInProgress=false flag
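A minimal sketch of the synchronous handler walked through above. The coordinator and source-state field names used here (sourceStates, pendingMu, sourceReader, task.BlockNumber) and the plain uint64 chain selector are assumptions for illustration, not the repo's actual API.

// handleReorg sketches the synchronous per-chain recovery: block new tasks,
// flush reorged tasks, reset the reader, then re-enable the chain.
func (vc *VerificationCoordinator) handleReorg(ctx context.Context, chainSelector uint64, status ChainStatusReorg) error {
	state := vc.sourceStates[chainSelector]

	// Block new task additions for this chain only; other chains keep running.
	state.reorgInProgress.Store(true)
	defer state.reorgInProgress.Store(false) // cleared only after the reset below completes

	// Flush pending tasks above the common ancestor.
	state.pendingMu.Lock()
	kept := state.pendingTasks[:0]
	for _, task := range state.pendingTasks {
		if task.BlockNumber <= status.CommonAncestorBlock {
			kept = append(kept, task)
		}
	}
	state.pendingTasks = kept
	state.pendingMu.Unlock()

	// Synchronously reset the reader to the common ancestor within a 30s budget.
	resetCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()
	return state.sourceReader.ResetToBlock(resetCtx, status.CommonAncestorBlock)
}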
Key Design Improvements

Per-Chain Queue Isolation:
- Each sourceState maintains its own pendingTasks queue
- Per-chain reorgInProgress flags prevent race conditions between chains

Synchronous Reset Behavior:
- handleReorg() uses a deferred unlock pattern to ensure atomicity
- The reorgInProgress flag prevents concurrent task additions during the entire reorg recovery

Implementation Status
Completed:
- Chain status tracking types (ChainTail, ChainStatus types)
- Reorg detection interfaces (ReorgDetector, updated SourceReader)

Deferred to Follow-up PRs:
- ReorgDetectorService.Start() full implementation
- monitorSubscription() algorithm (gap detection, tail updates, ancestor finding)
- StatePollerService for automatic reader restart after finality violations

Notes