[CI] refactor TestFollowerHappyPath to use synctest by peterargue · Pull Request #8474 · onflow/flow-go

peterargue · 2026-02-27T18:12:22Z

Problem

TestFollowerHappyPath in engine/common/follower flakily panics with panic: pebble: closed when run under load alongside other engine tests. The panic manifests as a process crash rather than a clean test failure, obscuring what actually went wrong.

Reproduced by running go test ./engine/....

Root Cause

This test failed with several different failure cases (pebble panic, timeouts, etc). The root cause is the test takes a long time to run in the happy path (~30+ seconds). So under load from other concurrently running tests, it can take much longer. the test was regularly timing out after 90 second. This timeout would cause the test teardown logic to skip waiting for the components to finish shutting down, resulting in the pebble panics.

Fix

Switch the test to using synctest to reduce blocking caused by synchronization. Maintain the original test's randomness to ensure it continues to perform the same test.

Summary by CodeRabbit

Tests
- Improved integration test reliability with coordinated startup/teardown orchestration and deterministic waits so goroutines settle before assertions.
- Replaced polling-based waits with event-driven finalization signaling to wait for target conditions instead of busy-wait loops.
- Simplified shutdown and cleanup with deterministic cancellation and coordinated waits for components to finish, reducing flakiness.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-02-27T18:12:36Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

coderabbitai · 2026-02-27T18:13:15Z

📝 Walkthrough

Walkthrough

Reworks the follower integration test to use synctest orchestration: wraps setup in a synctest bubble, uses an OnBlockFinalized consumer to signal completion via a finalized channel (replacing polling), switches readiness to non-blocking AllReady + synctest.Wait, and simplifies shutdown with channel-driven synchronization.

Changes

Cohort / File(s)	Summary
Follower integration test `engine/common/follower/integration_test.go`	Wrapped test startup in a `synctest` bubble and added `runTestFollowerHappyPath`; introduced a `finalized` channel and `OnBlockFinalized` consumer to signal finalization (replaces polling); replaced require-style readiness with non-blocking `AllReady` + `synctest.Wait`; simplified shutdown to stop producers, cancel context, and wait for loops using `synctest.Wait`; added `synctest` and HotStuff imports.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Fix flaky TestFollowerHappyPath: panic on pebble DB close #8466: Adjusts teardown/wait logic in engine/common/follower/integration_test.go (related changes addressing teardown timing and flakiness).

Suggested reviewers

zhangchiqing
fxamacker

Poem

🐰
I wrapped the test in a cozy synctest shell,
a finalization bell rang, no more polling spell,
goroutines settled, shutdown did glide,
the follower hummed, content inside,
carrots for all — hop, test passed well. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: refactoring TestFollowerHappyPath to use synctest for improved test synchronization.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch peterargue/fix-follower-happy-path-pebble-panic

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…ower-happy-path-pebble-panic

codecov-commenter · 2026-02-27T18:17:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@engine/common/follower/integration_test.go`:
- Line 220: Replace the fatal call to unittest.RequireReturnsBefore(wg.Wait,
...) inside the defer with a non-fatal timeout pattern so the defer can still
reach e.g. AllDone; specifically, remove the unittest.RequireReturnsBefore
invocation and instead spawn a goroutine that does wg.Wait() and signals a done
channel, then in the defer use select { case <-done: /* ok */ case
<-time.After(time.Second): t.Errorf("workers did not stop producing in time") }
so the defer continues and still calls AllDone (the defer's wait/cleanup logic
that uses AllDone must remain reachable).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65d2049 and e9ea929.

📒 Files selected for processing (1)

engine/common/follower/integration_test.go

engine/common/follower/integration_test.go

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

engine/common/follower/integration_test.go (1)

247-257: ⚠️ Potential issue | 🟠 Major

Shutdown says workers are awaited, but wg.Wait() is never reached.

The comment says producers are waited out, but the code never waits on wg. This can leave producer goroutines still calling engine.OnSyncedBlocks while shutdown is in progress.

🔧 Proposed fix

 		// stop producers and wait for them to exit, then wait for engine shutdown.
 		submittingBlocks.Store(false)
+		workersDone := make(chan struct{})
+		go func() {
+			defer close(workersDone)
+			wg.Wait()
+		}()
 		cancel()

 		allDone := moduleutil.AllDone(engine, followerLoop)
 		synctest.Wait()
+		select {
+		case <-workersDone:
+		default:
+			t.Errorf("workers did not stop producing")
+		}
 		select {
 		case <-allDone:
 		default:
 			t.Fatal("engine failed to stop")
 		}

As per coding guidelines: "Use proper synchronization primitives to maintain state consistency, especially in concurrent or distributed contexts."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@engine/common/follower/integration_test.go` around lines 247 - 257, The
shutdown sequence stops producers by setting submittingBlocks and calling
cancel(), but it never waits for the producer WaitGroup (wg) to finish, allowing
producers to still call engine.OnSyncedBlocks during shutdown; update the
sequence in the test to call wg.Wait() (the WaitGroup that tracks producer
goroutines) after submittingBlocks.Store(false) and cancel() and before calling
synctest.Wait() and checking moduleutil.AllDone(engine, followerLoop), so
producers are fully stopped prior to asserting engine shutdown.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@engine/common/follower/integration_test.go`:
- Around line 163-167: The select that checks <-allReady currently calls
t.Fatal("engine failed to start") which aborts the test immediately and prevents
the later cancel() from running; change that to t.Errorf("engine failed to
start") followed by a controlled early return (or t.FailNow replacement with a
defer-aware exit) so the test function can reach the cancel() cleanup. Locate
the select using the allReady channel and replace the t.Fatal call with t.Errorf
plus an immediate return to preserve graceful shutdown of the mock context and
ensure cancel() is invoked.

---

Duplicate comments:
In `@engine/common/follower/integration_test.go`:
- Around line 247-257: The shutdown sequence stops producers by setting
submittingBlocks and calling cancel(), but it never waits for the producer
WaitGroup (wg) to finish, allowing producers to still call engine.OnSyncedBlocks
during shutdown; update the sequence in the test to call wg.Wait() (the
WaitGroup that tracks producer goroutines) after submittingBlocks.Store(false)
and cancel() and before calling synctest.Wait() and checking
moduleutil.AllDone(engine, followerLoop), so producers are fully stopped prior
to asserting engine shutdown.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e9ea929 and 2939273.

📒 Files selected for processing (1)

engine/common/follower/integration_test.go

coderabbitai · 2026-02-28T01:51:50Z

engine/common/follower/integration_test.go

+		select {
+		case <-allReady:
+		default:
+			t.Fatal("engine failed to start")
+		}


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== Reading full context of integration_test.go around lines 163-167 ===" wc -l engine/common/follower/integration_test.go echo echo "=== Context around lines 155-175 (first fatal check) ===" sed -n '155,175p' engine/common/follower/integration_test.go echo echo "=== Context around lines 245-260 (second fatal check) ===" sed -n '245,260p' engine/common/follower/integration_test.go echo echo "=== Check for cleanup patterns in the test function ===" # Find the test function containing these lines sed -n '100,300p' engine/common/follower/integration_test.go | grep -n "func Test\|defer\|cancel\|wg\\.Wait\|t\\.Cleanup"

Repository: onflow/flow-go

Length of output: 1796

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== Full test function structure (broader view) ===" # Find where the test function starts sed -n '1,170p' engine/common/follower/integration_test.go | head -100

Repository: onflow/flow-go

Length of output: 3742

Ensure engine lifecycle checks don't skip cancellation in failure paths.

Using t.Fatal() for start/stop checks will prevent the cancel() call at line 250, leaving the mock context improperly shut down. Replace with t.Errorf() and controlled early returns to preserve graceful shutdown semantics, especially important given this test validates the lifecycle management guarantees.

♻️ Proposed refactor

select { case <-allReady: default: - t.Fatal("engine failed to start") + t.Errorf("engine failed to start") + cancel() + return } @@ select { case <-allDone: default: - t.Fatal("engine failed to stop") + t.Errorf("engine failed to stop") }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

select {

case <-allReady:

default:

t.Fatal("engine failed to start")

}

select {

case <-allReady:

default:

t.Errorf("engine failed to start")

cancel()

return

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@engine/common/follower/integration_test.go` around lines 163 - 167, The select that checks <-allReady currently calls t.Fatal("engine failed to start") which aborts the test immediately and prevents the later cancel() from running; change that to t.Errorf("engine failed to start") followed by a controlled early return (or t.FailNow replacement with a defer-aware exit) so the test function can reach the cancel() cleanup. Locate the select using the allReady channel and replace the t.Fatal call with t.Errorf plus an immediate return to preserve graceful shutdown of the mock context and ensure cancel() is invoked.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@engine/common/follower/integration_test.go`:
- Around line 256-262: The test stops producers by setting
submittingBlocks.Store(false) but never waits for the worker goroutines to
finish, causing races when workers still call engine.OnSyncedBlocks() while the
defer cancels mockCtx/engine; after submittingBlocks.Store(false) add a
wg.Wait() to block until all workers have called wg.Done(), ensuring all
goroutines (that reference wg and call engine.OnSyncedBlocks()) complete before
the deferred shutdown of mockCtx/engine proceeds.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fb5fa213-d798-416f-9047-df487e189103

📥 Commits

Reviewing files that changed from the base of the PR and between 2939273 and b2d0887.

📒 Files selected for processing (1)

engine/common/follower/integration_test.go

coderabbitai · 2026-03-04T03:15:53Z

engine/common/follower/integration_test.go

+		<-finalized

-			// stop engines and wait for graceful shutdown
-			cancel()
-			unittest.RequireCloseBefore(t, moduleutil.AllDone(engine, followerLoop), 10*time.Second, "engine failed to stop")
-			// Note: in case any error occur, the `mockCtx` will fail the test, due to the unexpected call of `Throw` on the mock.
-		}()
-
-		// wait for target block to become finalized, this might take a while.
-		require.Eventually(t, func() bool {
-			final, err := followerState.Final().Head()
-			require.NoError(t, err)
-			success := final.Height == targetBlockHeight
-			if !success {
-				t.Logf("finalized height %d, waiting for %d", final.Height, targetBlockHeight)
-			} else {
-				t.Logf("successfully finalized target height %d\n", targetBlockHeight)
-			}
-			return success
-		}, 90*time.Second, time.Second, "expect to process all blocks before timeout")
+		// stop producers and wait for them to exit, then wait for engine shutdown.
+		submittingBlocks.Store(false)

+		// Note: in case any error occur, the `mockCtx` will fail the test, due to the unexpected call of `Throw` on the mock.
+		// shutdown is in defer


⚠️ Potential issue | 🟠 Major

Missing synchronization: worker goroutines are never waited on before shutdown.

The wg WaitGroup is created and used with Add/Done, but wg.Wait() is never called. After submittingBlocks.Store(false), the test immediately exits, triggering the defer which cancels the context and shuts down the engine. Workers may still be in the middle of calling engine.OnSyncedBlocks() on a shutting-down engine, creating a race condition.

🔧 Proposed fix: wait for workers before proceeding to shutdown

// Block until the target block is finalized. The callback above fires from within the // HotStuff finalization goroutine (inside the bubble) and sends on this channel, which // unblocks the main goroutine. <-finalized // stop producers and wait for them to exit, then wait for engine shutdown. submittingBlocks.Store(false) + wg.Wait() // Note: in case any error occur, the `mockCtx` will fail the test, due to the unexpected call of `Throw` on the mock. // shutdown is in defer

Based on learnings: "Use proper synchronization primitives to maintain state consistency, especially in concurrent or distributed contexts."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

<-finalized

// stop engines and wait for graceful shutdown

cancel()

unittest.RequireCloseBefore(t, moduleutil.AllDone(engine, followerLoop), 10*time.Second, "engine failed to stop")

// Note: in case any error occur, the `mockCtx` will fail the test, due to the unexpected call of `Throw` on the mock.

}()

// wait for target block to become finalized, this might take a while.

require.Eventually(t, func() bool {

final, err := followerState.Final().Head()

require.NoError(t, err)

success := final.Height == targetBlockHeight

if !success {

t.Logf("finalized height %d, waiting for %d", final.Height, targetBlockHeight)

} else {

t.Logf("successfully finalized target height %d\n", targetBlockHeight)

}

return success

}, 90*time.Second, time.Second, "expect to process all blocks before timeout")

// stop producers and wait for them to exit, then wait for engine shutdown.

submittingBlocks.Store(false)

// Note: in case any error occur, the `mockCtx` will fail the test, due to the unexpected call of `Throw` on the mock.

// shutdown is in defer

<-finalized

// stop producers and wait for them to exit, then wait for engine shutdown.

submittingBlocks.Store(false)

wg.Wait()

// Note: in case any error occur, the `mockCtx` will fail the test, due to the unexpected call of `Throw` on the mock.

// shutdown is in defer

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@engine/common/follower/integration_test.go` around lines 256 - 262, The test stops producers by setting submittingBlocks.Store(false) but never waits for the worker goroutines to finish, causing races when workers still call engine.OnSyncedBlocks() while the defer cancels mockCtx/engine; after submittingBlocks.Store(false) add a wg.Wait() to block until all workers have called wg.Done(), ensuring all goroutines (that reference wg and call engine.OnSyncedBlocks()) complete before the deferred shutdown of mockCtx/engine proceeds.

fix: prevent pebble:closed panic in TestFollowerHappyPath shutdown

e9ea929

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

peterargue requested a review from a team as a code owner February 27, 2026 18:12

peterargue added Bugfix Tech Debt labels Feb 27, 2026

Merge remote-tracking branch 'origin/master' into peterargue/fix-foll…

2a5028b

…ower-happy-path-pebble-panic

coderabbitai bot reviewed Feb 27, 2026

View reviewed changes

engine/common/follower/integration_test.go Outdated Show resolved Hide resolved

switch to using synctest

2939273

coderabbitai bot reviewed Feb 28, 2026

View reviewed changes

peterargue changed the title ~~fix: prevent pebble:closed panic in TestFollowerHappyPath shutdown~~ [CI] refactor TestFollowerHappyPath to use synctest Mar 2, 2026

move shutdown into defer to guarantee close

b2d0887

coderabbitai bot reviewed Mar 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CI] refactor TestFollowerHappyPath to use synctest#8474

[CI] refactor TestFollowerHappyPath to use synctest#8474
peterargue wants to merge 4 commits intomasterfrom
peterargue/fix-follower-happy-path-pebble-panic

peterargue commented Feb 27, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

github-actions bot commented Feb 27, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Feb 27, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

codecov-commenter commented Feb 27, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 28, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

peterargue commented Feb 27, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Root Cause

Fix

Summary by CodeRabbit

Uh oh!

github-actions bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Dependency Review

Scanned Files

Uh oh!

coderabbitai bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

codecov-commenter commented Feb 27, 2026

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

peterargue commented Feb 27, 2026 •

edited by coderabbitai bot

Loading

github-actions bot commented Feb 27, 2026 •

edited

Loading

coderabbitai bot commented Feb 27, 2026 •

edited

Loading