feat: harden add-node workflow for data safety, add failover test #57
- Add `PeerCatchup` resource: waits for the source to actually apply peer commits (not just receive WAL) before the source->new COPY starts. Closes a data-loss window during add-node.
- Add `ReplicationOriginAdvance` resource: keeps the subscriber-side origin in lockstep with the provider-side slot, so the apply worker resumes from the right LSN instead of replaying WAL from 0/0.
- `ReplicationSlotAdvanceFromCTS` now records the LSN it advanced to (empty when skipped) so origin advance can no-op cleanly.
- `WaitForSyncEvent` treats disabled/down subscription states as transient with backoff polling instead of failing fast.
- Wire the new resources into `addPopulateResources`.
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough: This PR introduces two new Spock resource types to refine replication setup gating.
Changes:
- Spock Replication Resources and Orchestration
- Integration Testing Infrastructure and Failover Validation
🚥 Pre-merge checks: ✅ 4 passed, ❌ 1 failed (1 warning)
Codacy: Up to standards ✅

🟢 Issues

| Category | Results |
|---|---|
| Complexity | 2 medium |

🟢 Metrics

| Metric | Results |
|---|---|
| Complexity | 108 |
| Duplication | 47 |
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/spock/replication_slot_advance.go`:
- Around lines 76-78: Don't clear r.AdvancedToLSN on entry; instead, ensure
r.AdvancedToLSN is set to the effective slot LSN on every successful exit
(including idempotent/no-op and retry paths) so the origin advance step sees the
moved slot. Concretely: remove or stop resetting r.AdvancedToLSN = "" at the
start of the reconciliation and, after determining the provider slot LSN (the
value you already read when checking commit_ts/current slot or after calling
pg_replication_slot_advance), assign that LSN into r.AdvancedToLSN before
returning from all success branches (including the paths where
pg_replication_slot_advance was a no-op or when ReplicationOriginAdvance failed
but the slot already moved). Apply the same change for the second similar block
(the code around the later 113-149 region) so both idempotent and retry exits
carry forward the effective LSN.
- Around lines 113-126: The current code compares WAL positions using Go string
ordering (`if targetLSN <= currentLSN`), which is wrong; replace that in the
block after reading currentLSN by performing a pg_lsn-aware SQL comparison via
r.conn.QueryRow instead of lexicographic comparison: call QueryRow with a
statement like "SELECT $1::pg_lsn <= $2::pg_lsn" (bind targetLSN and
currentLSN), Scan the result into a bool (e.g., alreadyAtOrBeyond) and use that
bool to decide to log and return; keep the existing variables currentLSN and
targetLSN and the surrounding error handling.
In `@internal/spock/spock_test.go`:
- Around lines 529-535: The current assertion only checks d.Type ==
ResourceTypeReplicationOriginAdvance and risks matching the wrong edge; update
the loop in the test to assert both d.Type ==
ResourceTypeReplicationOriginAdvance and d.ID == "n2_n3" (use the existing
variable names foundAdvance and d.ID) so the test specifically verifies the
replication origin advance edge for the expected resource ID "n2_n3".
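A small helper makes the stricter assertion explicit. This is a sketch with hypothetical stand-ins: the `Dependency` struct and the string value of `ResourceTypeReplicationOriginAdvance` are assumptions; only the constant's name and the `"n2_n3"` ID come from the review.

```go
package main

// ResourceType and Dependency are hypothetical stand-ins for the types
// used in internal/spock.
type ResourceType string

// The string value here is assumed; only the constant's name is real.
const ResourceTypeReplicationOriginAdvance ResourceType = "replication_origin_advance"

type Dependency struct {
	Type ResourceType
	ID   string
}

// hasDependency reports whether deps contains an edge matching both the
// resource type and the resource ID, so a same-typed edge for another
// node pair cannot satisfy the check.
func hasDependency(deps []Dependency, typ ResourceType, id string) bool {
	for _, d := range deps {
		if d.Type == typ && d.ID == id {
			return true
		}
	}
	return false
}
```

In the test, `foundAdvance = hasDependency(deps, ResourceTypeReplicationOriginAdvance, "n2_n3")` would replace the type-only loop (`deps` being whatever slice the test already iterates).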
In `@test/integration/replication_helpers_test.go`:
- Around lines 76-97: The shared ctx/cancel used by all subtests can expire
mid-run causing cascading flakiness; inside the t.Run closure create a fresh
per-subtest timeout context (e.g., subCtx, subCancel :=
context.WithTimeout(context.Background(), 60*time.Second)) and use that subCtx
when calling wait.Until and testKube.ExecSQL, then defer subCancel in the
subtest to avoid leaks; keep the existing t.Run, wait.Until, and
testKube.ExecSQL calls but replace references to the outer ctx with the new
per-subtest context.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a7489300-23ae-43bd-ad06-e5dffbf86ce7
📒 Files selected for processing (12)
- internal/spock/desired.go
- internal/spock/peer_catchup.go
- internal/spock/replication_origin_advance.go
- internal/spock/replication_slot_advance.go
- internal/spock/spock_test.go
- internal/spock/wait_for_sync_event.go
- test/Makefile
- test/integration/failover_helpers.go
- test/integration/failover_test.go
- test/integration/nodes_test.go
- test/integration/replication_helpers_test.go
- test/integration/testdata/distributed-2node-2instance-values.yaml
tsivaprasad left a comment
Looks good and aligns well with the Control Plane and Spock 5.0.8 ZODAN workflow.
Verification:
make test-failover
/Library/Developer/CommandLineTools/usr/bin/make -C /Users/sivat/projects/pgedge-helm/pgedge-helm docker-build-dev
docker buildx bake dev
[+] Building 1.3s (15/15) FINISHED docker:desktop-linux
=> [internal] load local bake definitions 0.0s
=> => reading docker-bake.hcl 1.12kB / 1.12kB 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 514B 0.0s
=> [internal] load metadata for docker.io/library/golang:1.25 1.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [builder 1/7] FROM docker.io/library/golang:1.25@sha256:cd05a378aaf011e8056745363e5c40f4f2bef0fa4d9bf19b9c 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 1.84kB 0.0s
=> CACHED [builder 2/7] WORKDIR /build 0.0s
=> CACHED [builder 3/7] COPY go.mod go.sum ./ 0.0s
=> CACHED [builder 4/7] RUN go mod download 0.0s
=> CACHED [builder 5/7] COPY cmd/ cmd/ 0.0s
=> CACHED [builder 6/7] COPY internal/ internal/ 0.0s
=> CACHED [builder 7/7] RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -o /init-spock ./cmd/init-spock 0.0s
=> CACHED [stage-1 1/2] COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ 0.0s
=> CACHED [stage-1 2/2] COPY --from=builder /init-spock /init-spock 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:719e1c984d56467df2ab7636028678b67a56faf2c606575af9dcfadce5bcf178 0.0s
=> => naming to docker.io/library/pgedge-helm-utils:dev 0.0s
View build details: docker-desktop://dashboard/build/desktop-linux/desktop-linux/lnbyyap8d6l8spmtox5h7nia6
kind load docker-image pgedge-helm-utils:dev --name pgedge-test
Image: "pgedge-helm-utils:dev" with ID "sha256:719e1c984d56467df2ab7636028678b67a56faf2c606575af9dcfadce5bcf178" found to be already present on all nodes.
cd /Users/sivat/projects/pgedge-helm/pgedge-helm && go test -tags integration -v -timeout 30m \
-run "TestUnplannedFailover" ./test/integration/...
=== RUN TestUnplannedFailover
failover_test.go:61: standby pgedge-n1-2 has logical spk_* slot synced
failover_test.go:61: standby pgedge-n2-2 has logical spk_* slot synced
failover_test.go:64: force-deleted primary pod pgedge-n1-1
failover_test.go:65: new primary elected: pgedge-n1-2
=== RUN TestUnplannedFailover/no_data_loss
=== RUN TestUnplannedFailover/forward_replication
=== RUN TestUnplannedFailover/reverse_replication
=== RUN TestUnplannedFailover/subscriptions_healthy
=== RUN TestUnplannedFailover/subscriptions_healthy/n1
=== RUN TestUnplannedFailover/subscriptions_healthy/n2
=== RUN TestUnplannedFailover/slots_active
=== RUN TestUnplannedFailover/slots_active/n1
=== RUN TestUnplannedFailover/slots_active/n2
=== RUN TestUnplannedFailover/full_mesh_reoplication
=== RUN TestUnplannedFailover/full_mesh_reoplication/pgedge-n1-2_to_pgedge-n2-1
=== RUN TestUnplannedFailover/full_mesh_reoplication/pgedge-n2-1_to_pgedge-n1-2
--- PASS: TestUnplannedFailover (217.13s)
--- PASS: TestUnplannedFailover/no_data_loss (0.07s)
--- PASS: TestUnplannedFailover/forward_replication (0.18s)
--- PASS: TestUnplannedFailover/reverse_replication (4.18s)
--- PASS: TestUnplannedFailover/subscriptions_healthy (0.18s)
--- PASS: TestUnplannedFailover/subscriptions_healthy/n1 (0.09s)
--- PASS: TestUnplannedFailover/subscriptions_healthy/n2 (0.08s)
--- PASS: TestUnplannedFailover/slots_active (0.18s)
--- PASS: TestUnplannedFailover/slots_active/n1 (0.09s)
--- PASS: TestUnplannedFailover/slots_active/n2 (0.09s)
--- PASS: TestUnplannedFailover/full_mesh_reoplication (0.35s)
--- PASS: TestUnplannedFailover/full_mesh_reoplication/pgedge-n1-2_to_pgedge-n2-1 (0.08s)
--- PASS: TestUnplannedFailover/full_mesh_reoplication/pgedge-n2-1_to_pgedge-n1-2 (0.09s)
PASS
ok github.com/pgEdge/pgedge-helm/test/integration 217.632s
test git:(feat/PLAT-616/add-node-hardening) make test-nodes
/Library/Developer/CommandLineTools/usr/bin/make -C /Users/sivat/projects/pgedge-helm/pgedge-helm docker-build-dev
docker buildx bake dev
[+] Building 1.6s (15/15) FINISHED docker:desktop-linux
=> [internal] load local bake definitions 0.0s
=> => reading docker-bake.hcl 1.12kB / 1.12kB 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 514B 0.0s
=> [internal] load metadata for docker.io/library/golang:1.25 1.3s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [builder 1/7] FROM docker.io/library/golang:1.25@sha256:cd05a378aaf011e8056745363e5c40f4f2bef0fa4d9bf19b9c 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 1.84kB 0.0s
=> CACHED [builder 2/7] WORKDIR /build 0.0s
=> CACHED [builder 3/7] COPY go.mod go.sum ./ 0.0s
=> CACHED [builder 4/7] RUN go mod download 0.0s
=> CACHED [builder 5/7] COPY cmd/ cmd/ 0.0s
=> CACHED [builder 6/7] COPY internal/ internal/ 0.0s
=> CACHED [builder 7/7] RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -o /init-spock ./cmd/init-spock 0.0s
=> CACHED [stage-1 1/2] COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ 0.0s
=> CACHED [stage-1 2/2] COPY --from=builder /init-spock /init-spock 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:719e1c984d56467df2ab7636028678b67a56faf2c606575af9dcfadce5bcf178 0.0s
=> => naming to docker.io/library/pgedge-helm-utils:dev 0.0s
View build details: docker-desktop://dashboard/build/desktop-linux/desktop-linux/t5g7v04tsflry15gtzjqh7q17
kind load docker-image pgedge-helm-utils:dev --name pgedge-test
Image: "pgedge-helm-utils:dev" with ID "sha256:719e1c984d56467df2ab7636028678b67a56faf2c606575af9dcfadce5bcf178" found to be already present on all nodes.
cd /Users/sivat/projects/pgedge-helm/pgedge-helm && go test -tags integration -v -timeout 30m \
-run "TestNodes" ./test/integration/...
=== RUN TestNodesAddNode
=== RUN TestNodesAddNode/upgrade_rejects_new_node_without_bootstrap_mode
=== RUN TestNodesAddNode/upgrade_rejects_rebootstrap_existing_node
=== RUN TestNodesAddNode/n3_cluster_healthy
=== RUN TestNodesAddNode/init_spock_succeeds_after_upgrade
=== RUN TestNodesAddNode/n3_has_existing_data
=== RUN TestNodesAddNode/full_mesh_replication
=== RUN TestNodesAddNode/full_mesh_replication/pgedge-n1-1_to_pgedge-n2-1
=== RUN TestNodesAddNode/full_mesh_replication/pgedge-n1-1_to_pgedge-n3-1
=== RUN TestNodesAddNode/full_mesh_replication/pgedge-n2-1_to_pgedge-n1-1
=== RUN TestNodesAddNode/full_mesh_replication/pgedge-n2-1_to_pgedge-n3-1
=== RUN TestNodesAddNode/full_mesh_replication/pgedge-n3-1_to_pgedge-n1-1
=== RUN TestNodesAddNode/full_mesh_replication/pgedge-n3-1_to_pgedge-n2-1
=== RUN TestNodesAddNode/idempotent_rerun_on_3_nodes
--- PASS: TestNodesAddNode (66.96s)
--- PASS: TestNodesAddNode/upgrade_rejects_new_node_without_bootstrap_mode (0.12s)
--- PASS: TestNodesAddNode/upgrade_rejects_rebootstrap_existing_node (0.06s)
--- PASS: TestNodesAddNode/n3_cluster_healthy (0.38s)
--- PASS: TestNodesAddNode/init_spock_succeeds_after_upgrade (0.15s)
--- PASS: TestNodesAddNode/n3_has_existing_data (0.14s)
--- PASS: TestNodesAddNode/full_mesh_replication (0.81s)
--- PASS: TestNodesAddNode/full_mesh_replication/pgedge-n1-1_to_pgedge-n2-1 (0.09s)
--- PASS: TestNodesAddNode/full_mesh_replication/pgedge-n1-1_to_pgedge-n3-1 (0.09s)
--- PASS: TestNodesAddNode/full_mesh_replication/pgedge-n2-1_to_pgedge-n1-1 (0.08s)
--- PASS: TestNodesAddNode/full_mesh_replication/pgedge-n2-1_to_pgedge-n3-1 (0.10s)
--- PASS: TestNodesAddNode/full_mesh_replication/pgedge-n3-1_to_pgedge-n1-1 (0.08s)
--- PASS: TestNodesAddNode/full_mesh_replication/pgedge-n3-1_to_pgedge-n2-1 (0.09s)
--- PASS: TestNodesAddNode/idempotent_rerun_on_3_nodes (4.22s)
=== RUN TestNodesAddNodeZeroDowntime
=== RUN TestNodesAddNodeZeroDowntime/n3_cluster_healthy
=== RUN TestNodesAddNodeZeroDowntime/init_spock_succeeds
=== RUN TestNodesAddNodeZeroDowntime/origin_advanced_on_n3
=== RUN TestNodesAddNodeZeroDowntime/n3_has_all_data
=== RUN TestNodesAddNodeZeroDowntime/n3_has_all_data/test_zdt_n1
=== RUN TestNodesAddNodeZeroDowntime/n3_has_all_data/test_zdt_n2
=== RUN TestNodesAddNodeZeroDowntime/n3_replicates_bidirectionally
=== RUN TestNodesAddNodeZeroDowntime/n3_replicates_bidirectionally/pgedge-n1-1
=== RUN TestNodesAddNodeZeroDowntime/n3_replicates_bidirectionally/pgedge-n2-1
=== RUN TestNodesAddNodeZeroDowntime/full_mesh_established
=== RUN TestNodesAddNodeZeroDowntime/full_mesh_established/pgedge-n1-1_subscriptions
=== RUN TestNodesAddNodeZeroDowntime/full_mesh_established/pgedge-n2-1_subscriptions
=== RUN TestNodesAddNodeZeroDowntime/full_mesh_established/pgedge-n3-1_subscriptions
--- PASS: TestNodesAddNodeZeroDowntime (68.31s)
--- PASS: TestNodesAddNodeZeroDowntime/n3_cluster_healthy (0.14s)
--- PASS: TestNodesAddNodeZeroDowntime/init_spock_succeeds (0.16s)
--- PASS: TestNodesAddNodeZeroDowntime/origin_advanced_on_n3 (0.08s)
--- PASS: TestNodesAddNodeZeroDowntime/n3_has_all_data (0.52s)
--- PASS: TestNodesAddNodeZeroDowntime/n3_has_all_data/test_zdt_n1 (0.26s)
--- PASS: TestNodesAddNodeZeroDowntime/n3_has_all_data/test_zdt_n2 (0.25s)
--- PASS: TestNodesAddNodeZeroDowntime/n3_replicates_bidirectionally (1.14s)
--- PASS: TestNodesAddNodeZeroDowntime/n3_replicates_bidirectionally/pgedge-n1-1 (0.09s)
--- PASS: TestNodesAddNodeZeroDowntime/n3_replicates_bidirectionally/pgedge-n2-1 (0.08s)
--- PASS: TestNodesAddNodeZeroDowntime/full_mesh_established (0.33s)
--- PASS: TestNodesAddNodeZeroDowntime/full_mesh_established/pgedge-n1-1_subscriptions (0.10s)
--- PASS: TestNodesAddNodeZeroDowntime/full_mesh_established/pgedge-n2-1_subscriptions (0.11s)
--- PASS: TestNodesAddNodeZeroDowntime/full_mesh_established/pgedge-n3-1_subscriptions (0.13s)
=== RUN TestNodesRemoveNode
=== RUN TestNodesRemoveNode/remaining_clusters_healthy
=== RUN TestNodesRemoveNode/init_spock_succeeds_after_removal
=== RUN TestNodesRemoveNode/spock_node_n3_removed
=== RUN TestNodesRemoveNode/spock_node_n3_removed/pgedge-n1-1
=== RUN TestNodesRemoveNode/spock_node_n3_removed/pgedge-n2-1
=== RUN TestNodesRemoveNode/subscriptions_to_n3_removed
=== RUN TestNodesRemoveNode/subscriptions_to_n3_removed/pgedge-n1-1
=== RUN TestNodesRemoveNode/subscriptions_to_n3_removed/pgedge-n2-1
=== RUN TestNodesRemoveNode/replication_still_works
--- PASS: TestNodesRemoveNode (39.43s)
--- PASS: TestNodesRemoveNode/remaining_clusters_healthy (0.31s)
--- PASS: TestNodesRemoveNode/init_spock_succeeds_after_removal (0.15s)
--- PASS: TestNodesRemoveNode/spock_node_n3_removed (0.20s)
--- PASS: TestNodesRemoveNode/spock_node_n3_removed/pgedge-n1-1 (0.11s)
--- PASS: TestNodesRemoveNode/spock_node_n3_removed/pgedge-n2-1 (0.09s)
--- PASS: TestNodesRemoveNode/subscriptions_to_n3_removed (0.16s)
--- PASS: TestNodesRemoveNode/subscriptions_to_n3_removed/pgedge-n1-1 (0.09s)
--- PASS: TestNodesRemoveNode/subscriptions_to_n3_removed/pgedge-n2-1 (0.08s)
--- PASS: TestNodesRemoveNode/replication_still_works (0.25s)
PASS
ok github.com/pgEdge/pgedge-helm/test/integration 175.069s
test git:(feat/PLAT-616/add-node-hardening) make test-run RUN="TestUnplannedFailover|TestNodes"
/Library/Developer/CommandLineTools/usr/bin/make -C /Users/sivat/projects/pgedge-helm/pgedge-helm docker-build-dev
docker buildx bake dev
[+] Building 1.2s (15/15) FINISHED docker:desktop-linux
=> [internal] load local bake definitions 0.0s
=> => reading docker-bake.hcl 1.12kB / 1.12kB 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 514B 0.0s
=> [internal] load metadata for docker.io/library/golang:1.25 1.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [builder 1/7] FROM docker.io/library/golang:1.25@sha256:cd05a378aaf011e8056745363e5c40f4f2bef0fa4d9bf19b9c 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 1.84kB 0.0s
=> CACHED [builder 2/7] WORKDIR /build 0.0s
=> CACHED [builder 3/7] COPY go.mod go.sum ./ 0.0s
=> CACHED [builder 4/7] RUN go mod download 0.0s
=> CACHED [builder 5/7] COPY cmd/ cmd/ 0.0s
=> CACHED [builder 6/7] COPY internal/ internal/ 0.0s
=> CACHED [builder 7/7] RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -o /init-spock ./cmd/init-spock 0.0s
=> CACHED [stage-1 1/2] COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ 0.0s
=> CACHED [stage-1 2/2] COPY --from=builder /init-spock /init-spock 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:719e1c984d56467df2ab7636028678b67a56faf2c606575af9dcfadce5bcf178 0.0s
=> => naming to docker.io/library/pgedge-helm-utils:dev 0.0s
View build details: docker-desktop://dashboard/build/desktop-linux/desktop-linux/ymrr9e909c9q02nase1ljlh2m
kind load docker-image pgedge-helm-utils:dev --name pgedge-test
Image: "pgedge-helm-utils:dev" with ID "sha256:719e1c984d56467df2ab7636028678b67a56faf2c606575af9dcfadce5bcf178" found to be already present on all nodes.
cd /Users/sivat/projects/pgedge-helm/pgedge-helm && go test -tags integration -v -timeout 30m \
-run "TestUnplannedFailover|TestNodes" ./test/integration/...
=== RUN TestUnplannedFailover
failover_test.go:61: standby pgedge-n1-2 has logical spk_* slot synced
failover_test.go:61: standby pgedge-n2-2 has logical spk_* slot synced
failover_test.go:64: force-deleted primary pod pgedge-n1-1
failover_test.go:65: new primary elected: pgedge-n1-2
=== RUN TestUnplannedFailover/no_data_loss
=== RUN TestUnplannedFailover/forward_replication
=== RUN TestUnplannedFailover/reverse_replication
=== RUN TestUnplannedFailover/subscriptions_healthy
=== RUN TestUnplannedFailover/subscriptions_healthy/n1
=== RUN TestUnplannedFailover/subscriptions_healthy/n2
=== RUN TestUnplannedFailover/slots_active
=== RUN TestUnplannedFailover/slots_active/n1
=== RUN TestUnplannedFailover/slots_active/n2
=== RUN TestUnplannedFailover/full_mesh_reoplication
=== RUN TestUnplannedFailover/full_mesh_reoplication/pgedge-n1-2_to_pgedge-n2-1
=== RUN TestUnplannedFailover/full_mesh_reoplication/pgedge-n2-1_to_pgedge-n1-2
--- PASS: TestUnplannedFailover (115.01s)
--- PASS: TestUnplannedFailover/no_data_loss (0.09s)
--- PASS: TestUnplannedFailover/forward_replication (0.17s)
--- PASS: TestUnplannedFailover/reverse_replication (8.23s)
--- PASS: TestUnplannedFailover/subscriptions_healthy (0.18s)
--- PASS: TestUnplannedFailover/subscriptions_healthy/n1 (0.09s)
--- PASS: TestUnplannedFailover/subscriptions_healthy/n2 (0.08s)
--- PASS: TestUnplannedFailover/slots_active (0.16s)
--- PASS: TestUnplannedFailover/slots_active/n1 (0.08s)
--- PASS: TestUnplannedFailover/slots_active/n2 (0.08s)
--- PASS: TestUnplannedFailover/full_mesh_reoplication (0.35s)
--- PASS: TestUnplannedFailover/full_mesh_reoplication/pgedge-n1-2_to_pgedge-n2-1 (0.09s)
--- PASS: TestUnplannedFailover/full_mesh_reoplication/pgedge-n2-1_to_pgedge-n1-2 (0.09s)
=== RUN TestNodesAddNode
=== RUN TestNodesAddNode/upgrade_rejects_new_node_without_bootstrap_mode
=== RUN TestNodesAddNode/upgrade_rejects_rebootstrap_existing_node
=== RUN TestNodesAddNode/n3_cluster_healthy
=== RUN TestNodesAddNode/init_spock_succeeds_after_upgrade
=== RUN TestNodesAddNode/n3_has_existing_data
=== RUN TestNodesAddNode/full_mesh_replication
=== RUN TestNodesAddNode/full_mesh_replication/pgedge-n3-1_to_pgedge-n1-1
=== RUN TestNodesAddNode/full_mesh_replication/pgedge-n3-1_to_pgedge-n2-1
=== RUN TestNodesAddNode/full_mesh_replication/pgedge-n1-1_to_pgedge-n2-1
=== RUN TestNodesAddNode/full_mesh_replication/pgedge-n1-1_to_pgedge-n3-1
=== RUN TestNodesAddNode/full_mesh_replication/pgedge-n2-1_to_pgedge-n1-1
=== RUN TestNodesAddNode/full_mesh_replication/pgedge-n2-1_to_pgedge-n3-1
=== RUN TestNodesAddNode/idempotent_rerun_on_3_nodes
--- PASS: TestNodesAddNode (70.09s)
--- PASS: TestNodesAddNode/upgrade_rejects_new_node_without_bootstrap_mode (0.07s)
--- PASS: TestNodesAddNode/upgrade_rejects_rebootstrap_existing_node (0.06s)
--- PASS: TestNodesAddNode/n3_cluster_healthy (0.44s)
--- PASS: TestNodesAddNode/init_spock_succeeds_after_upgrade (0.15s)
--- PASS: TestNodesAddNode/n3_has_existing_data (0.12s)
--- PASS: TestNodesAddNode/full_mesh_replication (0.83s)
--- PASS: TestNodesAddNode/full_mesh_replication/pgedge-n3-1_to_pgedge-n1-1 (0.10s)
--- PASS: TestNodesAddNode/full_mesh_replication/pgedge-n3-1_to_pgedge-n2-1 (0.09s)
--- PASS: TestNodesAddNode/full_mesh_replication/pgedge-n1-1_to_pgedge-n2-1 (0.10s)
--- PASS: TestNodesAddNode/full_mesh_replication/pgedge-n1-1_to_pgedge-n3-1 (0.09s)
--- PASS: TestNodesAddNode/full_mesh_replication/pgedge-n2-1_to_pgedge-n1-1 (0.08s)
--- PASS: TestNodesAddNode/full_mesh_replication/pgedge-n2-1_to_pgedge-n3-1 (0.09s)
--- PASS: TestNodesAddNode/idempotent_rerun_on_3_nodes (4.14s)
=== RUN TestNodesAddNodeZeroDowntime
=== RUN TestNodesAddNodeZeroDowntime/n3_cluster_healthy
=== RUN TestNodesAddNodeZeroDowntime/init_spock_succeeds
=== RUN TestNodesAddNodeZeroDowntime/origin_advanced_on_n3
=== RUN TestNodesAddNodeZeroDowntime/n3_has_all_data
=== RUN TestNodesAddNodeZeroDowntime/n3_has_all_data/test_zdt_n1
=== RUN TestNodesAddNodeZeroDowntime/n3_has_all_data/test_zdt_n2
=== RUN TestNodesAddNodeZeroDowntime/n3_replicates_bidirectionally
=== RUN TestNodesAddNodeZeroDowntime/n3_replicates_bidirectionally/pgedge-n1-1
=== RUN TestNodesAddNodeZeroDowntime/n3_replicates_bidirectionally/pgedge-n2-1
=== RUN TestNodesAddNodeZeroDowntime/full_mesh_established
=== RUN TestNodesAddNodeZeroDowntime/full_mesh_established/pgedge-n1-1_subscriptions
=== RUN TestNodesAddNodeZeroDowntime/full_mesh_established/pgedge-n2-1_subscriptions
=== RUN TestNodesAddNodeZeroDowntime/full_mesh_established/pgedge-n3-1_subscriptions
--- PASS: TestNodesAddNodeZeroDowntime (71.41s)
--- PASS: TestNodesAddNodeZeroDowntime/n3_cluster_healthy (0.16s)
--- PASS: TestNodesAddNodeZeroDowntime/init_spock_succeeds (0.18s)
--- PASS: TestNodesAddNodeZeroDowntime/origin_advanced_on_n3 (0.08s)
--- PASS: TestNodesAddNodeZeroDowntime/n3_has_all_data (0.50s)
--- PASS: TestNodesAddNodeZeroDowntime/n3_has_all_data/test_zdt_n1 (0.26s)
--- PASS: TestNodesAddNodeZeroDowntime/n3_has_all_data/test_zdt_n2 (0.24s)
--- PASS: TestNodesAddNodeZeroDowntime/n3_replicates_bidirectionally (1.14s)
--- PASS: TestNodesAddNodeZeroDowntime/n3_replicates_bidirectionally/pgedge-n1-1 (0.09s)
--- PASS: TestNodesAddNodeZeroDowntime/n3_replicates_bidirectionally/pgedge-n2-1 (0.09s)
--- PASS: TestNodesAddNodeZeroDowntime/full_mesh_established (0.26s)
--- PASS: TestNodesAddNodeZeroDowntime/full_mesh_established/pgedge-n1-1_subscriptions (0.08s)
--- PASS: TestNodesAddNodeZeroDowntime/full_mesh_established/pgedge-n2-1_subscriptions (0.09s)
--- PASS: TestNodesAddNodeZeroDowntime/full_mesh_established/pgedge-n3-1_subscriptions (0.09s)
=== RUN TestNodesRemoveNode
=== RUN TestNodesRemoveNode/remaining_clusters_healthy
=== RUN TestNodesRemoveNode/init_spock_succeeds_after_removal
=== RUN TestNodesRemoveNode/spock_node_n3_removed
=== RUN TestNodesRemoveNode/spock_node_n3_removed/pgedge-n1-1
=== RUN TestNodesRemoveNode/spock_node_n3_removed/pgedge-n2-1
=== RUN TestNodesRemoveNode/subscriptions_to_n3_removed
=== RUN TestNodesRemoveNode/subscriptions_to_n3_removed/pgedge-n1-1
=== RUN TestNodesRemoveNode/subscriptions_to_n3_removed/pgedge-n2-1
=== RUN TestNodesRemoveNode/replication_still_works
--- PASS: TestNodesRemoveNode (41.38s)
--- PASS: TestNodesRemoveNode/remaining_clusters_healthy (0.33s)
--- PASS: TestNodesRemoveNode/init_spock_succeeds_after_removal (0.14s)
--- PASS: TestNodesRemoveNode/spock_node_n3_removed (0.19s)
--- PASS: TestNodesRemoveNode/spock_node_n3_removed/pgedge-n1-1 (0.10s)
--- PASS: TestNodesRemoveNode/spock_node_n3_removed/pgedge-n2-1 (0.09s)
--- PASS: TestNodesRemoveNode/subscriptions_to_n3_removed (0.17s)
--- PASS: TestNodesRemoveNode/subscriptions_to_n3_removed/pgedge-n1-1 (0.08s)
--- PASS: TestNodesRemoveNode/subscriptions_to_n3_removed/pgedge-n2-1 (0.09s)
--- PASS: TestNodesRemoveNode/replication_still_works (0.27s)
PASS
ok github.com/pgEdge/pgedge-helm/test/integration 298.613s
This PR hardens the zero-downtime add-node populate flow with `PeerCatchup`, `ReplicationOriginAdvance`, and a `WaitForSyncEvent` fix, which align with recent updates in the Control Plane project: pgEdge/control-plane#385. In addition, it adds `TestUnplannedFailover`, plus an `origin_advanced_on_n3` assertion in the existing zero-downtime add-node test, to improve integration test coverage. Each commit can be reviewed independently.