PCSM-249: Handle movePrimary phantom events in replication#163

Merged: chupe merged 10 commits into main from PCSM-249 on Mar 9, 2026
@sandraromanchenko sandraromanchenko commented Dec 17, 2025

PCSM-249

Problem

MongoDB's movePrimary command doesn't emit a dedicated change stream event. Instead, it produces phantom create events for collections that were moved between shards. Without detection, PCSM recreates these collections on the target, resulting in data loss — the target ends up with zero documents for the affected collections.

Solution

When handling a Create change event, check the in-memory catalog first. If PCSM already tracks the collection from a prior clone or replicate, the create is phantom — only the UUID is updated via SetCollectionUUID to reflect the post-move value. Otherwise it's a real create and proceeds normally.

This works because the catalog is an authoritative record of what PCSM has already replicated to the target. A create event for a collection the catalog already tracks can only come from movePrimary, while a create event for an unknown collection means the user actually created it.
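
The catalog check can be sketched roughly as follows. This is a minimal Python illustration, not PCSM's actual code: `Catalog`, `handle_create`, and the `"phantom"`/`"real"` return values are hypothetical names, and `set_collection_uuid` merely stands in for the `SetCollectionUUID` call mentioned above.

```python
class Catalog:
    """Hypothetical in-memory catalog mapping namespace -> collection UUID."""

    def __init__(self):
        self._uuids = {}

    def has(self, ns):
        return ns in self._uuids

    def set_collection_uuid(self, ns, uuid):
        self._uuids[ns] = uuid

    def uuid_of(self, ns):
        return self._uuids.get(ns)


def handle_create(catalog, ns, uuid):
    """Classify a create event and update the catalog accordingly."""
    if catalog.has(ns):
        # Already cloned or replicated: treat the event as a movePrimary
        # phantom. Only the UUID is refreshed; nothing is recreated.
        catalog.set_collection_uuid(ns, uuid)
        return "phantom"
    # Unknown namespace: a real create. A real handler would also create
    # the collection on the target at this point.
    catalog.set_collection_uuid(ns, uuid)
    return "real"
```

The first create for a namespace is classified as real, and any later create for the same namespace only updates the stored UUID.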


Copilot AI left a comment

Pull request overview

This PR adds a test to verify data consistency after executing the MongoDB movePrimary command, which is used to change the primary shard for an unsharded database. The test ensures that the PCSM (Percona ClusterSync for MongoDB) tool correctly handles and replicates the primary shard changes from source to target clusters.

Key changes:

  • Adds a new test file test_shard_ops.py for testing shard-related operations
  • Implements test_move_primary to verify the movePrimary command behavior
  • Includes error handling for environments where movePrimary is not supported

6 comment threads on tests/test_shard_ops.py (all outdated)
@chupe chupe self-requested a review February 18, 2026 16:02
@chupe chupe marked this pull request as draft February 18, 2026 20:39
@chupe chupe changed the title from "PCSM-249. Data mismatch after movePrimary command" to "PCSM-249: Handle movePrimary phantom events in replication" Mar 4, 2026
chupe and others added 6 commits March 4, 2026 19:48
The previous approach checked source-side collection existence for both Drop and Create phantom events from movePrimary. This worked for Drop (if the collection still exists on source, the drop is phantom) but was fundamentally broken for Create — collections always exist on source after a real create, so every legitimate create was misidentified as a movePrimary artifact and skipped.

Split the detection strategy: Drop events continue checking the source, while Create events now check the in-memory catalog. If the catalog already tracks the collection (meaning PCSM previously cloned or replicated it to the target), the create is phantom and only the UUID is updated. Otherwise it's a real create and proceeds normally.
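
To illustrate why the source-side check cannot work for creates, here is a small Python comparison of the two heuristics. The function names and the sets standing in for the source cluster and the catalog are assumptions made for this sketch only.

```python
def is_phantom_by_source(source_collections, ns):
    """Old heuristic: an event is phantom if the collection exists on the source."""
    return ns in source_collections


def is_phantom_by_catalog(tracked_namespaces, ns):
    """New heuristic for creates: phantom only if the catalog already tracks it."""
    return ns in tracked_namespaces


# db.old was replicated earlier; db.new has just been created by a user.
# By the time the create event is processed, the source contains both.
source = {"db.old", "db.new"}
tracked = {"db.old"}

# The old heuristic flags the real create of db.new as phantom and skips it.
old_verdict_for_real_create = is_phantom_by_source(source, "db.new")
# The catalog heuristic lets db.new through and still catches db.old.
new_verdict_for_real_create = is_phantom_by_catalog(tracked, "db.new")
new_verdict_for_moved = is_phantom_by_catalog(tracked, "db.old")
```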
@chupe chupe marked this pull request as ready for review March 4, 2026 19:08
movePrimary via mongos change stream only emits create events, never
drop events. The source-side existence check for drops was incorrectly
skipping real collection drops when the collection had been quickly
re-created, causing data mismatch in test_csync_PML_T10.
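
One way the drop misclassification could play out is sketched below in Python. The event tuples, the `replay` helper, and the dictionary used as a stand-in for the target are all hypothetical; the sketch does not reproduce what test_csync_PML_T10 actually does.

```python
def replay(events, target, source_now, check_source):
    """Replay (op, ns) events onto `target` (namespace -> list of docs)."""
    for op, ns in events:
        if op == "drop":
            if check_source and ns in source_now:
                continue  # drop misclassified as phantom and skipped
            target.pop(ns, None)
        elif op == "create":
            target.setdefault(ns, [])
    return target


# The user drops db.coll and immediately recreates it (now empty).
events = [("drop", "db.coll"), ("create", "db.coll")]
# When the drop event is processed, the source already has the new db.coll.
source_now = {"db.coll"}

with_check = replay(events, {"db.coll": ["a", "b"]}, source_now, check_source=True)
without_check = replay(events, {"db.coll": ["a", "b"]}, source_now, check_source=False)
```

With the source check enabled, the target keeps the stale documents while the source collection is empty. Without it, the drop is replayed and the target matches the source.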
Comment thread tests/test_shard_ops.py Outdated
@chupe chupe merged commit 4971040 into main Mar 9, 2026
33 checks passed
chupe added a commit that referenced this pull request Mar 23, 2026
On MongoDB 6.0/7.0, movePrimary emits create→drop through the change
stream (confirmed empirically). PR #163 handled the phantom create but
the subsequent drop was replayed, wiping the target collection.

On 8.0 only a phantom create is emitted — no drop follows.

Track namespaces where a phantom create was detected in a version-gated
map (nil on 8.0+). When a drop arrives for a tracked namespace, skip it
instead of replaying on the target.
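
A rough Python sketch of the tracking described above follows. `PhantomCreateTracker` and its methods are hypothetical names, `None` stands in for the nil map on 8.0+, and clearing the entry after one skipped drop is an assumption of this sketch rather than something the commit message states.

```python
class PhantomCreateTracker:
    """Remembers namespaces whose create event was classified as phantom."""

    def __init__(self, enabled):
        # None mirrors the "nil on 8.0+" map: tracking is switched off.
        self._seen = set() if enabled else None

    def record_phantom_create(self, ns):
        if self._seen is not None:
            self._seen.add(ns)

    def should_skip_drop(self, ns):
        """Return True if the drop follows a phantom create and must be skipped."""
        if self._seen is None or ns not in self._seen:
            return False
        # Assumption: one phantom create suppresses exactly one drop.
        self._seen.discard(ns)
        return True
```

A later, genuine drop of the same namespace is then replayed normally, because the entry has already been consumed.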
chupe added a commit that referenced this pull request Mar 23, 2026
On MongoDB 6.0/7.0, movePrimary emits create then drop through the
mongos change stream. PR #163 handled the phantom create but the
subsequent drop was replayed, wiping the target collection.

Track namespaces where a phantom create was detected. When a drop
arrives for a tracked namespace, skip it. Gated on source <8.0 AND
sharded topology (IsMongos check) to avoid false positives on RS where
create-for-existing-collection is a legitimate recreate, not movePrimary.

Also enables sharded E2E tests on 6.0/7.0 in CI (scoped to sharded-
specific test files, full suite still blocked by PCSM-255).
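
The gating condition from the commit message above (source older than 8.0 and a sharded topology) could look roughly like this in Python. The function name and the version-string parsing are assumptions for illustration; PCSM's own check may be structured differently.

```python
def phantom_drop_tracking_enabled(source_version, is_mongos):
    """Enable phantom-drop tracking only for sharded sources older than 8.0.

    On a replica set, a create for an existing collection is a legitimate
    recreate, so tracking stays off there regardless of version.
    """
    major = int(source_version.split(".")[0])
    return is_mongos and major < 8
```

For example, `phantom_drop_tracking_enabled("7.0.12", True)` enables tracking, while an 8.0 source or a replica-set source does not.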
@chupe chupe deleted the PCSM-249 branch March 27, 2026 23:34