Collaborator

@sandraromanchenko sandraromanchenko commented Nov 17, 2025

PCSM-226

Problem

When cloning a sharded collection with an inconsistent index (an index that exists on some shards but not on others), PCSM copies the inconsistent index as a regular one. The subsequent data copy then fails, because the documents that prevented index creation on a source shard also violate that index on the target.

Solution

Detect inconsistent indexes by using the $indexStats aggregation stage to count index occurrences across shards. An index is considered inconsistent if it exists on fewer shards than the _id_ index. These inconsistent indexes are then skipped during the clone operation. We compare against _id_ rather than against the number of shards in the cluster because a collection doesn't need to be on all the shards all the time, depending on the chunk distribution. Also, _id_ cannot be inconsistent.

We check consistency this way because checkMetadataConsistency is not available in MongoDB 6.0. This is also a one-time operation that runs before the clone starts, so it won't impact performance.
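For illustration only, the counting rule described above can be sketched in Python. This is not the PR's implementation (PCSM itself is not written in Python); the function names and the sample data are hypothetical, and `collection` is assumed to be a pymongo collection opened through mongos, where $indexStats returns one document per index per shard with a "shard" field.

```python
from collections import defaultdict


def fetch_index_stats(collection):
    """Return the $indexStats documents for a collection.

    Assumption: `collection` is a pymongo Collection reached via mongos,
    so the result holds one document per index per shard.
    """
    return list(collection.aggregate([{"$indexStats": {}}]))


def find_inconsistent_indexes(index_stats):
    """Return names of indexes that exist on fewer shards than _id_."""
    shards_by_index = defaultdict(set)
    for doc in index_stats:
        # Collect the distinct shards that report each index name.
        shards_by_index[doc["name"]].add(doc.get("shard"))

    # _id_ exists on every shard that holds the collection, so its shard
    # count is the baseline for a consistent index.
    expected = len(shards_by_index.get("_id_", ()))
    return sorted(
        name for name, shards in shards_by_index.items()
        if len(shards) < expected
    )


# Hypothetical $indexStats output, trimmed to the fields used here.
sample_stats = [
    {"name": "_id_", "shard": "rs0"},
    {"name": "_id_", "shard": "rs1"},
    {"name": "a.b_1_words_text", "shard": "rs0"},
    {"name": "x_1", "shard": "rs0"},
    {"name": "x_1", "shard": "rs1"},
]

inconsistent = find_inconsistent_indexes(sample_stats)
print(inconsistent)
```

In this sample the text index is reported by only one of the two shards that hold the collection, so it is the one that would be skipped during the clone.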

Also added Makefile helpers:

  • make pcsm-start SOURCE="mongodb://src-mongos:27017" TARGET="mongodb://tgt-mongos:29017" - builds PCSM and runs it with the --start flag to start the replication immediately
  • make pcsm-run SOURCE="mongodb://src-mongos:27017" TARGET="mongodb://tgt-mongos:29017" - builds PCSM and runs it

Copilot AI review requested due to automatic review settings December 19, 2025 07:48
@inelpandzic inelpandzic requested a review from chupe as a code owner December 19, 2025 07:48
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a test case to verify that the clone operation handles data with inconsistent indexes correctly in a sharded MongoDB environment. The test creates a scenario where an index build partially fails due to incompatible document structure (an array value where the text index expects a scalar), resulting in an inconsistent index state.

  • Adds a new test file for sharded index scenarios
  • Creates a test that intentionally triggers an inconsistent index state by inserting a document with an array field after other documents, then attempting to create a compound text index
  • Verifies that the clone phase can handle collections with such inconsistent indexes

Comment on lines +17 to +21
try:
    t.source["init_test_db"].text_collection.create_index([("a.b", 1), ("words", "text")])
    assert False, "Index build should fail due array in doc for text index"
except Exception:
    pass
Collaborator

Suggested change
- try:
-     t.source["init_test_db"].text_collection.create_index([("a.b", 1), ("words", "text")])
-     assert False, "Index build should fail due array in doc for text index"
- except Exception:
-     pass
+ with pytest.raises(pymongo.errors.OperationFailure, match="text index"):
+     t.source["init_test_db"].text_collection.create_index([("a.b", 1), ("words", "text")])

Something like this ought to work. The assert False is... strange IMO.

Collaborator

Well, it may be your first time seeing it, but it is common to write it like that in our QA e2e tests.
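One detail worth noting about the pattern discussed in this thread: AssertionError is a subclass of Exception, so an assert False placed inside a try block guarded by except Exception is swallowed as well. A minimal stdlib-only sketch of the difference follows; build_index, swallowing_check and strict_check are hypothetical stand-ins, not code from the PR, and strict_check only approximates what pytest.raises does.

```python
def build_index():
    """Hypothetical stand-in for a create_index call that unexpectedly succeeds."""
    return "a.b_1_words_text"


def swallowing_check(action):
    """The try / assert False / except Exception pattern."""
    try:
        action()
        assert False, "Index build should fail"
    except Exception:
        # AssertionError subclasses Exception, so the assert lands here too.
        pass
    return "passed"


def strict_check(action, expected_exc):
    """Pass only if `action` raises `expected_exc`; otherwise fail loudly."""
    try:
        action()
    except expected_exc:
        return "passed"
    raise AssertionError("expected %s to be raised" % expected_exc.__name__)


# The unexpected success goes unnoticed with the swallowing pattern.
print(swallowing_check(build_index))

# The strict variant reports it.
try:
    strict_check(build_index, RuntimeError)
except AssertionError as exc:
    print("caught:", exc)
```

Catching only the expected exception type, as the suggested pytest.raises form does, keeps the check able to fail when the index build succeeds.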

	golangci-lint run

pcsm-run: build
	./bin/pcsm --source=$(SOURCE) --target=$(TARGET) --log-level=debug --reset-state
Collaborator

@chupe chupe Dec 24, 2025

--reset-state is ineffectual here.
It doesn't have an effect unless starting the replication via --start

Collaborator

@inelpandzic inelpandzic Dec 24, 2025

No, it does. If you run pcsm --reset-state, PCSM starts and resets the state before it begins accepting commands such as start.

It doesn't matter whether you pass --start or not: the root pcsm command simply resets the state, meaning it cleans up the percona_clustersync_mongodb database that PCSM maintains on the target for its state.

Collaborator

Found it, you're right 👍 Thanks!

@inelpandzic inelpandzic merged commit ce70f31 into main Dec 24, 2025
29 of 30 checks passed