DOC-12484 XDCR Conflict Logging feature #3806

Draft: 12 commits to be merged into base: release/8.0

@rao-shwe rao-shwe commented May 9, 2025

DOC-12484

Link to the preview doc: https://preview.docs-test.couchbase.com/DOC-12484/server/current/learn/clusters-and-availability/xdcr-conflict-logging-feature.html

Preview pages:

PR pages:
New page: XDCR Conflict Logging.

Updated the following pages for "XDCR Conflict Logging":

Don't review the following files: The following are 7.6.6 release docs which were missing in the release/8.0 branch.


. xref:learn:clusters-and-availability/xdcr-conflict-logging-feature.adoc#xdcr-conflict-detection[*Conflict Detection*]: During the replication, XDCR detects true conflicts by comparing the Hybrid Logical Vector (HLV) metadata of the source and target documents.

. xref:learn:clusters-and-availability/xdcr-conflict-logging-feature.adoc#conflict-logging-process[*Conflict Logging*]: When a true conflict is detected, XDCR logs the conflict details, such as document ID, document contents, and conflicting document histories, into the designated conflict collection.

Do we need to define what a true conflict is?


. xref:learn:clusters-and-availability/xdcr-conflict-logging-feature.adoc#conflict-logging-process[*Conflict Logging*]: When a true conflict is detected, XDCR logs the conflict details, such as document ID, document contents, and conflicting document histories, into the designated conflict collection.

. xref:learn:clusters-and-availability/xdcr-conflict-logging-feature.adoc#conflict-access-and-management[*Conflict Access and Management*]: Administrators can access and review the logged conflicts, then manually resolve them by selecting the appropriate mutations for replication and upserting the documents.

nit: It's not just Administrators. Technically, any user who has RW access to the bucket can view and access the logged conflicts.

@@ -28,6 +28,14 @@ curl -v -X POST -u [admin]:[password]
-d fromBucket=[bucket-name]
-d toCluster=[cluster-name]
-d toBucket=[bucket-name]
-d conflictLogging='{

I think this should be URL-encoded. Using -d followed by string-encoded JSON will throw an error.

One can use --data-urlencode like https://couchbase.slack.com/archives/C0963TSUU0N/p1752763203434269.

Or another option is to follow how colMappingRules [JSON-Document] is mentioned below (eg: conflictLogging [JSON-Document]) and explain the format of JSON document as explained now.
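
To illustrate the encoding concern raised here, the following is a minimal Python sketch of what a URL-encoded `conflictLogging` value looks like in a form body. The top-level keys come from the diff under review; the bucket, scope, and collection names are placeholders, and the helper name `encode_form_field` is hypothetical.

```python
import json
from urllib.parse import urlencode, parse_qs


def encode_form_field(name: str, value: dict) -> str:
    """Serialize `value` as compact JSON and URL-encode it as one form field."""
    compact = json.dumps(value, separators=(",", ":"))
    return urlencode({name: compact})


# Placeholder settings: key names follow the diff above, values are made up.
settings = {
    "disabled": False,
    "bucket": "conflict_bucket",
    "collection": "conflict_scope.conflict_collection",
}

body = encode_form_field("conflictLogging", settings)
print(body)

# Decoding the form body recovers the original JSON unchanged.
decoded = json.loads(parse_qs(body)["conflictLogging"][0])
assert decoded == settings
```

Braces, quotes, colons, and commas are all percent-encoded in the result, which is the same treatment curl's `--data-urlencode` applies to the value part of a `name=content` argument.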

@@ -28,6 +28,14 @@ curl -v -X POST -u [admin]:[password]
-d fromBucket=[bucket-name]
-d toCluster=[cluster-name]
-d toBucket=[bucket-name]
-d conflictLogging='{
"disabled": [true | false],
"bucket": [conflict-bucket-name],
"collection": [conflict-scope-name].[conflict-collection-name],
"loggingRules": {
[custom-conflict-scope-name]: {
@sumukhbhat2701 sumukhbhat2701 Jul 21, 2025

the LHS is source collection of the replication.
[custom-conflict-scope-name]: { -> [source-scope-name]: {

the LHS can also be [source-collection-name]

the RHS can also be {} or null
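
The shape described in this comment could be sketched in Python as follows. This is an illustration only: the key names `disabled`, `bucket`, `collection`, and `loggingRules` come from the diff, while the `scope.collection` key format for a single collection, the fields of the nested object, all concrete names, and the helper `build_conflict_logging` are assumptions. The thread does not state what `{}` and `null` mean as values, so the sketch only shows that both forms serialize cleanly.

```python
import json
from typing import Optional


def build_conflict_logging(bucket: str, collection: str,
                           rules: dict[str, Optional[dict]]) -> str:
    """Return a conflictLogging setting as a JSON string.

    Keys of `rules` name a source scope or a source collection of the
    replication; each value is a dict (possibly empty) or None, which
    JSON serializes as null.
    """
    for key, value in rules.items():
        if value is not None and not isinstance(value, dict):
            raise TypeError(f"rule for {key!r} must be a dict or None")
    return json.dumps({
        "disabled": False,
        "bucket": bucket,
        "collection": collection,
        "loggingRules": rules,
    })


# Hypothetical names throughout.
setting = build_conflict_logging(
    "conflict_bucket",
    "conflict_scope.conflict_collection",
    {
        "inventory": {},             # LHS is a source scope, RHS is {}
        "inventory.airline": None,   # LHS is a source collection, RHS is null
        "sales.orders": {            # RHS is a nested object (assumed fields)
            "bucket": "other_bucket",
            "collection": "audit.conflicts",
        },
    },
)
print(setting)
```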

----

The `type` value must be `xmem`, which is sometimes referred to as *Version 2* and corresponds to the _Memcached Binary_ protocol used in XDCR communications.

The `replicationType` value is always `continuous`.
This value must be specified.

The `conflictLogging` flag enables or disables conflict logging for the replication.
When enabled (`disabled=false`), you can specify the target bucket, scope, and collection for logging conflicts, as well as custom logging rules for specific collections.

as custom logging rules for specific collections -> source collections of the replication

----

The `type` value must be `xmem`, which is sometimes referred to as *Version 2* and corresponds to the _Memcached Binary_ protocol used in XDCR communications.

The `replicationType` value is always `continuous`.
This value must be specified.

The `conflictLogging` flag enables or disables conflict logging for the replication.
When enabled (`disabled=false`), you can specify the target bucket, scope, and collection for logging conflicts, as well as custom logging rules for specific collections.
This helps track and resolve document conflicts during replication.
@sumukhbhat2701 sumukhbhat2701 Jul 21, 2025

mention the words

  1. conflicts -> true conflicts
  2. manually if needed - resolve document conflicts manually

| `conflictLogging`
| disabled (true/false)
| Configuration settings for conflict logging. This configuration setting defines objects/parameters and options used to control how conflicts are logged within the application.
It includes settings such as log levels, output destinations, and thresholds for logging conflict events.

Sorry, but what do log levels and thresholds mean?

@sumukhbhat2701 sumukhbhat2701 left a comment

Some review feedback that holds true for all pages:

  1. For this feature specifically, we need to use the term "true conflicts" rather than just "conflicts". That means we need to first define what a true conflict is and set the expectation.
  2. There should be a warning that this feature is best effort (and that the number of true conflicts is assumed to be very low). Everything that's in this slide - https://couchbase.slack.com/archives/C0963TSUU0N/p1752763776316649.
  3. The setting is quite complex to understand just from the textual description. An example would help a lot for someone new reading this.
  4. There should be a mention that on every true conflict detected, XDCR will log 3 documents to the conflict collection: the CRD (conflict record document, which contains metadata of the detected true conflict), the source document in conflict, and the target document in conflict. It should be mentioned that the CRD will contain the document IDs of the logged source and target documents. Maybe add an example of source and target document IDs in the CRD.
  5. Continuing from (3), I think there should be some examples of how to make use of the detected and logged conflicts, e.g., using the SDK, N1QL, range scan, Eventing, etc.
  6. There should be a mention that the logged documents will not be replicated by XDCR if the conflict collection is a source collection of any XDCR.

@sumukhbhat2701

I think I missed one of the pages when reviewing, so if some things are already done since the last comment, please ignore.

* xref:learn:clusters-and-availability/xdcr-conflict-logging-feature.adoc#upgrade-xdcr-setup-conflict-logging[*Upgrading an Existing Active-Passive XDCR Setup*]: Configure an existing active-passive XDCR setup into an active-active XDCR setup.
[#hlv]

Do we need this detail of a section for HLV?
cc: @hyunjuV for your thoughts.

@rao-shwe (Contributor, Author)

This page is mainly for conceptual information. Users may not find HLV related information anywhere else.

* In conflict with the one at target, so a merge needs to be performed. This happens when the target has mutations in the document not included in the source document. This is true conflict detection.
However, comparing documents’ CAS values is straightforward (by comparing integers), whereas comparing HLVs is complex. HLVs combine CAS with per-source version history, so clear HLV properties and rules are defined for “greater than” and “equal”.
[#compare-hlvs-to-detect-conflicts]
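
As a rough illustration of why vector comparison needs explicit rules, the following Python sketch compares two generic version vectors. It is not Couchbase's HLV algorithm: the dictionary representation (source ID mapped to a counter) and the names `Order` and `compare` are assumptions chosen for the example.

```python
from enum import Enum


class Order(Enum):
    EQUAL = "equal"
    GREATER = "greater"    # left contains every update seen by right, plus more
    LESS = "less"          # right contains every update seen by left, plus more
    CONFLICT = "conflict"  # each side has updates the other has not seen


def compare(left: dict[str, int], right: dict[str, int]) -> Order:
    """Compare two version vectors keyed by source ID."""
    sources = left.keys() | right.keys()
    left_ahead = any(left.get(s, 0) > right.get(s, 0) for s in sources)
    right_ahead = any(right.get(s, 0) > left.get(s, 0) for s in sources)
    if left_ahead and right_ahead:
        return Order.CONFLICT
    if left_ahead:
        return Order.GREATER
    if right_ahead:
        return Order.LESS
    return Order.EQUAL


# Source has a newer update from cluster A, target has one from cluster B.
print(compare({"A": 3, "B": 1}, {"A": 2, "B": 2}))  # Order.CONFLICT
print(compare({"A": 3, "B": 2}, {"A": 2, "B": 2}))  # Order.GREATER
```

A plain CAS comparison reduces to a single integer comparison and can never report the fourth outcome, which is why a per-source history is needed to detect a true conflict.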

I think this is way too much design / implementation detail and can be skipped.

@rao-shwe (Contributor, Author)

Please specify the lines that need to be removed.

The whole section is not needed.

* *Dynamic Rebalancing:* You can temporarily increase the resource allocation for conflict logging using a dedicated “boost” option via curl command called `ClogBoost`, to handle an increased number of conflict events.
[#conflict-logger-data-flow]
==== Data Flow and Processing

Again, I feel this section is not needed because it goes deep into implementation.

* *Hibernation Mechanism:* If the logger cannot process requests due to persistent errors, such as misconfiguration, slow I/O, or resource exhaustion, logging is temporarily disabled (hibernated). This prevents replication performance from being degraded. Logging is re-enabled after a set interval or once errors are resolved.
[#shared-resources-logger]
==== Shared Resources and Connection Handling

not needed


* *Token-Based Throttling:* Logging tasks receive a minimal percentage of tokens (default allocation: 89% high-priority replication, 8% low-priority, 3% for logging). If insufficient tokens are available, logging requests are throttled to avoid impacting replication performance.
* *Dynamic Rebalancing:* You can temporarily increase the resource allocation for conflict logging using a dedicated “boost” option via curl command called `ClogBoost`, to handle an increased number of conflict events.

ClogBoost is an internal setting. Not sure if we want to document it. cc: @staticgc

@rao-shwe rao-shwe left a comment

@sumukhbhat2701

I've implemented most of your review inputs and closed the comments.

* *Best-Effort Logging:* Logging is attempted on a best-effort basis, reflecting the lowest operational priority compared to the main data transfer tasks.
[#resource-management-logger]
==== Resource Management
@rao-shwe (Contributor, Author)

True. I've mentioned Resource Management as a generic term and not as a Server component.

@rao-shwe

@sumukhbhat2701

Point 1 is fixed.
Point 2 needs to be addressed.
Point 3: The page already has examples and descriptions. It's not okay to repeat the same content in multiple locations, so I've added a link to the examples wherever necessary.
Points 4, 5, and 6: Already exist.

Some review feedback that holds true for all pages:

  1. For this feature specifically, we need to use the term "true conflicts" rather than just "conflicts". That means we need to first define what a true conflict is and set the expectation.
  2. There should be a warning that this feature is best effort (and that the number of true conflicts is assumed to be very low). Everything that's in this slide - https://couchbase.slack.com/archives/C0963TSUU0N/p1752763776316649.
  3. The setting is quite complex to understand just from the textual description. An example would help a lot for someone new reading this.
  4. There should be a mention that on every true conflict detected, XDCR will log 3 documents to the conflict collection: the CRD (conflict record document, which contains metadata of the detected true conflict), the source document in conflict, and the target document in conflict. It should be mentioned that the CRD will contain the document IDs of the logged source and target documents. Maybe add an example of source and target document IDs in the CRD.
  5. Continuing from (3), I think there should be some examples of how to make use of the detected and logged conflicts, e.g., using the SDK, N1QL, range scan, Eventing, etc. @hyunjuV I think you had a document prepared for this, was that for public docs?
  6. There should be a mention that the logged documents will not be replicated by XDCR if the conflict collection is a source collection of any XDCR.

If you try to use the feature _XDCR Active-Active with Sync Gateway_ when you have more than 10 user xattrs in your document, the XDCR replication **silently skips** replicating that document.
As a result, the data in the replication-skipped document will not be consistent between the target and source clusters.
The only way to know that this skip occurred is that the Prometheus stat `subdoc_cmd_docs_skipped` is incremented and the document is _not_ consistent between the target and source.
* Eventing Service cannot be used with Sync Gateway in bi-directional XDCR.
Contributor

  • If you are using Eventing Service functions that update documents in the XDCR replicated buckets, you must take care that the deployed Eventing functions do not cause XDCR to ping-pong and never stop replicating.

As a result, the data in the replication-skipped document will not be consistent between the target and source clusters.
The only way to know that this skip occurred is that the Prometheus stat `subdoc_cmd_docs_skipped` is incremented and the document is _not_ consistent between the target and source.
* Eventing Service cannot be used with Sync Gateway in bi-directional XDCR.
If used with the _Sync Gateway in a bi-directional, active-active XDCR_ environment, the updates of Eventing Service metadata in the source and the target clusters cause XDCR to ping-pong and never stop replicating.
Contributor

If you are using Eventing functions that update the documents in the XDCR replicated buckets (also referred to as Eventing source bucket mutations), ensure that the deployed functions behave as desired in the replication environment. Within a bi-directional, active-active XDCR environment, the deployed Eventing functions can cause XDCR to ping-pong and never stop replicating if you do not include logic to prevent the infinite loop. In general, for active-active, avoid redundant updates with appropriate logic within the Eventing functions. See XDCR Active-Active and Eventing for more information.

Note for @rao-shwe :
Fortune Ikechi is working on DOC-13300, which will add a page called "XDCR Active-Active and Eventing" in 7.6.x documentation. One of the changes for that work is to update this note in lines 25-26.
