DOC-12484 XDCR Conflict Logging feature #3806
base: release/8.0
. xref:learn:clusters-and-availability/xdcr-conflict-logging-feature.adoc#xdcr-conflict-detection[*Conflict Detection*]: During the replication, XDCR detects true conflicts by comparing the Hybrid Logical Vector (HLV) metadata of the source and target documents. | ||
. xref:learn:clusters-and-availability/xdcr-conflict-logging-feature.adoc#conflict-logging-process[*Conflict Logging*]: When a true conflict is detected, XDCR logs the conflict details, such as document ID, document contents, and conflicting document histories, into the designated conflict collection. |
Do we need to define what a true conflict is?
. xref:learn:clusters-and-availability/xdcr-conflict-logging-feature.adoc#conflict-logging-process[*Conflict Logging*]: When a true conflict is detected, XDCR logs the conflict details, such as document ID, document contents, and conflicting document histories, into the designated conflict collection. | ||
. xref:learn:clusters-and-availability/xdcr-conflict-logging-feature.adoc#conflict-access-and-management[*Conflict Access and Management*]: Administrators can access and review the logged conflicts. Then manually resolve the conflicts by selecting the appropriate mutations for replication and upsert the documents. |
nit: It's not just administrators. Technically, any user who has RW access to the bucket can view and access them.
@@ -28,6 +28,14 @@ curl -v -X POST -u [admin]:[password] | |||
-d fromBucket=[bucket-name] | |||
-d toCluster=[cluster-name] | |||
-d toBucket=[bucket-name] | |||
-d conflictLogging='{ |
I think this should be URL-encoded. Using -d followed by a JSON string will throw an error.
One option is to use --data-urlencode, as in https://couchbase.slack.com/archives/C0963TSUU0N/p1752763203434269.
Another option is to follow how colMappingRules [JSON-Document] is mentioned below (e.g. conflictLogging [JSON-Document]) and then explain the format of the JSON document as it is explained now.
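To illustrate the encoding point, here is a minimal Python sketch of what URL-encoding does to the `conflictLogging` JSON before it is placed in the form body. The field names come from the snippet under review. The bucket, cluster, scope, and collection values are made up for illustration, and nothing is sent to a server.

```python
import json
from urllib.parse import parse_qs, urlencode

# Hypothetical values; only the field names come from the snippet under review.
conflict_logging = {
    "disabled": False,
    "bucket": "conflict-bucket",
    "collection": "conflict-scope.conflict-collection",
}

form_fields = {
    "fromBucket": "source-bucket",
    "toCluster": "target-cluster",
    "toBucket": "target-bucket",
    "replicationType": "continuous",
    # Serialize the JSON first, then let urlencode() percent-encode it.
    "conflictLogging": json.dumps(conflict_logging, separators=(",", ":")),
}

# urlencode() escapes braces, quotes, colons, and commas in each value.
body = urlencode(form_fields)
print(body)

# Decoding the body gives back the original JSON document unchanged.
decoded = json.loads(parse_qs(body)["conflictLogging"][0])
print(decoded == conflict_logging)
```

The sketch only shows that the JSON survives form encoding intact, which is the same job `--data-urlencode` does for a single field on the curl command line.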
@@ -28,6 +28,14 @@ curl -v -X POST -u [admin]:[password] | |||
-d fromBucket=[bucket-name] | |||
-d toCluster=[cluster-name] | |||
-d toBucket=[bucket-name] | |||
-d conflictLogging='{ | |||
"disabled": [true | false], "bucket": [conflict-bucket-name], "collection": [conflict-scope-name].[conflict-collection-name], "loggingRules": { | |||
[custom-conflict-scope-name]: { |
The LHS is the source collection of the replication:
[custom-conflict-scope-name]: { -> [source-scope-name]: {
The LHS can also be [source-collection-name].
The RHS can also be {} or null.
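A small sketch may make the shape of `loggingRules` easier to picture. It assumes that keys name either a source scope or a `scope.collection` pair, and that values are an object, `{}`, or `null`, as described in this comment. The quoted diff cuts off before showing a non-empty rule, so the `bucket` and `collection` fields in the first rule are an assumption that mirrors the top-level settings. `check_logging_rules` and all scope and collection names are hypothetical.

```python
import json


def check_logging_rules(rules):
    """Return a list of shape problems found in a loggingRules mapping."""
    problems = []
    for key, value in rules.items():
        parts = key.split(".")
        # A key is either "scope" or "scope.collection" on the source side.
        if len(parts) not in (1, 2) or not all(parts):
            problems.append(f"{key!r}: expected 'scope' or 'scope.collection'")
        # A value is an object (possibly empty) or null.
        if value is not None and not isinstance(value, dict):
            problems.append(f"{key!r}: value must be an object, {{}} or null")
    return problems


rules = {
    # Assumed shape of a non-empty rule; not shown in the quoted diff.
    "inventory": {"bucket": "conflict-bucket", "collection": "conflicts.inventory"},
    "inventory.airline": {},
    "inventory.hotel": None,
}

print(check_logging_rules(rules))
print(json.dumps({"loggingRules": rules}, indent=2))
```

The checker validates shape only. It says nothing about what `{}` or `null` mean to XDCR, which the page itself still needs to explain.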
---- | ||
The `type` value must be `xmem`; which is sometimes referred to as *Version 2*, and corresponds to the _Memcached Binary_ protocol, used in XDCR communications. | ||
The `replicationType` value is always `continuous`. | ||
This value must be specified. | ||
The `conflictLogging` flag enables or disables conflict logging for the replication. | ||
When enabled (`disabled=false`), you can specify the target bucket, scope, and collection for logging conflicts, as well as custom logging rules for specific collections. |
"as custom logging rules for specific collections" -> "source collections of the replication"
---- | ||
The `type` value must be `xmem`; which is sometimes referred to as *Version 2*, and corresponds to the _Memcached Binary_ protocol, used in XDCR communications. | ||
The `replicationType` value is always `continuous`. | ||
This value must be specified. | ||
The `conflictLogging` flag enables or disables conflict logging for the replication. | ||
When enabled (`disabled=false`), you can specify the target bucket, scope, and collection for logging conflicts, as well as custom logging rules for specific collections. | ||
This helps track and resolve document conflicts during replication. |
Mention these words:
- conflicts -> true conflicts
- manually, if needed: resolve document conflicts manually
| `conflictLogging` | ||
| disabled (true/false) | ||
| Configuration settings for conflict logging. This configuration setting defines objects/parameters and options used to control how conflicts are logged within the application. | ||
It includes settings such as log levels, output destinations, and thresholds for logging conflict events. |
Sorry, but what do "log levels" and "thresholds" mean?
Some review feedback that holds true for all pages:
1. For this feature specifically, we need to use the term "true conflicts" rather than just "conflicts". That means we need to first define what a true conflict is and set the expectation.
2. There should be a warning that this feature is best effort (and that the number of true conflicts is assumed to be very low). Everything that's in this slide: https://couchbase.slack.com/archives/C0963TSUU0N/p1752763776316649.
3. The setting is quite complex to understand from a textual description alone. An example would help a lot for someone new reading this.
4. There should be a mention that for every true conflict detected, XDCR logs 3 documents to the conflict collection: the CRD (conflict record document, which contains metadata of the detected true conflict), the source document in conflict, and the target document in conflict. It should also be mentioned that the CRD contains the document IDs of the logged source and target documents. Maybe include an example of source and target document IDs in a CRD.
5. Continuing from (3), I think there should be some examples of how to make use of the detected and logged conflicts, e.g. using the SDK, N1QL, range scan, Eventing, etc.
6. There should be a mention that the logged documents will not be replicated by XDCR if the conflict collection is a source collection of any XDCR replication.
I think I missed reviewing one of the pages, so if some of this was already addressed after the last comment, please ignore it.
* xref:learn:clusters-and-availability/xdcr-conflict-logging-feature.adoc#upgrade-xdcr-setup-conflict-logging[*Upgrading an Existing Active-Passive XDCR Setup*]: Configure an existing active-passive XDCR setup into an active-active XDCR setup. | ||
[#hlv] |
Do we need this detail of a section for HLV?
cc: @hyunjuV for your thoughts.
This page is mainly for conceptual information. Users may not find HLV related information anywhere else.
* In conflict with the one at target, so a merge needs to be performed. This happens when the target has mutations in the document not included in the source document. This is true conflict detection. | ||
However, comparing documents’ CAS values is straightforward (by comparing integers), whereas comparing HLVs is complex. HLVs combine CAS with per-source version history, so clear HLV properties and rules are defined for “greater than” and “equal”. | ||
[#compare-hlvs-to-detect-conflicts] |
I think this is way too much design / implementation detail and can be skipped.
Please specify the lines that need to be removed.
The whole section is not needed.
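For readers following this thread, the outcomes described in the quoted text (one side's history already contains the other's, or neither does and a merge is needed) can be sketched with a plain version-vector comparison. This is a simplified illustration, not XDCR's actual HLV rules. The `Outcome` names, the `compare` function, and the cluster IDs are hypothetical, and real HLVs also carry CAS values that this sketch ignores.

```python
from enum import Enum


class Outcome(Enum):
    EQUAL = "equal"
    SOURCE_NEWER = "source_newer"
    TARGET_NEWER = "target_newer"
    CONFLICT = "conflict"


def compare(source, target):
    """Compare two version vectors (dict: cluster id -> highest version seen)."""
    clusters = source.keys() | target.keys()
    # A side is "ahead" if it has seen a newer version from any cluster.
    source_ahead = any(source.get(c, 0) > target.get(c, 0) for c in clusters)
    target_ahead = any(target.get(c, 0) > source.get(c, 0) for c in clusters)
    if source_ahead and target_ahead:
        # Each side has mutations the other has not seen: a true conflict.
        return Outcome.CONFLICT
    if source_ahead:
        return Outcome.SOURCE_NEWER
    if target_ahead:
        return Outcome.TARGET_NEWER
    return Outcome.EQUAL


print(compare({"clusterA": 2, "clusterB": 1}, {"clusterA": 1, "clusterB": 1}))
print(compare({"clusterA": 2}, {"clusterB": 1}))
```

The second call is the concurrent case: each cluster updated the document without seeing the other's change, so neither vector contains the other.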
* *Dynamic Rebalancing:* You can temporarily increase the resource allocation for conflict logging using a dedicated “boost” option via curl command called `ClogBoost`, to handle an increased number of conflict events. | ||
[#conflict-logger-data-flow] | ||
==== Data Flow and Processing |
I feel this section is also not needed, because it goes deep into implementation.
* *Hibernation Mechanism:* If the logger cannot process requests due to persistent errors, such as misconfiguration, slow IO, resource exhaustion, logging is temporarily disabled or hibernated. This prevents replication performance from being degraded. Logging is re-enabled after a set interval or once errors are resolved. | ||
[#shared-resources-logger] | ||
==== Shared Resources and Connection Handling |
not needed
* *Token-Based Throttling:* Logging tasks receive a minimal percentage of tokens (default allocation: 89% high-priority replication, 8% low-priority, 3% for logging). If insufficient tokens are available, logging requests are throttled to avoid impacting replication performance. | ||
* *Dynamic Rebalancing:* You can temporarily increase the resource allocation for conflict logging using a dedicated “boost” option via curl command called `ClogBoost`, to handle an increased number of conflict events. |
ClogBoost is an internal setting. Not sure if we want to document it. cc: @staticgc
I've implemented most of your review inputs and closed the comments.
* *Best-Effort Logging:* Logging is attempted on a best-effort basis, reflecting the lowest operational priority compared to the main data transfer tasks. | ||
[#resource-management-logger] | ||
==== Resource Management |
True. I've mentioned Resource Management as a generic term and not as a Server component.
Point 1 is fixed.
If you try to use the feature _XDCR Active-Active with Sync Gateway_ when you have more than 10 user xattrs in your document, the XDCR replication **silently skips** replicating that document. | ||
As a result, the data in the replication-skipped document will not be consistent between the target and source clusters. | ||
The only way you will know this skip occurred is because the Prometheus stat `subdoc_cmd_docs_skipped` will be incremented and the document will _not_ be consistent between the target and source. | ||
* Eventing Service cannot be used with Sync Gateway in bi-directional XDCR. |
- If you are using Eventing Service functions that update documents in the XDCR replicated buckets, you must take care that the deployed Eventing functions do not cause XDCR to ping-pong and never stop replicating.
As a result, the data in the replication-skipped document will not be consistent between the target and source clusters. | ||
The only way you will know this skip occurred is because the Prometheus stat `subdoc_cmd_docs_skipped` will be incremented and the document will _not_ be consistent between the target and source. | ||
* Eventing Service cannot be used with Sync Gateway in bi-directional XDCR. | ||
If used with the _Sync Gateway in a bi-directional, active-active XDCR_ environment, the updates of Eventing Service metadata in the source and the target clusters cause XDCR to ping-pong and never stop replicating. |
If you are using Eventing functions that update the documents in the XDCR replicated buckets (also referred to as Eventing source bucket mutations), ensure that the deployed functions behave as desired in the replication environment. Within a bi-directional, active-active XDCR environment, the deployed Eventing functions can cause XDCR to ping-pong and never stop replicating if you do not include logic to prevent the infinite loop. In general, for active-active, avoid redundant updates with appropriate logic within the Eventing functions. See XDCR Active-Active and Eventing for more information.
Note for @rao-shwe :
Fortune Ikechi is working on DOC-13300, which will add a page called "XDCR Active-Active and Eventing" in 7.6.x documentation. One of the changes for that work is to update this note in lines 25-26.
DOC-12484
Link to the preview doc: https://preview.docs-test.couchbase.com/DOC-12484/server/current/learn/clusters-and-availability/xdcr-conflict-logging-feature.html
Preview pages:
PR pages:
New page: XDCR Conflict Logging.
Updated the following pages for "XDCR Conflict Logging":
Don't review the following files: The following are 7.6.6 release docs which were missing in the release/8.0 branch.