Improve MongoDB ReplicationLag alert #2205

Open · wants to merge 2 commits into development/2.10 from improvement/ZENKO-4986

Conversation

@williamlardier (Contributor) commented Mar 20, 2025

Make the ReplicationLagWarning alert dynamic:

  • The threshold is no longer a fixed 30s. That fixed value had two problems:
    • it is not consistent with the flowControlTargetLagSeconds config, which is set to 10s.
    • it generates a lot of alerts when a 3-node cluster is under load, where the lag may sit in the 1min-30min range without any impact on replication.
  • The new alert instead uses the largest oplog window among the PRIMARY mongod instances to create alerts dynamically. "Largest" is there to stay future proof with multiple shards.
  • Warning when the optimeDate (current lag) of any secondary exceeds 70% of the primary oplog window for 8min.
  • Critical when it exceeds 95% for 2min: in that case there is a sizing issue and a high risk of requiring full init syncs. (A sketch of the resulting expression is shown below.)

Issue: ZENKO-4986
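
For illustration, here is a rough sketch of how the warning condition is assembled from the metrics used in this PR. The ${namespace}/${service} selectors, any unit conversion between the optimeDate and oplog timestamp metrics, and the 8min/2min "for" durations are omitted; the actual rule files remain the reference.

# Secondary replication lag, as a fraction of the PRIMARY oplog window.
# Warning: ratio above 0.70 for 8m; Critical: ratio above 0.95 for 2m.
(
  max by(pod, rs_nm) (mongodb_rs_members_optimeDate{member_state="PRIMARY"})
  - ignoring(member_idx) group_right
    min by(pod, rs_nm, member_idx) (mongodb_rs_members_optimeDate{member_state="SECONDARY"})
)
/ scalar(
  # largest oplog window among PRIMARY members (head timestamp minus tail timestamp)
  max(
    (mongodb_mongod_replset_oplog_head_timestamp
      - on(pod) mongodb_mongod_replset_oplog_tail_timestamp)
    * on(pod) group_left()
    (mongodb_mongod_replset_my_state == 1)
  )
)
> 0.70

The scalar() wrapper is only there to keep the sketch short; the real expressions keep the full selectors and label matching shown in the review snippets below.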

@bert-e (Contributor) commented Mar 20, 2025

Hello williamlardier,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options

  /after_pull_request: Wait for the given pull request id to be merged before continuing with the current one.
  /bypass_author_approval: Bypass the pull request author's approval
  /bypass_build_status: Bypass the build and test status
  /bypass_commit_size: Bypass the check on the size of the changeset (TBA)
  /bypass_incompatible_branch: Bypass the check on the source branch prefix
  /bypass_jira_check: Bypass the Jira issue check
  /bypass_peer_approval: Bypass the pull request peers' approval
  /bypass_leader_approval: Bypass the pull request leaders' approval
  /approve: Instruct Bert-E that the author has approved the pull request. ✍️
  /create_pull_requests: Allow the creation of integration pull requests.
  /create_integration_branches: Allow the creation of integration branches.
  /no_octopus: Prevent Wall-E from doing any octopus merge and use multiple consecutive merges instead
  /unanimity: Change review acceptance criteria from one reviewer at least to all reviewers
  /wait: Instruct Bert-E not to run until further notice.

Available commands

  /help: Print Bert-E's manual in the pull request.
  /status: Print Bert-E's current status in the pull request (TBA)
  /clear: Remove all comments from Bert-E from the history (TBA)
  /retry: Re-start a fresh build (TBA)
  /build: Re-start a fresh build (TBA)
  /force_reset: Delete integration branches & pull requests, and restart the merge process from the beginning.
  /reset: Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@williamlardier changed the title from "Improvement/zenko 4986" to "Improve MongoDB ReplicationLag alert" on Mar 20, 2025
@williamlardier changed the base branch from development/2.11 to development/2.10 on March 20, 2025 at 11:35
@scality deleted a comment from bert-e on Mar 20, 2025
@scality deleted a comment from bert-e on Mar 20, 2025
@bert-e (Contributor) commented Mar 20, 2025

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

@williamlardier force-pushed the improvement/ZENKO-4986 branch from f082f1b to 0554827 on March 21, 2025 at 10:20, with the following commit message:
- The threshold is not 30s anymore. This fixed value has two
  problems:
  - it is not consistent with the flowControlTargetLagSeconds
    config set as 10s.
  - it is creating a lot of alerts when a 3 node cluster is
    under load, when the lag may be in the 1min-30min range
    without impact on the replication.
- The new alert considers the oplog window of the mongod
  instances to dynamically create alerts.
- Warning when we exceed 70% of the oplog.
- Critical when we exceed 95%. In this case there is a
  sizing issue and a high risk of requiring full init syncs.

Issue: ZENKO-4986
Comment on lines +108 to +113
max by(pod, rs_nm) (
  mongodb_rs_members_optimeDate{namespace="${namespace}",pod=~"${service}.*",member_state="PRIMARY"}
)
- ignoring(member_idx) group_right min by(pod, rs_nm, member_idx) (
  mongodb_rs_members_optimeDate{namespace="${namespace}",pod=~"${service}.*",member_state="SECONDARY"}
)
@francoisferrand (Contributor) commented Mar 21, 2025

Overall, this computation is done on each pod: we compare the primary's oplog date vs all the secondary oplog dates, as viewed by this pod.
→ this gives the max optime delta as viewed by this pod.

Should it not be the other way around, i.e. for each member_idx, diff the max(PRIMARY) with the max(SECONDARY) [ignoring pods]? That gives a simpler expression (no ignoring, no need to perform 2 nested max) and more precisely measures the lag of that member_idx vs the primary.
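
Roughly something like this, I think (untested sketch, with the ${namespace}/${service} selectors omitted):

# per secondary member_idx: max(PRIMARY optimeDate) minus max(SECONDARY optimeDate), ignoring pods
max by(rs_nm) (mongodb_rs_members_optimeDate{member_state="PRIMARY"})
- on(rs_nm) group_right
  max by(rs_nm, member_idx) (mongodb_rs_members_optimeDate{member_state="SECONDARY"})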

@williamlardier (Contributor, author) commented Mar 21, 2025

Yes, I'll check that approach (I'm also trying it on multi-shard clusters to make sure it's future proof).

(From what I tested, we hit issues the other way around because labels are not unique... giving errors.)

Comment on lines +115 to +121
max (
  (mongodb_mongod_replset_oplog_head_timestamp{namespace="${namespace}",pod=~"${service}.*"}
    - on(pod)
  mongodb_mongod_replset_oplog_tail_timestamp{namespace="${namespace}",pod=~"${service}.*"})
  * on(pod) group_left()
  (mongodb_mongod_replset_my_state{namespace="${namespace}",pod=~"${service}.*"} == 1)
)
@francoisferrand (Contributor) commented Mar 21, 2025

  • Can we/should we only look at the oplog headroom from the PRIMARY?
    (i.e. if one SECONDARY has lots of headroom, it would affect the metrics even though it does not reduce the risk)
  • should we not take the min headroom (instead of max)?

@williamlardier (Contributor, author) replied:

This part is the journal window. The headroom would typically be something like

(avg(mongodb_mongod_replset_oplog_tail_timestamp - mongodb_mongod_replset_oplog_head_timestamp)
      - (avg(mongodb_mongod_replset_member_optime_date{state="PRIMARY"}) - avg(mongodb_mongod_replset_member_optime_date{state="SECONDARY"})))

Here, the whole alert is similar to the headroom, but computed as a percentage so we can define thresholds and not just alert when it's too late.

As the replication happens between the primary and the secondaries only, not between secondaries, I don't think we need to consider the secondary oplog window: its size has nothing to do with the secondary's ability to catch up on replication.

should we not take the min headroom (instead of max)?

I asked myself this question; with only one replica set we only have one value anyway. I chose max to be on the optimistic side and avoid alerting too early, but min makes sense if we want to detect the problem as early as possible, so I'll go with it; I am aligned.
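
Concretely that just flips the aggregation on the window term, something like (sketch only, selectors omitted):

# take the smallest (most pessimistic) PRIMARY oplog window instead of the largest
min(
  (mongodb_mongod_replset_oplog_head_timestamp
    - on(pod) mongodb_mongod_replset_oplog_tail_timestamp)
  * on(pod) group_left()
  (mongodb_mongod_replset_my_state == 1)
)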

@francoisferrand (Contributor) replied:

> This part is the journal window. The headroom would typically be something like

ok, journal window :-)
But my point was really about looking only at the primary, since this is the "important" oplog in this case.

@williamlardier (Contributor, author) replied:

Yes, this is what we do already, by filtering on the "my state" being 1 (=== PRIMARY). Did you have something else in mind?

3 participants