From a21fe718f46e5987dd9cf1ca698dce6d0d060795 Mon Sep 17 00:00:00 2001
From: Binbin
Date: Thu, 24 Oct 2024 16:38:47 +0800
Subject: [PATCH] Limit CLUSTER_CANT_FAILOVER_DATA_AGE log to 10 times period (#1189)

If a replica enters the "data_age too old" stage, it cannot trigger a
failover and currently cannot recover automatically. We then print a
log every CLUSTER_CANT_FAILOVER_RELOG_PERIOD, which is every second.
If the primary has not recovered and there is no manual failover, this
log will flood the log file.

In this case, limit its frequency to 10 times the period, which is
10 seconds in our code. In the "data_age too old" stage, the repeated
logs can also indicate the progress of the failover.

See also #780 for more details.

Signed-off-by: Binbin
Co-authored-by: Ping Xie
---
 src/cluster_legacy.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/cluster_legacy.c b/src/cluster_legacy.c
index e56f1c2823..43d56b9a09 100644
--- a/src/cluster_legacy.c
+++ b/src/cluster_legacy.c
@@ -4433,11 +4433,18 @@ int clusterGetReplicaRank(void) {
 void clusterLogCantFailover(int reason) {
     char *msg;
     static time_t lastlog_time = 0;
+    time_t now = time(NULL);
 
-    /* Don't log if we have the same reason for some time. */
-    if (reason == server.cluster->cant_failover_reason &&
-        time(NULL) - lastlog_time < CLUSTER_CANT_FAILOVER_RELOG_PERIOD)
+    /* General logging suppression if the same reason has occurred recently. */
+    if (reason == server.cluster->cant_failover_reason && now - lastlog_time < CLUSTER_CANT_FAILOVER_RELOG_PERIOD) {
         return;
+    }
+
+    /* Special case: If the failure reason is due to data age, log 10 times less frequently. */
+    if (reason == server.cluster->cant_failover_reason && reason == CLUSTER_CANT_FAILOVER_DATA_AGE &&
+        now - lastlog_time < 10 * CLUSTER_CANT_FAILOVER_RELOG_PERIOD) {
+        return;
+    }
 
     server.cluster->cant_failover_reason = reason;
 
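
The throttling rule described in the commit message can be tried out in isolation with a minimal sketch like the one below. It is not the code from src/cluster_legacy.c: the names relog_state, should_log, count_logs, RELOG_PERIOD and the REASON_* constants are hypothetical stand-ins, the period value of 1 second is taken from the commit message, and the sketch ignores every other condition the real function checks before logging. Time is passed in as a parameter so the behavior can be simulated without waiting.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins for the real constants (assumptions, not the
 * definitions used in src/cluster_legacy.c). */
#define RELOG_PERIOD 1         /* seconds, per the commit message */
#define REASON_NONE 0
#define REASON_DATA_AGE 1      /* stand-in for CLUSTER_CANT_FAILOVER_DATA_AGE */
#define REASON_OTHER 2         /* any other "can't failover" reason */

/* State that the real function keeps in a static variable and in
 * server.cluster; bundled here so the sketch is easy to test. */
typedef struct {
    int last_reason;
    time_t lastlog_time;
} relog_state;

/* Returns true if a message for `reason` should be emitted at time `now`,
 * and records the emission in `st`. The same reason is suppressed for one
 * period, or for ten periods when the reason is data age. */
bool should_log(relog_state *st, int reason, time_t now) {
    time_t period = RELOG_PERIOD;
    if (reason == REASON_DATA_AGE) period *= 10;

    if (reason == st->last_reason && now - st->lastlog_time < period) {
        return false;
    }

    st->last_reason = reason;
    st->lastlog_time = now;
    return true;
}

/* Reports the same reason once per simulated second for `seconds` seconds
 * and counts how many messages get through. */
int count_logs(int reason, int seconds) {
    relog_state st = {REASON_NONE, 0};
    int emitted = 0;
    for (int t = 0; t < seconds; t++) {
        if (should_log(&st, reason, (time_t)(1000 + t))) emitted++;
    }
    return emitted;
}
```

Under these assumptions, calling count_logs(REASON_DATA_AGE, 30) lets a message through at simulated seconds 0, 10 and 20, while count_logs(REASON_OTHER, 30) lets one through every second. Folding both checks into a single period variable is a simplification of the two separate if statements in the patch; the suppression windows are the same.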