PostgreSQL 14 preview - log_recovery_conflict_waits - standby query&startup process conflict 恢复冲突 超时(timeout)配置,日志打印
digoal
2021-01-08
PostgreSQL , 冲突 , standby , deadlock_timeout
standby需要replay wal日志, 回放和用户的查询可能存在冲突, 例如查询快照比较老, 但是正在回放的wal可能要做vacuum操作清理老快照可能要查询的数据, 更多详情参考如下:
《PostgreSQL standby conflict replay分析和解决方案》
《PostgreSQL源码分析 备库查询冲突 - User was holding shared buffer pin for too long》
《PostgreSQL 物理流复制从库 - 冲突判定, 谁堵塞了wal replay, 等了多久》
PostgreSQL 14可以支持冲突日志打印了, 超过deadlock_time配置的冲突等待都会被打印到日志中(log_recovery_conflict_waits=on 时), 包括冲突对应的锁类型, 什么backend pid导致的等等. 方便我们观察冲突发生的历史事件, 分析问题.
Add GUC to log long wait times on recovery conflicts.
author Fujii Masao <[email protected]>
Thu, 7 Jan 2021 15:47:03 +0000 (00:47 +0900)
committer Fujii Masao <[email protected]>
Thu, 7 Jan 2021 15:47:03 +0000 (00:47 +0900)
commit 0650ff23038bc3eb8d8fd851744db837d921e285
tree 1df2b5ccd7a20ee538c93339b115ca4f274a5aa6 tree | snapshot
parent f7a1a805cb178653ea2a6c8991ad73b035af953e commit | diff
Add GUC to log long wait times on recovery conflicts.
This commit adds GUC log_recovery_conflict_waits that controls whether
a log message is produced when the startup process is waiting longer than
deadlock_timeout for recovery conflicts. This is useful in determining
if recovery conflicts prevent the recovery from applying WAL.
Note that currently a log message is produced only when recovery conflict
has not been resolved yet even after deadlock_timeout passes, i.e.,
only when the startup process is still waiting for recovery conflict
even after deadlock_timeout.
Author: Bertrand Drouvot, Masahiko Sawada
Reviewed-by: Alvaro Herrera, Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/[email protected]
+ <varlistentry id="guc-log-recovery-conflict-waits" xreflabel="log_recovery_conflict_waits">
+ <term><varname>log_recovery_conflict_waits</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>log_recovery_conflict_waits</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Controls whether a log message is produced when the startup process
+ is waiting longer than <varname>deadlock_timeout</varname>
+ for recovery conflicts. This is useful in determining if recovery
+ conflicts prevent the recovery from applying WAL.
+ </para>
+
+ <para>
+ The default is <literal>off</literal>. This parameter can only be set
+ in the <filename>postgresql.conf</filename> file or on the server
+ command line.
+ </para>
+ </listitem>
+ </varlistentry>
+/*
+ * Log the recovery conflict.
+ *
+ * wait_start is the timestamp when the caller started to wait.
+ * now is the timestamp when this function has been called.
+ * wait_list is the list of virtual transaction ids assigned to
+ * conflicting processes.
+ */
+void
+LogRecoveryConflict(ProcSignalReason reason, TimestampTz wait_start,
+ TimestampTz now, VirtualTransactionId *wait_list)
+{
+ long secs;
+ int usecs;
+ long msecs;
+ StringInfoData buf;
+ int nprocs = 0;
+
+ TimestampDifference(wait_start, now, &secs, &usecs);
+ msecs = secs * 1000 + usecs / 1000;
+ usecs = usecs % 1000;
+
+ if (wait_list)
+ {
+ VirtualTransactionId *vxids;
+
+ /* Construct a string of list of the conflicting processes */
+ vxids = wait_list;
+ while (VirtualTransactionIdIsValid(*vxids))
+ {
+ PGPROC *proc = BackendIdGetProc(vxids->backendId);
+
+ /* proc can be NULL if the target backend is not active */
+ if (proc)
+ {
+ if (nprocs == 0)
+ {
+ initStringInfo(&buf);
+ appendStringInfo(&buf, "%d", proc->pid);
+ }
+ else
+ appendStringInfo(&buf, ", %d", proc->pid);
+
+ nprocs++;
+ }
+
+ vxids++;
+ }
+ }
+
+ /*
+ * If wait_list is specified, report the list of PIDs of active
+ * conflicting backends in a detail message. Note that if all the backends
+ * in the list are not active, no detail message is logged.
+ */
+ ereport(LOG,
+ errmsg("recovery still waiting after %ld.%03d ms: %s",
+ msecs, usecs, _(get_recovery_conflict_desc(reason))),
+ nprocs > 0 ? errdetail_log_plural("Conflicting process: %s.",
+ "Conflicting processes: %s.",
+ nprocs, buf.data) : 0);
+
+ if (nprocs > 0)
+ pfree(buf.data);
+}
+/* Return the description of recovery conflict */
+static const char *
+get_recovery_conflict_desc(ProcSignalReason reason)
+{
+ const char *reasonDesc = gettext_noop("unknown reason");
+
+ switch (reason)
+ {
+ case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
+ reasonDesc = gettext_noop("recovery conflict on buffer pin");
+ break;
+ case PROCSIG_RECOVERY_CONFLICT_LOCK:
+ reasonDesc = gettext_noop("recovery conflict on lock");
+ break;
+ case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
+ reasonDesc = gettext_noop("recovery conflict on tablespace");
+ break;
+ case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
+ reasonDesc = gettext_noop("recovery conflict on snapshot");
+ break;
+ case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
+ reasonDesc = gettext_noop("recovery conflict on buffer deadlock");
+ break;
+ case PROCSIG_RECOVERY_CONFLICT_DATABASE:
+ reasonDesc = gettext_noop("recovery conflict on database");
+ break;
+ default:
+ break;
+ }
+
+ return reasonDesc;
+}
冲突结束后,如果整个冲突时间超过deadlock_timeout,也会被记录下来,用于判断总共等待了多长时间。
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.