Skip to content

Files

Latest commit

Dec 15, 2023
f400df2 · Dec 15, 2023

History

History
212 lines (173 loc) · 9.44 KB

20210108_02.md

File metadata and controls

212 lines (173 loc) · 9.44 KB

PostgreSQL 14 preview - log_recovery_conflict_waits - standby query&startup process conflict 恢复冲突 超时(timeout)配置,日志打印

作者

digoal

日期

2021-01-08

标签

PostgreSQL , 冲突 , standby , deadlock_timeout


背景

standby需要replay wal日志, 回放和用户的查询可能存在冲突, 例如查询快照比较老, 但是正在回放的wal可能要做vacuum操作清理老快照可能要查询的数据, 更多详情参考如下:

《PostgreSQL 流复制冲突分类讲解以及对应解决方案 - DEALING WITH STREAMING REPLICATION CONFLICTS IN POSTGRESQL - 特别是lock confict(vacuum truncate suffix free page引起的)》

《PostgreSQL standby conflict replay分析和解决方案》

《PostgreSQL源码分析 备库查询冲突 - User was holding shared buffer pin for too long》

《PostgreSQL 物理流复制从库 - 冲突判定, 谁堵塞了wal replay, 等了多久》

PostgreSQL 14可以支持冲突日志打印了, 超过deadlock_time配置的冲突等待都会被打印到日志中(log_recovery_conflict_waits=on 时), 包括冲突对应的锁类型, 什么backend pid导致的等等. 方便我们观察冲突发生的历史事件, 分析问题.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0650ff23038bc3eb8d8fd851744db837d921e285

Add GUC to log long wait times on recovery conflicts.  
author	Fujii Masao <[email protected]>	  
Thu, 7 Jan 2021 15:47:03 +0000 (00:47 +0900)  
committer	Fujii Masao <[email protected]>	  
Thu, 7 Jan 2021 15:47:03 +0000 (00:47 +0900)  
commit	0650ff23038bc3eb8d8fd851744db837d921e285  
tree	1df2b5ccd7a20ee538c93339b115ca4f274a5aa6	tree | snapshot  
parent	f7a1a805cb178653ea2a6c8991ad73b035af953e	commit | diff  
Add GUC to log long wait times on recovery conflicts.  
  
This commit adds GUC log_recovery_conflict_waits that controls whether  
a log message is produced when the startup process is waiting longer than  
deadlock_timeout for recovery conflicts. This is useful in determining  
if recovery conflicts prevent the recovery from applying WAL.  
  
Note that currently a log message is produced only when recovery conflict  
has not been resolved yet even after deadlock_timeout passes, i.e.,  
only when the startup process is still waiting for recovery conflict  
even after deadlock_timeout.  
  
Author: Bertrand Drouvot, Masahiko Sawada  
Reviewed-by: Alvaro Herrera, Kyotaro Horiguchi, Fujii Masao  
Discussion: https://postgr.es/m/[email protected]  
+     <varlistentry id="guc-log-recovery-conflict-waits" xreflabel="log_recovery_conflict_waits">  
+      <term><varname>log_recovery_conflict_waits</varname> (<type>boolean</type>)  
+      <indexterm>  
+       <primary><varname>log_recovery_conflict_waits</varname> configuration parameter</primary>  
+      </indexterm>  
+      </term>  
+      <listitem>  
+       <para>  
+        Controls whether a log message is produced when the startup process  
+        is waiting longer than <varname>deadlock_timeout</varname>  
+        for recovery conflicts.  This is useful in determining if recovery  
+        conflicts prevent the recovery from applying WAL.  
+       </para>  
+  
+       <para>  
+        The default is <literal>off</literal>.  This parameter can only be set  
+        in the <filename>postgresql.conf</filename> file or on the server  
+        command line.  
+       </para>  
+      </listitem>  
+     </varlistentry>  
+/*  
+ * Log the recovery conflict.  
+ *  
+ * wait_start is the timestamp when the caller started to wait.  
+ * now is the timestamp when this function has been called.  
+ * wait_list is the list of virtual transaction ids assigned to  
+ * conflicting processes.  
+ */  
+void  
+LogRecoveryConflict(ProcSignalReason reason, TimestampTz wait_start,  
+                                       TimestampTz now, VirtualTransactionId *wait_list)  
+{  
+       long            secs;  
+       int                     usecs;  
+       long            msecs;  
+       StringInfoData buf;  
+       int                     nprocs = 0;  
+  
+       TimestampDifference(wait_start, now, &secs, &usecs);  
+       msecs = secs * 1000 + usecs / 1000;  
+       usecs = usecs % 1000;  
+  
+       if (wait_list)  
+       {  
+               VirtualTransactionId *vxids;  
+  
+               /* Construct a string of list of the conflicting processes */  
+               vxids = wait_list;  
+               while (VirtualTransactionIdIsValid(*vxids))  
+               {  
+                       PGPROC     *proc = BackendIdGetProc(vxids->backendId);  
+  
+                       /* proc can be NULL if the target backend is not active */  
+                       if (proc)  
+                       {  
+                               if (nprocs == 0)  
+                               {  
+                                       initStringInfo(&buf);  
+                                       appendStringInfo(&buf, "%d", proc->pid);  
+                               }  
+                               else  
+                                       appendStringInfo(&buf, ", %d", proc->pid);  
+  
+                               nprocs++;  
+                       }  
+  
+                       vxids++;  
+               }  
+       }  
+  
+       /*  
+        * If wait_list is specified, report the list of PIDs of active  
+        * conflicting backends in a detail message. Note that if all the backends  
+        * in the list are not active, no detail message is logged.  
+        */  
+       ereport(LOG,  
+                       errmsg("recovery still waiting after %ld.%03d ms: %s",  
+                                  msecs, usecs, _(get_recovery_conflict_desc(reason))),  
+                       nprocs > 0 ? errdetail_log_plural("Conflicting process: %s.",  
+                                                                                         "Conflicting processes: %s.",  
+                                                                                         nprocs, buf.data) : 0);  
+  
+       if (nprocs > 0)  
+               pfree(buf.data);  
+}  
+/* Return the description of recovery conflict */  
+static const char *  
+get_recovery_conflict_desc(ProcSignalReason reason)  
+{  
+       const char *reasonDesc = gettext_noop("unknown reason");  
+  
+       switch (reason)  
+       {  
+               case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:  
+                       reasonDesc = gettext_noop("recovery conflict on buffer pin");  
+                       break;  
+               case PROCSIG_RECOVERY_CONFLICT_LOCK:  
+                       reasonDesc = gettext_noop("recovery conflict on lock");  
+                       break;  
+               case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:  
+                       reasonDesc = gettext_noop("recovery conflict on tablespace");  
+                       break;  
+               case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:  
+                       reasonDesc = gettext_noop("recovery conflict on snapshot");  
+                       break;  
+               case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:  
+                       reasonDesc = gettext_noop("recovery conflict on buffer deadlock");  
+                       break;  
+               case PROCSIG_RECOVERY_CONFLICT_DATABASE:  
+                       reasonDesc = gettext_noop("recovery conflict on database");  
+                       break;  
+               default:  
+                       break;  
+       }  
+  
+       return reasonDesc;  
+}  

冲突结束后,如果整个冲突时间超过deadlock_timeout,也会被记录下来,用于判断总共等待了多长时间。

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=39b03690b529935a3c33024ee68f08e2d347cf4f

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

digoal's wechat