{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":343429164,"defaultBranch":"REL_16_STABLE_neon","name":"postgres","ownerLogin":"neondatabase","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2021-03-01T13:38:54.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/77690634?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1719832154.0","currentOid":""},"activityList":{"items":[{"before":"9f6daf006c73187739903c7b0feff314366394c0","after":"b39f316137fdd29e2da15d2af2fdd1cfd18163be","ref":"refs/heads/get_relfilenum_fix_v16","pushedAt":"2024-07-03T14:13:00.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"knizhnik","name":"Konstantin Knizhnik","path":"/knizhnik","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/156922?s=80&v=4"},"commit":{"message":"Remove unused local variable","shortMessageHtmlLink":"Remove unused local variable"}},{"before":"9e6b256fd855d27d2d5929a3f2d8bb08769a0d45","after":"035b73a9c5998f9a0ef35cc8df1bae680bf770fc","ref":"refs/heads/get_relfilenum_fix_v15","pushedAt":"2024-07-03T14:12:35.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"knizhnik","name":"Konstantin Knizhnik","path":"/knizhnik","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/156922?s=80&v=4"},"commit":{"message":"Remove unused local variable","shortMessageHtmlLink":"Remove unused local variable"}},{"before":"18eb1a93c4373dafe47d5356b372a624dcac9472","after":"dbd0e6428b9274d72a10ac29bd3e3162faf109d4","ref":"refs/heads/get_relfilenum_fix_v14","pushedAt":"2024-07-03T14:12:11.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"knizhnik","name":"Konstantin Knizhnik","path":"/knizhnik","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/156922?s=80&v=4"},"commit":{"message":"Remove unused local variable","shortMessageHtmlLink":"Remove unused local 
variable"}},{"before":"ad73770c446ea361f43e4f0404798b7e5e7a62d8","after":null,"ref":"refs/heads/restore_running_xids_from_clog-take-two-v14","pushedAt":"2024-07-01T11:09:14.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"}},{"before":"4874c8e52ed349a9f8290bbdcd91eb92677a5d24","after":null,"ref":"refs/heads/restore_running_xids_from_clog-take-two-v15","pushedAt":"2024-07-01T11:09:14.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"}},{"before":"b810fdfcbb59afea7ea7bbe0cf94eaccb55a2ea2","after":null,"ref":"refs/heads/restore_running_xids_from_clog-take-two-v16","pushedAt":"2024-07-01T11:09:14.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"}},{"before":"f54d7373eb0de5a54bce2becdb1c801026c7edff","after":"4874c8e52ed349a9f8290bbdcd91eb92677a5d24","ref":"refs/heads/REL_15_STABLE_neon","pushedAt":"2024-07-01T11:09:12.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"},"commit":{"message":"Don't set wasShutdown=true when starting a read replica\n\nThat led to incorrect query results, because the known-assigned XIDs\nmachinery was initialized incorrectly thinking that all of the\nin-progress transactions were aborted.\n\nIntroduce a new hook, to allow the neon extension to restore\nrunning-xacts from the CLOG.","shortMessageHtmlLink":"Don't set wasShutdown=true when starting a read 
replica"}},{"before":"223dd925959f8124711dd3d867dc8ba6629d52c0","after":"ad73770c446ea361f43e4f0404798b7e5e7a62d8","ref":"refs/heads/REL_14_STABLE_neon","pushedAt":"2024-07-01T11:09:12.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"},"commit":{"message":"Don't set wasShutdown=true when starting a read replica\n\nThat led to incorrect query results, because the known-assigned XIDs\nmachinery was initialized incorrectly thinking that all of the\nin-progress transactions were aborted.\n\nIntroduce a new hook, to allow the neon extension to restore\nrunning-xacts from the CLOG.","shortMessageHtmlLink":"Don't set wasShutdown=true when starting a read replica"}},{"before":"e06bebc75306b583e758b52c95946d41109239b2","after":"b810fdfcbb59afea7ea7bbe0cf94eaccb55a2ea2","ref":"refs/heads/REL_16_STABLE_neon","pushedAt":"2024-07-01T11:09:12.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"},"commit":{"message":"Don't set wasShutdown=true when starting a read replica\n\nThat led to incorrect query results, because the known-assigned XIDs\nmachinery was initialized incorrectly thinking that all of the\nin-progress transactions were aborted.\n\nIntroduce a new hook, to allow the neon extension to restore\nrunning-xacts from the CLOG.","shortMessageHtmlLink":"Don't set wasShutdown=true when starting a read replica"}},{"before":"0640e6b24a7eb2b626a7432c0e2801840954429e","after":"71aa66d721b26eebb223f8989a8c2a71c498cb87","ref":"refs/heads/pgstat_aux_file_v15","pushedAt":"2024-06-28T17:22:08.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"knizhnik","name":"Konstantin 
Knizhnik","path":"/knizhnik","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/156922?s=80&v=4"},"commit":{"message":"Write pgstat file only on shutdown","shortMessageHtmlLink":"Write pgstat file only on shutdown"}},{"before":"9901875cd3a4b80709653998544215323e13ef2f","after":"d7bad8236b4283831a4122afbafe832a753b0541","ref":"refs/heads/pgstat_aux_file_v16","pushedAt":"2024-06-28T17:21:20.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"knizhnik","name":"Konstantin Knizhnik","path":"/knizhnik","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/156922?s=80&v=4"},"commit":{"message":"Write pgstat file only on shutdown","shortMessageHtmlLink":"Write pgstat file only on shutdown"}},{"before":"4edd791e777af8b04e60b0b20aa58d4c83bb49a0","after":"b810fdfcbb59afea7ea7bbe0cf94eaccb55a2ea2","ref":"refs/heads/restore_running_xids_from_clog-take-two-v16","pushedAt":"2024-06-28T14:00:13.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"},"commit":{"message":"Don't set wasShutdown=true when starting a read replica\n\nThat led to incorrect query results, because the known-assigned XIDs\nmachinery was initialized incorrectly thinking that all of the\nin-progress transactions were aborted.\n\nIntroduce a new hook, to allow the neon extension to restore\nrunning-xacts from the CLOG.","shortMessageHtmlLink":"Don't set wasShutdown=true when starting a read replica"}},{"before":"c704e1b70f35a158ae47b94c4d979ccc6f3fc165","after":"ad73770c446ea361f43e4f0404798b7e5e7a62d8","ref":"refs/heads/restore_running_xids_from_clog-take-two-v14","pushedAt":"2024-06-28T13:59:44.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"},"commit":{"message":"Don't set 
wasShutdown=true when starting a read replica\n\nThat led to incorrect query results, because the known-assigned XIDs\nmachinery was initialized incorrectly thinking that all of the\nin-progress transactions were aborted.\n\nIntroduce a new hook, to allow the neon extension to restore\nrunning-xacts from the CLOG.","shortMessageHtmlLink":"Don't set wasShutdown=true when starting a read replica"}},{"before":"42ee50b051e1b8aed6bdcd047ff34d40b5cb1565","after":"4874c8e52ed349a9f8290bbdcd91eb92677a5d24","ref":"refs/heads/restore_running_xids_from_clog-take-two-v15","pushedAt":"2024-06-28T13:59:22.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"},"commit":{"message":"Don't set wasShutdown=true when starting a read replica\n\nThat led to incorrect query results, because the known-assigned XIDs\nmachinery was initialized incorrectly thinking that all of the\nin-progress transactions were aborted.\n\nIntroduce a new hook, to allow the neon extension to restore\nrunning-xacts from the CLOG.","shortMessageHtmlLink":"Don't set wasShutdown=true when starting a read replica"}},{"before":"63ee385be85691704d2568aecaeae374138e2809","after":"4edd791e777af8b04e60b0b20aa58d4c83bb49a0","ref":"refs/heads/restore_running_xids_from_clog-take-two-v16","pushedAt":"2024-06-28T13:58:20.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"},"commit":{"message":"Don't set wasShutdown=true when starting a read replica\n\nThat led to incorrect query results, because the known-assigned XIDs\nmachinery was initialized incorrectly thinking that all of the\nin-progress transactions were aborted.\n\nIntroduce a new hook, to allow the neon extension to restore\nrunning-xacts from the 
CLOG.","shortMessageHtmlLink":"Don't set wasShutdown=true when starting a read replica"}},{"before":"223dd925959f8124711dd3d867dc8ba6629d52c0","after":null,"ref":"refs/heads/cherry-pick-upstream-TruncateMultiXact-fix-v14","pushedAt":"2024-06-28T13:46:58.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"}},{"before":"e06bebc75306b583e758b52c95946d41109239b2","after":null,"ref":"refs/heads/cherry-pick-upstream-TruncateMultiXact-fix-v16","pushedAt":"2024-06-28T13:46:58.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"}},{"before":"f54d7373eb0de5a54bce2becdb1c801026c7edff","after":null,"ref":"refs/heads/cherry-pick-upstream-TruncateMultiXact-fix-v15","pushedAt":"2024-06-28T13:46:58.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"}},{"before":"7845c122d51d3ebb547a984a640ac0310a2fadce","after":"223dd925959f8124711dd3d867dc8ba6629d52c0","ref":"refs/heads/REL_14_STABLE_neon","pushedAt":"2024-06-28T13:46:56.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"},"commit":{"message":"Fix bugs in MultiXact truncation\n\n1. TruncateMultiXact() performs the SLRU truncations in a critical\nsection. Deleting the SLRU segments calls ForwardSyncRequest(), which\nwill try to compact the request queue if it's full\n(CompactCheckpointerRequestQueue()). That in turn allocates memory,\nwhich is not allowed in a critical section. 
Backtrace:\n\n TRAP: failed Assert(\"CritSectionCount == 0 || (context)->allowInCritSection\"), File: \"../src/backend/utils/mmgr/mcxt.c\", Line: 1353, PID: 920981\n postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]\n postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]\n postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]\n postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]\n postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]\n postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]\n postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]\n postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]\n postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]\n postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]\n postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]\n postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]\n postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]\n postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]\n /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]\n /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]\n postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]\n\nTo fix, bail out in CompactCheckpointerRequestQueue() without doing\nanything, if it's called in a critical section. That covers the above\ncall path, as well as any other similar cases where\nRegisterSyncRequest might be called in a critical section.\n\n2. After fixing that, another problem became apparent: Autovacuum\nprocess doing that truncation can deadlock with the checkpointer\nprocess. TruncateMultiXact() sets \"MyProc->delayChkptFlags |=\nDELAY_CHKPT_START\". 
If the sync request queue is full and cannot be\ncompacted, the process will repeatedly sleep and retry, until there is\nroom in the queue. However, if the checkpointer is trying to start a\ncheckpoint at the same time, and is waiting for the DELAY_CHKPT_START\nprocesses to finish, the queue will never shrink.\n\nMore concretely, the autovacuum process is stuck here:\n\n #0 0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6\n #1 0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=) at ../src/backend/storage/ipc/latch.c:1570\n #2 WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1,\n wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516\n #3 0x000056220b243224 in WaitLatch (latch=, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949)\n at ../src/backend/storage/ipc/latch.c:538\n #4 0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614\n #5 0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 , segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495\n #6 0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 , segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566\n #7 0x000056220af98e1b in PerformMembersTruncation (oldestOffset=, newOldestOffset=) at ../src/backend/access/transam/multixact.c:3006\n #8 TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201\n #9 0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at 
../src/backend/commands/vacuum.c:1917\n #10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760\n #11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550\n #12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=, startup_data_len=) at ../src/backend/postmaster/autovacuum.c:1569\n\nand the checkpointer is stuck here:\n\n #0 0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6\n #1 0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6\n #2 0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50\n #3 0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098\n #4 0x000056220b1c6e86 in CheckpointerMain (startup_data=, startup_data_len=) at ../src/backend/postmaster/checkpointer.c:464\n\nTo fix, add AbsorbSyncRequests() to the loops where the checkpointer\nwaits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to\nfinish.\n\nBackpatch to v14. Before that, SLRU deletion didn't call\nRegisterSyncRequest, which avoided this failure. I'm not sure if there\nare other similar scenarios on older versions, but we haven't had\nany such reports.\n\nDiscussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi\n\nNEON: Cherry-picked from the upstream thread ahead of time, to make\nthe new test being added in PR #6528 pass.","shortMessageHtmlLink":"Fix bugs in MultiXact truncation"}},{"before":"d55e0aca104af0b611cf5565f1033b2acd2dcc1c","after":"e06bebc75306b583e758b52c95946d41109239b2","ref":"refs/heads/REL_16_STABLE_neon","pushedAt":"2024-06-28T13:46:56.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"},"commit":{"message":"Fix bugs in MultiXact truncation\n\n1. 
TruncateMultiXact() performs the SLRU truncations in a critical\nsection. Deleting the SLRU segments calls ForwardSyncRequest(), which\nwill try to compact the request queue if it's full\n(CompactCheckpointerRequestQueue()). That in turn allocates memory,\nwhich is not allowed in a critical section. Backtrace:\n\n TRAP: failed Assert(\"CritSectionCount == 0 || (context)->allowInCritSection\"), File: \"../src/backend/utils/mmgr/mcxt.c\", Line: 1353, PID: 920981\n postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]\n postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]\n postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]\n postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]\n postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]\n postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]\n postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]\n postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]\n postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]\n postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]\n postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]\n postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]\n postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]\n postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]\n /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]\n /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]\n postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]\n\nTo fix, bail out in CompactCheckpointerRequestQueue() without doing\nanything, if it's called in a critical section. That covers the above\ncall path, as well as any other similar cases where\nRegisterSyncRequest might be called in a critical section.\n\n2. 
After fixing that, another problem became apparent: Autovacuum\nprocess doing that truncation can deadlock with the checkpointer\nprocess. TruncateMultiXact() sets \"MyProc->delayChkptFlags |=\nDELAY_CHKPT_START\". If the sync request queue is full and cannot be\ncompacted, the process will repeatedly sleep and retry, until there is\nroom in the queue. However, if the checkpointer is trying to start a\ncheckpoint at the same time, and is waiting for the DELAY_CHKPT_START\nprocesses to finish, the queue will never shrink.\n\nMore concretely, the autovacuum process is stuck here:\n\n #0 0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6\n #1 0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=) at ../src/backend/storage/ipc/latch.c:1570\n #2 WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1,\n wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516\n #3 0x000056220b243224 in WaitLatch (latch=, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949)\n at ../src/backend/storage/ipc/latch.c:538\n #4 0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614\n #5 0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 , segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495\n #6 0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 , segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566\n #7 0x000056220af98e1b in PerformMembersTruncation (oldestOffset=, newOldestOffset=) at ../src/backend/access/transam/multixact.c:3006\n #8 TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, 
newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201\n #9 0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917\n #10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760\n #11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550\n #12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=, startup_data_len=) at ../src/backend/postmaster/autovacuum.c:1569\n\nand the checkpointer is stuck here:\n\n #0 0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6\n #1 0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6\n #2 0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50\n #3 0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098\n #4 0x000056220b1c6e86 in CheckpointerMain (startup_data=, startup_data_len=) at ../src/backend/postmaster/checkpointer.c:464\n\nTo fix, add AbsorbSyncRequests() to the loops where the checkpointer\nwaits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to\nfinish.\n\nBackpatch to v14. Before that, SLRU deletion didn't call\nRegisterSyncRequest, which avoided this failure. 
I'm not sure if there\nare other similar scenarios on older versions, but we haven't had\nany such reports.\n\nDiscussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi\n\nNEON: Cherry-picked from the upstream thread ahead of time, to make\nthe new test being added in PR #6528 pass.","shortMessageHtmlLink":"Fix bugs in MultiXact truncation"}},{"before":"2ff5ecc67c64e5fe44b7dde598e64e4538e0c373","after":"f54d7373eb0de5a54bce2becdb1c801026c7edff","ref":"refs/heads/REL_15_STABLE_neon","pushedAt":"2024-06-28T13:46:56.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"},"commit":{"message":"Fix bugs in MultiXact truncation\n\n1. TruncateMultiXact() performs the SLRU truncations in a critical\nsection. Deleting the SLRU segments calls ForwardSyncRequest(), which\nwill try to compact the request queue if it's full\n(CompactCheckpointerRequestQueue()). That in turn allocates memory,\nwhich is not allowed in a critical section. 
Backtrace:\n\n TRAP: failed Assert(\"CritSectionCount == 0 || (context)->allowInCritSection\"), File: \"../src/backend/utils/mmgr/mcxt.c\", Line: 1353, PID: 920981\n postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]\n postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]\n postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]\n postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]\n postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]\n postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]\n postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]\n postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]\n postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]\n postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]\n postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]\n postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]\n postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]\n postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]\n /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]\n /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]\n postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]\n\nTo fix, bail out in CompactCheckpointerRequestQueue() without doing\nanything, if it's called in a critical section. That covers the above\ncall path, as well as any other similar cases where\nRegisterSyncRequest might be called in a critical section.\n\n2. After fixing that, another problem became apparent: Autovacuum\nprocess doing that truncation can deadlock with the checkpointer\nprocess. TruncateMultiXact() sets \"MyProc->delayChkptFlags |=\nDELAY_CHKPT_START\". 
If the sync request queue is full and cannot be\ncompacted, the process will repeatedly sleep and retry, until there is\nroom in the queue. However, if the checkpointer is trying to start a\ncheckpoint at the same time, and is waiting for the DELAY_CHKPT_START\nprocesses to finish, the queue will never shrink.\n\nMore concretely, the autovacuum process is stuck here:\n\n #0 0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6\n #1 0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=) at ../src/backend/storage/ipc/latch.c:1570\n #2 WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1,\n wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516\n #3 0x000056220b243224 in WaitLatch (latch=, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949)\n at ../src/backend/storage/ipc/latch.c:538\n #4 0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614\n #5 0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 , segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495\n #6 0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 , segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566\n #7 0x000056220af98e1b in PerformMembersTruncation (oldestOffset=, newOldestOffset=) at ../src/backend/access/transam/multixact.c:3006\n #8 TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201\n #9 0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at 
../src/backend/commands/vacuum.c:1917\n #10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760\n #11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550\n #12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=, startup_data_len=) at ../src/backend/postmaster/autovacuum.c:1569\n\nand the checkpointer is stuck here:\n\n #0 0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6\n #1 0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6\n #2 0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50\n #3 0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098\n #4 0x000056220b1c6e86 in CheckpointerMain (startup_data=, startup_data_len=) at ../src/backend/postmaster/checkpointer.c:464\n\nTo fix, add AbsorbSyncRequests() to the loops where the checkpointer\nwaits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to\nfinish.\n\nBackpatch to v14. Before that, SLRU deletion didn't call\nRegisterSyncRequest, which avoided this failure. 
I'm not sure if there\nare other similar scenarios on older versions, but we haven't had\nany such reports.\n\nDiscussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi\n\nNEON: Cherry-picked from the upstream thread ahead of time, to make\nthe new test being added in PR #6528 pass.","shortMessageHtmlLink":"Fix bugs in MultiXact truncation"}},{"before":"c421378a31ea74c0530e6e0f41af51498913a49b","after":"0640e6b24a7eb2b626a7432c0e2801840954429e","ref":"refs/heads/pgstat_aux_file_v15","pushedAt":"2024-06-28T12:55:10.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"knizhnik","name":"Konstantin Knizhnik","path":"/knizhnik","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/156922?s=80&v=4"},"commit":{"message":"Write pgstat file only on shutdown","shortMessageHtmlLink":"Write pgstat file only on shutdown"}},{"before":null,"after":"f54d7373eb0de5a54bce2becdb1c801026c7edff","ref":"refs/heads/cherry-pick-upstream-TruncateMultiXact-fix-v15","pushedAt":"2024-06-28T09:06:19.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"},"commit":{"message":"Fix bugs in MultiXact truncation\n\n1. TruncateMultiXact() performs the SLRU truncations in a critical\nsection. Deleting the SLRU segments calls ForwardSyncRequest(), which\nwill try to compact the request queue if it's full\n(CompactCheckpointerRequestQueue()). That in turn allocates memory,\nwhich is not allowed in a critical section. 
Backtrace:\n\n TRAP: failed Assert(\"CritSectionCount == 0 || (context)->allowInCritSection\"), File: \"../src/backend/utils/mmgr/mcxt.c\", Line: 1353, PID: 920981\n postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]\n postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]\n postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]\n postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]\n postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]\n postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]\n postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]\n postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]\n postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]\n postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]\n postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]\n postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]\n postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]\n postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]\n /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]\n /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]\n postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]\n\nTo fix, bail out in CompactCheckpointerRequestQueue() without doing\nanything, if it's called in a critical section. That covers the above\ncall path, as well as any other similar cases where\nRegisterSyncRequest might be called in a critical section.\n\n2. After fixing that, another problem became apparent: Autovacuum\nprocess doing that truncation can deadlock with the checkpointer\nprocess. TruncateMultiXact() sets \"MyProc->delayChkptFlags |=\nDELAY_CHKPT_START\". 
If the sync request queue is full and cannot be\ncompacted, the process will repeatedly sleep and retry, until there is\nroom in the queue. However, if the checkpointer is trying to start a\ncheckpoint at the same time, and is waiting for the DELAY_CHKPT_START\nprocesses to finish, the queue will never shrink.\n\nMore concretely, the autovacuum process is stuck here:\n\n #0 0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6\n #1 0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=) at ../src/backend/storage/ipc/latch.c:1570\n #2 WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1,\n wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516\n #3 0x000056220b243224 in WaitLatch (latch=, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949)\n at ../src/backend/storage/ipc/latch.c:538\n #4 0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614\n #5 0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 , segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495\n #6 0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 , segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566\n #7 0x000056220af98e1b in PerformMembersTruncation (oldestOffset=, newOldestOffset=) at ../src/backend/access/transam/multixact.c:3006\n #8 TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201\n #9 0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at 
../src/backend/commands/vacuum.c:1917\n #10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760\n #11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550\n #12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=, startup_data_len=) at ../src/backend/postmaster/autovacuum.c:1569\n\nand the checkpointer is stuck here:\n\n #0 0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6\n #1 0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6\n #2 0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50\n #3 0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098\n #4 0x000056220b1c6e86 in CheckpointerMain (startup_data=, startup_data_len=) at ../src/backend/postmaster/checkpointer.c:464\n\nTo fix, add AbsorbSyncRequests() to the loops where the checkpointer\nwaits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to\nfinish.\n\nBackpatch to v14. Before that, SLRU deletion didn't call\nRegisterSyncRequest, which avoided this failure. I'm not sure if there\nare other similar scenarios on older versions, but we haven't had\nany such reports.\n\nDiscussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi\n\nNEON: Cherry-picked from the upstream thread ahead of time, to make\nthe new test being added in PR #6528 pass.","shortMessageHtmlLink":"Fix bugs in MultiXact truncation"}},{"before":null,"after":"223dd925959f8124711dd3d867dc8ba6629d52c0","ref":"refs/heads/cherry-pick-upstream-TruncateMultiXact-fix-v14","pushedAt":"2024-06-28T09:06:10.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"},"commit":{"message":"Fix bugs in MultiXact truncation\n\n1. 
TruncateMultiXact() performs the SLRU truncations in a critical\nsection. Deleting the SLRU segments calls ForwardSyncRequest(), which\nwill try to compact the request queue if it's full\n(CompactCheckpointerRequestQueue()). That in turn allocates memory,\nwhich is not allowed in a critical section. Backtrace:\n\n TRAP: failed Assert(\"CritSectionCount == 0 || (context)->allowInCritSection\"), File: \"../src/backend/utils/mmgr/mcxt.c\", Line: 1353, PID: 920981\n postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]\n postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]\n postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]\n postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]\n postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]\n postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]\n postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]\n postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]\n postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]\n postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]\n postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]\n postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]\n postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]\n postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]\n /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]\n /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]\n postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]\n\nTo fix, bail out in CompactCheckpointerRequestQueue() without doing\nanything, if it's called in a critical section. That covers the above\ncall path, as well as any other similar cases where\nRegisterSyncRequest might be called in a critical section.\n\n2. 
After fixing that, another problem became apparent: Autovacuum\nprocess doing that truncation can deadlock with the checkpointer\nprocess. TruncateMultiXact() sets \"MyProc->delayChkptFlags |=\nDELAY_CHKPT_START\". If the sync request queue is full and cannot be\ncompacted, the process will repeatedly sleep and retry, until there is\nroom in the queue. However, if the checkpointer is trying to start a\ncheckpoint at the same time, and is waiting for the DELAY_CHKPT_START\nprocesses to finish, the queue will never shrink.\n\nMore concretely, the autovacuum process is stuck here:\n\n #0 0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6\n #1 0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=) at ../src/backend/storage/ipc/latch.c:1570\n #2 WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1,\n wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516\n #3 0x000056220b243224 in WaitLatch (latch=, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949)\n at ../src/backend/storage/ipc/latch.c:538\n #4 0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614\n #5 0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 , segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495\n #6 0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 , segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566\n #7 0x000056220af98e1b in PerformMembersTruncation (oldestOffset=, newOldestOffset=) at ../src/backend/access/transam/multixact.c:3006\n #8 TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, 
newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201\n #9 0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917\n #10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760\n #11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550\n #12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=, startup_data_len=) at ../src/backend/postmaster/autovacuum.c:1569\n\nand the checkpointer is stuck here:\n\n #0 0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6\n #1 0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6\n #2 0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50\n #3 0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098\n #4 0x000056220b1c6e86 in CheckpointerMain (startup_data=, startup_data_len=) at ../src/backend/postmaster/checkpointer.c:464\n\nTo fix, add AbsorbSyncRequests() to the loops where the checkpointer\nwaits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to\nfinish.\n\nBackpatch to v14. Before that, SLRU deletion didn't call\nRegisterSyncRequest, which avoided this failure. 
I'm not sure if there\nare other similar scenarios on older versions, but we haven't had\nany such reports.\n\nDiscussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi\n\nNEON: Cherry-picked from the upstream thread ahead of time, to make\nthe new test being added in PR #6528 pass.","shortMessageHtmlLink":"Fix bugs in MultiXact truncation"}},{"before":null,"after":"e06bebc75306b583e758b52c95946d41109239b2","ref":"refs/heads/cherry-pick-upstream-TruncateMultiXact-fix-v16","pushedAt":"2024-06-28T09:01:19.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"},"commit":{"message":"Fix bugs in MultiXact truncation\n\n1. TruncateMultiXact() performs the SLRU truncations in a critical\nsection. Deleting the SLRU segments calls ForwardSyncRequest(), which\nwill try to compact the request queue if it's full\n(CompactCheckpointerRequestQueue()). That in turn allocates memory,\nwhich is not allowed in a critical section. 
Backtrace:\n\n TRAP: failed Assert(\"CritSectionCount == 0 || (context)->allowInCritSection\"), File: \"../src/backend/utils/mmgr/mcxt.c\", Line: 1353, PID: 920981\n postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]\n postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]\n postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]\n postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]\n postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]\n postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]\n postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]\n postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]\n postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]\n postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]\n postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]\n postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]\n postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]\n postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]\n /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]\n /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]\n postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]\n\nTo fix, bail out in CompactCheckpointerRequestQueue() without doing\nanything, if it's called in a critical section. That covers the above\ncall path, as well as any other similar cases where\nRegisterSyncRequest might be called in a critical section.\n\n2. After fixing that, another problem became apparent: Autovacuum\nprocess doing that truncation can deadlock with the checkpointer\nprocess. TruncateMultiXact() sets \"MyProc->delayChkptFlags |=\nDELAY_CHKPT_START\". 
If the sync request queue is full and cannot be\ncompacted, the process will repeatedly sleep and retry, until there is\nroom in the queue. However, if the checkpointer is trying to start a\ncheckpoint at the same time, and is waiting for the DELAY_CHKPT_START\nprocesses to finish, the queue will never shrink.\n\nMore concretely, the autovacuum process is stuck here:\n\n #0 0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6\n #1 0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=) at ../src/backend/storage/ipc/latch.c:1570\n #2 WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1,\n wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516\n #3 0x000056220b243224 in WaitLatch (latch=, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949)\n at ../src/backend/storage/ipc/latch.c:538\n #4 0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614\n #5 0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 , segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495\n #6 0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 , segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566\n #7 0x000056220af98e1b in PerformMembersTruncation (oldestOffset=, newOldestOffset=) at ../src/backend/access/transam/multixact.c:3006\n #8 TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201\n #9 0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at 
../src/backend/commands/vacuum.c:1917\n #10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760\n #11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550\n #12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=, startup_data_len=) at ../src/backend/postmaster/autovacuum.c:1569\n\nand the checkpointer is stuck here:\n\n #0 0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6\n #1 0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6\n #2 0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50\n #3 0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098\n #4 0x000056220b1c6e86 in CheckpointerMain (startup_data=, startup_data_len=) at ../src/backend/postmaster/checkpointer.c:464\n\nTo fix, add AbsorbSyncRequests() to the loops where the checkpointer\nwaits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to\nfinish.\n\nBackpatch to v14. Before that, SLRU deletion didn't call\nRegisterSyncRequest, which avoided this failure. 
I'm not sure if there\nare other similar scenarios on older versions, but we haven't had\nany such reports.\n\nDiscussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi\n\nNEON: Cherry-picked from the upstream thread ahead of time, to make\nthe new test being added in PR #6528 pass.","shortMessageHtmlLink":"Fix bugs in MultiXact truncation"}},{"before":"099b9ffb723600f1b7d31d4b97ff2c8c7103d412","after":"c421378a31ea74c0530e6e0f41af51498913a49b","ref":"refs/heads/pgstat_aux_file_v15","pushedAt":"2024-06-26T19:27:42.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"knizhnik","name":"Konstantin Knizhnik","path":"/knizhnik","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/156922?s=80&v=4"},"commit":{"message":"Undo unintended changes","shortMessageHtmlLink":"Undo unintended changes"}},{"before":"c97cdd4cb9c9c225ece8c9d7658b77210d8d0291","after":"9901875cd3a4b80709653998544215323e13ef2f","ref":"refs/heads/pgstat_aux_file_v16","pushedAt":"2024-06-26T19:26:56.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"knizhnik","name":"Konstantin Knizhnik","path":"/knizhnik","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/156922?s=80&v=4"},"commit":{"message":"Undo unintended changes","shortMessageHtmlLink":"Undo unintended changes"}},{"before":"4be432ffb674cdda48e9e84fdd15f6c77293eee8","after":"c704e1b70f35a158ae47b94c4d979ccc6f3fc165","ref":"refs/heads/restore_running_xids_from_clog-take-two-v14","pushedAt":"2024-06-25T16:58:10.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"},"commit":{"message":"Don't set wasShutdown=true when starting a read replica\n\nThat led to incorrect query results, because the known-assigned XIDs\nmachinery was initialized incorrectly thinking that all of the\nin-progress transactions were aborted.\n\nInstead, call 
hook","shortMessageHtmlLink":"Don't set wasShutdown=true when starting a read replica"}},{"before":"5aa881309343235d69054b7c182a7f685bc34e23","after":"42ee50b051e1b8aed6bdcd047ff34d40b5cb1565","ref":"refs/heads/restore_running_xids_from_clog-take-two-v15","pushedAt":"2024-06-25T16:57:30.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"},"commit":{"message":"Don't set wasShutdown=true when starting a read replica\n\nThat led to incorrect query results, because the known-assigned XIDs\nmachinery was initialized incorrectly thinking that all of the\nin-progress transactions were aborted.\n\nInstead, call hook","shortMessageHtmlLink":"Don't set wasShutdown=true when starting a read replica"}},{"before":"ec31fd8a07fdd346229c6d91d41606423663da3c","after":"63ee385be85691704d2568aecaeae374138e2809","ref":"refs/heads/restore_running_xids_from_clog-take-two-v16","pushedAt":"2024-06-25T16:56:02.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hlinnaka","name":"Heikki Linnakangas","path":"/hlinnaka","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/191602?s=80&v=4"},"commit":{"message":"Don't set wasShutdown=true when starting a read replica\n\nThat led to incorrect query results, because the known-assigned XIDs\nmachinery was initialized incorrectly thinking that all of the\nin-progress transactions were aborted.\n\nIntroduce a new hook, to allow the neon extension to restore\nrunning-xacts from the CLOG.","shortMessageHtmlLink":"Don't set wasShutdown=true when starting a read replica"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEdfegnAA","startCursor":null,"endCursor":null}},"title":"Activity · neondatabase/postgres"}