Description
Hello,
When using the pg_show_plans
plugin, after I modified the pg_show_plans.id_enabled
parameter, I noticed that some core dump files were generated. The specific steps to reproduce and the stack trace are provided in the additional information.
The process generating the core dump is not a regular backend process, but rather the SysLogger process. Based on the stack trace information, AFAICS, the SysLogger is not attached to shared memory (refer to the SubPostmasterMain
function). Therefore, when executing the pg_show_plans.id_enabled
set_hook function, the segmentation fault occurs. Similar parameters include pg_show_plans.plan_format
.
Let's return to the discussion of the functionality of these two parameters. The parameters allow users to modify them at the session level, but they modify shared memory without any lock protection, which is unreasonable. We might consider changing the parameters to the PGC_POSTMASTER level. However, in my opinion, it would be more reasonable if these were just regular parameters. I've included the implemented patch in the 0001-redesign-pg_show_plans.is_enabled-and-pg_show_plans..patch.
Regards
additional info:
- PostgreSQL version
14.13 - Which way did you install PostgreSQL, sources or binaries?
sources - Operating system
CentOS8 - All the other extensions that you use.
only pg_show_plans shared_preload_libraries
value frompostgresql.conf
pg_show_plans
- Can you reproduce your issue on a clean cluster? If yes, state exactly how.
alter system set pg_show_plans.is_enabled to off
when instance running.select pg_reload_conf()
Then, a coredump file is produced.
- Does the server crash? Try to get a stack trace.
the stack trace is
(gdb) bt
#0 0x00007fafd3d85055 in raise () from /lib64/libpthread.so.0
#1 <signal handler called>
#2 set_state (state=false, extra=0x0) at pg_show_plans.c:325
#3 0x0000000000af7fad in set_config_option (name=0x2cceff0 "pg_show_plans.is_enabled", value=<optimized out>, context=context@entry=PGC_SIGHUP, source=source@entry=PGC_S_FILE,
action=action@entry=GUC_ACTION_SET, changeVal=<optimized out>, changeVal@entry=true, elevel=12, is_reload=false) at guc.c:14669
#4 0x0000000000afebc5 in ProcessConfigFileInternal (context=context@entry=PGC_SIGHUP, applySettings=applySettings@entry=true, elevel=elevel@entry=13) at guc-file.l:572
#5 0x0000000000aff5af in ProcessConfigFile (context=PGC_SIGHUP) at guc-file.l:162
#6 0x00000000008a55a3 in SysLoggerMain (argc=1, argv=<synthetic pointer>) at syslogger.c:463
#7 SysLogger_Start (loggerIndex=<optimized out>) at syslogger.c:877
#8 0x00000000008a318e in PostmasterMain (argc=4, argv=0x2c651a0) at postmaster.c:1248
#9 0x00000000004ca07c in main (argc=4, argv=0x2c651a0) at main.c:216
Activity
kovmir commentedon Dec 25, 2024
Thank you for providing all the details upfront. I can reproduce the bug on 14, but not on 17.
kovmir commentedon Dec 25, 2024
Regarding your patch... The functions you propose to exclude propagate the setting to shared memory, so that changing these affects all the clients. In other words, it affects the behavior of the extension, rather than seamlessly fixing the problem.
[-]alter system set and reload pg_show_plans.is_enabled or pg_show_plans.plan_format parameter can cause coredump[/-][+][v14] Assigning to GUCs leads to server crash[/+]ocean-dot-li commentedon Dec 26, 2024
Indeed, the patch I proposed does not directly fix this issue. This patch is a redesign of the implementation of
pg_show_plans.is_enabled
andpg_show_plans.plan_format
. As I mentioned in the original email, we may be reconsider the implementation of these two parameters. I believe it is unreasonable to haveenable
andformat
designed as global settings; instead, it would be more appropriate to control them independently per backend. This approach would prevent modifications to shared memory without lock protection and still maintain the logic of control.What do you think about this proposal? I look forward to hearing your thoughts.
Best regards!
kovmir commentedon Dec 26, 2024
This extension is for debugging purposes, to identify queries causing the troubles. As the troubles show up, you connect to the database and see all the currently running statements. Not seeing all of them, or not knowing whether you see all of them would make the extension less useful. That is why
is_enabled
cannot be a local setting.The query plans appear in shared memory just before being executed in the format set in
plan_format
. Ifplan_format
becomes local,SELECT * FROM pg_show_plans;
will yield plans in mixed format for everyone. That is whyplan_format
cannot be a local setting.ocean-dot-li commentedon Dec 27, 2024
I got your point. The scenario you described is that, when it's not clear which is trouble query, the extension displays plans of all running statements to identify the trouble one. However, changing
is_enabled
to a local setting would not significantly affect this behavior, as the global setting can still be managed usingALTER SYSTEM
. It just makes the process a bit more cumbersome compared to usingSET
.Another meaningful scenario for this extension is when we already know which is trouble query, but for various reasons, we cannot obtain its actual plan. For example, if we are unsure about the parameter types set by a query using extended protocol, the
EXPLAIN
command maybe not reproduce the issue. Meanwhile, the query runs for a long time without producing results, theauto_explain
extension can't help either. In such cases,pg_show_plans
becomes the last resort (a common scenario in cloud database operations).For a database used in actual business operations,
pg_show_plans
is unlikely to be enabled by default due to the performance overhead it incurs. Ifis_enabled
is a shared setting, it would affect all queries whenis_enabled
is true. However, ifis_enabled
is a local setting, you could executeSET
before running the trouble query, affecting only one process. Furthermore, if used together withpg_hint_plan
extension, the impact could be limited to a single query. Of course, this would also fix the original issue.By the way, there are more effective ways to identify trouble queries, such as using the
pg_stat_activity
view and thepg_stat_statements
extension. Determining trouble queries by examining actual plans is not straightforward.kovmir commentedon Feb 11, 2025
Fixed.