Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v14] Assigning to GUCs leads to server crash #45

Open
ocean-dot-li opened this issue Dec 25, 2024 · 5 comments
Open

[v14] Assigning to GUCs leads to server crash #45

ocean-dot-li opened this issue Dec 25, 2024 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@ocean-dot-li
Copy link

Hello,
When using the pg_show_plans plugin, after I modified the pg_show_plans.id_enabled parameter, I noticed that some core dump files were generated. The specific steps to reproduce and the stack trace are provided in the additional information.

The process generating the core dump is not a regular backend process, but rather the SysLogger process. Based on the stack trace information, AFAICS, the SysLogger is not attached to shared memory (refer to the SubPostmasterMain function). Therefore, when executing the pg_show_plans.id_enabled set_hook function, the segmentation fault occurs. Similar parameters include pg_show_plans.plan_format.

Let's return to the discussion of the functionality of these two parameters. The parameters allow users to modify them at the session level, but they modify shared memory without any lock protection, which is unreasonable. We might consider changing the parameters to the PGC_POSTMASTER level. However, in my opinion, it would be more reasonable if these were just regular parameters. I've included the implemented patch in the 0001-redesign-pg_show_plans.is_enabled-and-pg_show_plans..patch.

Regards

additional info:

  • PostgreSQL version
    14.13
  • Which way did you install PostgreSQL, sources or binaries?
    sources
  • Operating system
    CentOS8
  • All the other extensions that you use.
    only pg_show_plans
  • shared_preload_libraries value from postgresql.conf
    pg_show_plans
  • Can you reproduce your issue on a clean cluster? If yes, state exactly how.
  1. alter system set pg_show_plans.is_enabled to off when instance running.
  2. select pg_reload_conf()
    Then, a coredump file is produced.
  • Does the server crash? Try to get a stack trace.
    the stack trace is
(gdb) bt
#0  0x00007fafd3d85055 in raise () from /lib64/libpthread.so.0
#1  <signal handler called>
#2  set_state (state=false, extra=0x0) at pg_show_plans.c:325
#3  0x0000000000af7fad in set_config_option (name=0x2cceff0 "pg_show_plans.is_enabled", value=<optimized out>, context=context@entry=PGC_SIGHUP, source=source@entry=PGC_S_FILE,
    action=action@entry=GUC_ACTION_SET, changeVal=<optimized out>, changeVal@entry=true, elevel=12, is_reload=false) at guc.c:14669
#4  0x0000000000afebc5 in ProcessConfigFileInternal (context=context@entry=PGC_SIGHUP, applySettings=applySettings@entry=true, elevel=elevel@entry=13) at guc-file.l:572
#5  0x0000000000aff5af in ProcessConfigFile (context=PGC_SIGHUP) at guc-file.l:162
#6  0x00000000008a55a3 in SysLoggerMain (argc=1, argv=<synthetic pointer>) at syslogger.c:463
#7  SysLogger_Start (loggerIndex=<optimized out>) at syslogger.c:877
#8  0x00000000008a318e in PostmasterMain (argc=4, argv=0x2c651a0) at postmaster.c:1248
#9  0x00000000004ca07c in main (argc=4, argv=0x2c651a0) at main.c:216
@kovmir
Copy link
Collaborator

kovmir commented Dec 25, 2024

Thank you for providing all the details upfront. I can reproduce the bug on 14, but not on 17.

@kovmir kovmir self-assigned this Dec 25, 2024
@kovmir kovmir added the bug Something isn't working label Dec 25, 2024
@kovmir
Copy link
Collaborator

kovmir commented Dec 25, 2024

Regarding your patch... The functions you propose to exclude propagate the setting to shared memory, so that changing these affects all the clients. In other words, it affects the behavior of the extension, rather than seamlessly fixing the problem.

@kovmir kovmir changed the title alter system set and reload pg_show_plans.is_enabled or pg_show_plans.plan_format parameter can cause coredump [v14] Assigning to GUCs leads to server crash Dec 25, 2024
@ocean-dot-li
Copy link
Author

ocean-dot-li commented Dec 26, 2024

Regarding your patch... The functions you propose to exclude propagate the setting to shared memory, so that changing these affects all the clients. In other words, it affects the behavior of the extension, rather than seamlessly fixing the problem.

Indeed, the patch I proposed does not directly fix this issue. This patch is a redesign of the implementation of pg_show_plans.is_enabled and pg_show_plans.plan_format. As I mentioned in the original email, we may be reconsider the implementation of these two parameters. I believe it is unreasonable to have enable and format designed as global settings; instead, it would be more appropriate to control them independently per backend. This approach would prevent modifications to shared memory without lock protection and still maintain the logic of control.

What do you think about this proposal? I look forward to hearing your thoughts.

Best regards!

@kovmir
Copy link
Collaborator

kovmir commented Dec 26, 2024

This extension is for debugging purposes, to identify queries causing the troubles. As the troubles show up, you connect to the database and see all the currently running statements. Not seeing all of them, or not knowing whether you see all of them would make the extension less useful. That is why is_enabled cannot be a local setting.

The query plans appear in shared memory just before being executed in the format set in plan_format. If plan_format becomes local, SELECT * FROM pg_show_plans; will yield plans in mixed format for everyone. That is why plan_format cannot be a local setting.

@ocean-dot-li
Copy link
Author

ocean-dot-li commented Dec 27, 2024

This extension is for debugging purposes, to identify queries causing the troubles. As the troubles show up, you connect to the database and see all the currently running statements. Not seeing all of them, or not knowing whether you see all of them would make the extension less useful. That is why is_enabled cannot be a local setting.

I got your point. The scenario you described is that, when it's not clear which is trouble query, the extension displays plans of all running statements to identify the trouble one. However, changing is_enabled to a local setting would not significantly affect this behavior, as the global setting can still be managed using ALTER SYSTEM. It just makes the process a bit more cumbersome compared to using SET.

Another meaningful scenario for this extension is when we already know which is trouble query, but for various reasons, we cannot obtain its actual plan. For example, if we are unsure about the parameter types set by a query using extended protocol, the EXPLAIN command maybe not reproduce the issue. Meanwhile, the query runs for a long time without producing results, the auto_explain extension can't help either. In such cases, pg_show_plans becomes the last resort (a common scenario in cloud database operations).

For a database used in actual business operations, pg_show_plans is unlikely to be enabled by default due to the performance overhead it incurs. If is_enabled is a shared setting, it would affect all queries when is_enabled is true. However, if is_enabled is a local setting, you could execute SET before running the trouble query, affecting only one process. Furthermore, if used together with pg_hint_plan extension, the impact could be limited to a single query. Of course, this would also fix the original issue.

By the way, there are more effective ways to identify trouble queries, such as using the pg_stat_activity view and the pg_stat_statements extension. Determining trouble queries by examining actual plans is not straightforward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants