Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reactor: more info, robustness on segfault #2691

Closed
wants to merge 1 commit into from

Conversation

travisdowns
Copy link
Contributor

On segfault we execute a handler that provides information including a backtrace. This currently emits all information in a single write call after collecting it in a buffer. If anything goes wrong, e.g., the backtrace() call itself crashes, then no information will be emitted. The backtrace() call is not signal safe in theory, and in practice the situation seems mixed as to its safety. So it not unlikely that situations may arise where no output can be emitted on SIGSEGV.

Because we catch the signal and then re-raise it using pthread_kill, the specific information about the IP is lost in re-raise: this prevents the line in syslog which usually captures information about segfaults from appearing at all. So we may be left without useful information after a crash.

In this change, we emit additional information before the backtrace() which is not likely to have any problem, and we emit each as separate write(2) calls so if there is a failure at any point we at least have the information emitted up to that point.

After this, the start of the output on segfault looks like so:

Segmentation fault, si_pid: 0, si_addr: 0000000000000000, ip: 0000579ba959751f
Segmentation fault resolved ip: 0000000005e2751f in [0000579ba3770000+000000000e3f98d8] 
Segmentation fault on shard 0, in scheduling group admin.
Backtrace:
 ....

@travisdowns travisdowns marked this pull request as draft March 20, 2025 17:46
@travisdowns travisdowns marked this pull request as ready for review March 20, 2025 18:13
print_zero_padded_hex_safe(f.so->end - f.so->begin);
print_safe("]");

print_with_backtrace("\nSegmentation fault");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not append this \n to the previous line? Like print_safe("]\n");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good eye, done.

On segfault we execute a handler that provides information including
a backtrace. This currently emits all information in a single write
call after collecting it in a buffer. If anything goes wrong, e.g.,
the backtrace() call itself crashes, then no information will be
emitted. The backtrace() call is not signal safe in theory, and in
practice the situation seems mixed as to its safety. So it not
unlikely that situations may arise where no output can be emitted on
SIGSEGV.

Because we catch the signal and then re-raise it using pthread_kill, the
specific information about the IP is lost in re-raise: this prevents the
line in syslog which usually captures information about segfaults from
appearing at all. So we may be left without useful information after a
crash.

In this change, we emit additional information before the backtrace()
which is not likely to have any problem, and we emit each as separate
write(2) calls so if there is a failure at any point we at least have
the information emitted up to that point.

After this, the start of the output on segfault looks like so:

Segmentation fault, si_pid: 0, si_addr: 0000000000000000, ip: 0000579ba959751f
Segmentation fault resolved ip: 0000000005e2751f in [0000579ba3770000+000000000e3f98d8]
Segmentation fault on shard 0, in scheduling group admin.

Followed by the backtrace.
@xemul xemul closed this in 320f13a Mar 26, 2025
@travisdowns
Copy link
Contributor Author

Thanks for the merge @xemul. This was related to #2697, and I have a follow on PR to add more robustness to the fault handler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants