23.8.16 Backport PR #63685 Fix SIGSEGV due to CPU/Real profiler #453
Original PR: ClickHouse#63865
The problem was caused by incorrect unwinding from signal handlers, which led to incorrect DWARF (FDE/CIE) interpretation.
After this patch I was not able to reproduce the crash for a couple of hours, whereas before it reproduced very reliably (I had reduced the minimal threshold for query_profiler_real_time_period_ns), using simply:
Note that I'm using remote() here to get fibers, whose stacks have guard pages, which helps reproduce the crash faster.
P.S. I also have another implementation of this fix that does not patch unwind and instead uses the information from the signal context directly. It is better, because you don't need to skip extra frames and can use all 45 frames for something useful, but it is too complex. So let's go with the simpler patch first; I think it could even be backported.
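To illustrate what "using info from signal context directly" can mean in general, here is a minimal sketch. It is not the implementation from this PR or from ClickHouse. It only shows that a handler installed with SA_SIGINFO receives a ucontext_t describing the interrupted code, from which the program counter can be read without walking through the handler's own frames. The names pc_from_context, on_signal and capture_interrupted_pc are hypothetical, and the register access assumes Linux on x86_64 or aarch64.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // needed for REG_RIP on glibc; g++ usually defines it already
#endif
#include <signal.h>
#include <cstdint>
#include <cstring>

// Written by the handler, read after the signal has been delivered.
static volatile std::uintptr_t g_interrupted_pc = 0;
static volatile sig_atomic_t g_handler_ran = 0;

// Hypothetical helper: read the program counter of the interrupted code
// from the signal context. Returns 0 on platforms this sketch does not cover.
std::uintptr_t pc_from_context(const ucontext_t * uc)
{
#if defined(__linux__) && defined(__x86_64__)
    return static_cast<std::uintptr_t>(uc->uc_mcontext.gregs[REG_RIP]);
#elif defined(__linux__) && defined(__aarch64__)
    return static_cast<std::uintptr_t>(uc->uc_mcontext.pc);
#else
    (void)uc;
    return 0;
#endif
}

// SA_SIGINFO-style handler: the third argument points to a ucontext_t.
// Only async-signal-safe operations are performed here.
void on_signal(int, siginfo_t *, void * context)
{
    g_interrupted_pc = pc_from_context(static_cast<const ucontext_t *>(context));
    g_handler_ran = 1;
}

// Installs the handler for SIGUSR1, raises the signal once in the calling
// thread, and reports the captured program counter through pc_out.
bool capture_interrupted_pc(std::uintptr_t & pc_out)
{
    struct sigaction sa;
    std::memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, nullptr) != 0)
        return false;

    raise(SIGUSR1); // in a single-threaded program the handler runs before raise() returns

    pc_out = g_interrupted_pc;
    return g_handler_ran == 1;
}
```

A real profiler would start unwinding from the registers in this context rather than from inside the handler, which is presumably where the extra complexity mentioned above comes from.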
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix SIGSEGV due to CPU/Real profiler