Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

23.8.16 Backport PR #63685 Fix SIGSEGV due to CPU/Real profiler #453

Conversation

shiyer7474
Copy link

Original PR :
ClickHouse#63865

The problem was due to incorrect unwinding due from signal handlers, which leads to incorrect DWARF (FDE/CIE) interpretation.

After this patch I was not able to reproduce the crash for couple of hours, while before it was very stable (I've reduced the minimal threshold for query_profiler_real_time_period_ns), using simply:

$ clickhouse-benchmark --port 19000 -q "SELECT * FROM remote('127.{1..10}', system, one)" --query_profiler_real_time_period_ns=1

Note, I'm using here remote() for fibers, that has stack with guard pages that helps with reproducing the crash more faster.

P.S. I also have another implementation of this fix, without patching unwind and using info from signal context directly, and even though it is better, because you don't need to trip extra frames and you can use all the 45 frames for something useful, it is too complex, so let's go with a simpler patch first, and I think it could be even backported.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix SIGSEGV due to CPU/Real profiler

@altinity-robot
Copy link
Collaborator

altinity-robot commented Sep 2, 2024

This is an automated comment for commit 8c4a3e6 with description of existing statuses. It's updated for the latest CI running

⏳ Click here to open a full report in a separate page

Successful checks
Check nameDescriptionStatus
Push to DockerhubThe check for building and pushing the CI related docker images to docker hub✅ success
Check nameDescriptionStatus
CI runningA meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR⏳ pending
ClickHouse build checkBuilds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process⏳ pending

The problem was due to incorrect unwinding due from signal handlers,
which leads to incorrect DWARF (FDE/CIE) interpretation.

After this patch I was not able to reproduce the crash for couple of
hours, while before it was very stable (I've reduced the minimal
threshold for query_profiler_real_time_period_ns), using simply:

    $ clickhouse-benchmark --port 19000 -q "SELECT * FROM remote('127.{1..10}', system, one)" --query_profiler_real_time_period_ns=1

Note, I'm using here remote() for fibers, that has stack with guard
pages that helps with reproducing the crash more faster.

P.S. I also have another implementation of this fix, without patching
unwind and using info from signal context directly, and even though it
is better, because you don't need to trip extra frames and you can use
all the 45 frames for something useful, it is too complex, so let's go
with a simpler patch first, and I think it could be even backported.

Signed-off-by: Azat Khuzhin <[email protected]>
@Enmk Enmk force-pushed the backports/23.8.16/63685_fix_sigsegv_due_to_cpu_profiler branch from eaa7d2e to 8c4a3e6 Compare September 12, 2024 05:55
Copy link
Member

@Enmk Enmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It fails to build:

git -C "/home/ubuntu/_work/ClickHouse/ClickHouse" submodule update --depth="1" --single-branch contrib/libunwind
fatal: remote error: upload-pack: not our ref ee8d196055838982893a8105415eee1b9f25b940
fatal: Fetched in submodule path 'contrib/libunwind', but it did not contain ee8d196055838982893a8105415eee1b9f25b940. Direct fetching of that commit failed.
fatal: 
Error: Process completed with exit code 123.

Also I wasn't able to find commit ee8d196055838982893a8105415eee1b9f25b940 in https://github.com/ClickHouse/libunwind is this a correct commit hash?

@Enmk
Copy link
Member

Enmk commented Sep 17, 2024

Was done in #466

@Enmk Enmk closed this Sep 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants