pyroscope.ebpf: Retry after eBPF errors and buffer config updates. #3381

Merged
simonswine merged 5 commits into main from
20250416_pyroscope.ebpf-error-propagation
May 23, 2025

Conversation


@simonswine simonswine commented Apr 17, 2025

If the eBPF component encountered an error, it stopped the main processing loops. This then caused issues, because any subsequent config reload for Alloy would block the whole agent.

Fixes #3332

Also, for long-running scrapes we were blocking component evaluation while waiting for the config update to be applied. I switched this to a buffered channel, so config updates no longer block. If multiple updates are buffered, the component skips the in-between updates and goes straight to the latest version.

Fixes #1371
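
The "latest update wins" behaviour can be illustrated with a channel of capacity 1. This is a sketch under assumptions, not the PR's implementation: `Config` and `offerLatest` are hypothetical names, and it assumes a single goroutine publishes updates.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for the component's arguments.
type Config struct{ Version int }

// offerLatest places cfg in a channel of capacity 1 without blocking the
// caller. If an older, unread config is still buffered, it is dropped and
// replaced. This assumes offerLatest is only called from one goroutine.
func offerLatest(ch chan Config, cfg Config) {
	for {
		select {
		case ch <- cfg:
			// The buffer had room; the update is now pending.
			return
		default:
			// Buffer is full: discard the stale value, then retry the send.
			// The inner default covers a consumer draining it first.
			select {
			case <-ch:
			default:
			}
		}
	}
}

func main() {
	updates := make(chan Config, 1)

	// Three updates arrive while the consumer is busy with a long scrape.
	for v := 1; v <= 3; v++ {
		offerLatest(updates, Config{Version: v})
	}

	// The consumer only ever sees the most recent one.
	fmt.Println((<-updates).Version) // prints 3
}
```

With this shape, the side that evaluates the component returns immediately, and the processing loop picks up only the newest configuration once it is free again.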

@simonswine simonswine added the area/pyroscope Issues/PRs primarly affecting `pyroscope.` components label Apr 17, 2025
@simonswine simonswine force-pushed the 20250416_pyroscope.ebpf-error-propagation branch from 312558c to 9e1854d Compare April 17, 2025 13:01
@simonswine simonswine changed the title from "fix: Keep ebpf component running" to "pyroscope.ebpf: Retry after eBPF errors and buffer config updates. Keep ebpf component running" Apr 17, 2025
@simonswine simonswine changed the title from "pyroscope.ebpf: Retry after eBPF errors and buffer config updates. Keep ebpf component running" to "pyroscope.ebpf: Retry after eBPF errors and buffer config updates." Apr 17, 2025
Comment thread internal/component/pyroscope/ebpf/ebpf_linux.go
@simonswine
Contributor Author

@JohanLindvall wonder if this fixes your problem and/or gives you a better error message?

@JohanLindvall

> @JohanLindvall wonder if this fixes your problem and/or gives you a better error message?

@simonswine With these changes our Alloy Pyroscope scrapers have been running for a few hours without any issues.

I have not seen any of the new error messages logged.

@github-actions
Contributor

This PR has not had any activity in the past 30 days, so the needs-attention label has been added to it.
If you do not have enough time to follow up on this PR or you think it's no longer relevant, consider closing it.
The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your PR will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity.
Thank you for your contributions!

@simonswine simonswine force-pushed the 20250416_pyroscope.ebpf-error-propagation branch from 9e1854d to 19591e0 Compare May 23, 2025 12:14
@simonswine
Contributor Author

I have built an image with the changes from this PR, in case someone wants to try it: docker.io/simonswine/alloy:retry-ebpf-loop-v2@sha256:d914e10e780f1ecb60f94134389a26b6565dea3456e13ca3e83293c2502bb5a8

@simonswine simonswine force-pushed the 20250416_pyroscope.ebpf-error-propagation branch from 19591e0 to 8d79504 Compare May 23, 2025 12:24
@simonswine simonswine marked this pull request as ready for review May 23, 2025 13:08
@simonswine simonswine requested a review from a team as a code owner May 23, 2025 13:08
Contributor

@aleks-p aleks-p left a comment

Looks good to me. I noticed the added error logging which is great. I would consider adding Info and Debug logs in a few select places to better understand what happens during important state transitions.

@simonswine simonswine added bug Something isn't working backport release/v1.8 labels May 23, 2025
@simonswine simonswine merged commit 9c21229 into main May 23, 2025
40 checks passed
@simonswine simonswine deleted the 20250416_pyroscope.ebpf-error-propagation branch May 23, 2025 16:54
@github-actions
Contributor

The backport to release/v1.8 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-3381-to-release/v1.8 origin/release/v1.8
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x 9c21229be370614d21f9cc4251aae85b4e689cba
# Push it to GitHub
git push --set-upstream origin backport-3381-to-release/v1.8
git switch main
# Remove the local backport branch
git branch -D backport-3381-to-release/v1.8

Then, create a pull request where the base branch is release/v1.8 and the compare/head branch is backport-3381-to-release/v1.8.

simonswine added a commit to simonswine/alloy that referenced this pull request May 23, 2025
…rafana#3381)

* fix: Keep ebpf component loop running

If the eBPF component encountered an error, it stopped the main
processing loops, which then caused issues because any config reload
for Alloy would block the whole agent.

Fixes grafana#3332

Also, for long-running scrapes we were blocking component evaluation
while waiting for the config update to be applied. I switched this to
a buffered channel, so config updates no longer block.

If multiple updates are buffered, it will skip the in-between updates
and go straight to the latest version.

Fixes grafana#1371

* Fix data race in test

* Limit tries to start an ebpf session

* Add info logs at session start/end

* Pass context around to shutdown properly

(cherry picked from commit 9c21229)
wildum pushed a commit that referenced this pull request May 26, 2025
…3381)

* fix: Keep ebpf component loop running

If the eBPF component encountered an error, it stopped the main
processing loops, which then caused issues because any config reload
for Alloy would block the whole agent.

Fixes #3332

Also, for long-running scrapes we were blocking component evaluation
while waiting for the config update to be applied. I switched this to
a buffered channel, so config updates no longer block.

If multiple updates are buffered, it will skip the in-between updates
and go straight to the latest version.

Fixes #1371

* Fix data race in test

* Limit tries to start an ebpf session

* Add info logs at session start/end

* Pass context around to shutdown properly
wildum added a commit that referenced this pull request May 26, 2025
* pyroscope.ebpf: Retry after eBPF errors and buffer config updates. (#3381)

* fix: Keep ebpf component loop running

If the eBPF component encountered an error, it stopped the main
processing loops, which then caused issues because any config reload
for Alloy would block the whole agent.

Fixes #3332

Also, for long-running scrapes we were blocking component evaluation
while waiting for the config update to be applied. I switched this to
a buffered channel, so config updates no longer block.

If multiple updates are buffered, it will skip the in-between updates
and go straight to the latest version.

Fixes #1371

* Fix data race in test

* Limit tries to start an ebpf session

* Add info logs at session start/end

* Pass context around to shutdown properly

* fix(pyroscope.scrape): godeltaprof scraping (#3542)

* fix(pyroscope.scrape): godeltaprof scraping

* cl

* update changelog

---------

Co-authored-by: Christian Simon <simon@swine.de>
Co-authored-by: Tolya Korniltsev <korniltsev.anatoly@gmail.com>
@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Jun 23, 2025

Labels

area/pyroscope (Issues/PRs primarly affecting `pyroscope.` components)
backport release/v1.8
backport-failed
bug (Something isn't working)
frozen-due-to-age
needs-attention

Development

Successfully merging this pull request may close these issues.

[pyroscope.ebpf] Pyroscope collection stops working
pyroscope.ebpf slow component evaluation

3 participants