
fix panic on state refresh #26471


Open: wants to merge 1 commit into main


Luap99 (Member) commented Jun 19, 2025

In order to use parallel.Enqueue(), parallel.SetMaxThreads() must be called first. However, our main function has been calling it only after setting up the initial runtime, so this change moves the call earlier. While at it, it also moves the CPU and memory profile setup earlier so that the profiles capture the early startup phase as well.

This was most likely introduced by commit 46d874a ("Refactor graph traversal & use for pod stop"), which started using parallel.Enqueue() in removePod(). removePod() can in turn be called from refresh() when a container has autoremoval configured.

I tried many hard resets in VMs to reproduce this but was unable to. Instead, I always got "retrieving temporary directory for container xxx: no such container" errors, and the autoremoval failed, but there were no panics. Besides that, c/storage was often corrupted, which made the image I used unusable so that it had to be deleted; that is concerning in itself.

Fixes #26469

Does this PR introduce a user-facing change?

Fixed a possible panic on state refresh after boot.


Signed-off-by: Paul Holzinger <[email protected]>
openshift-ci bot (Contributor) commented Jun 19, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label Jun 19, 2025
Luap99 added the No New Tests label Jun 19, 2025

[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore.

mheon (Member) commented Jun 20, 2025

LGTM

Labels: approved, No New Tests, release-note
Development: successfully merging this pull request may close the issue "Podman 5.5.1-1 segfaults when dealing with lockfiles".
2 participants