Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When checking IsDangling make sure image is not in manifest list #2360

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

rhatdan
Copy link
Member

@rhatdan rhatdan commented Mar 11, 2025

Currently when we run podman image prune or podman images --filter
dangling

It is pruning images that are in a local manifest. These images are
not dangling because they are currently in use by a named manifest list.

You can create this situation simply by doing

echo "from scratch" > /tmp/Containerfile
id=$(podman build /tmp)
podman manifest create test $id
podman image prune --force
podman image exists $id

Will return an error since the image was pruned. Now the local manifest
test is broken.

Signed-off-by: Daniel J Walsh [email protected]<!--- Please read the contributing guidelines before proceeding --->

Copy link
Contributor

openshift-ci bot commented Mar 11, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rhatdan
Copy link
Member Author

rhatdan commented Mar 11, 2025

@nalind @Luap99 @mheon @mtrmac PTAL

@ericcurtin This is needed for OCI images in RamaLama. Currently if you convert a model to an OCI Image it creates an manifestlist pointing to the untagged built image, but if the user does a podman image prune the Model image is removed leaving a manifest list that can not be used. It will have a pointer to a sh256:DIGEST that no longer exists.

return false, err
}

for _, image := range allImages {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned by how expensive this is going to make the dangling check. It feels like there should be a better way of checking manifest list membership, but maybe that's a c/storage RFE for when we refactor there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the libimage/manifests code has, sadly?, been implemented here in c/common, in a way that makes it ~invisible to c/storage.


But, yes, this is not good at all. isDangling is called from ListImages. So:

  • With SetListData, this is an infinite recursion.
  • and, anyway, this is O(images^2) at least.

This needs to be implemented in a way where “all images” are only processed once (well, ideally it should be an O(1) query, but I don’t think we have that data), to build a data structure which allows isDangling to be executed in, at worst O(1) I/O if not O(1) CPU.

And then I suspect that removeRecursive might want to call this internal version, with the data structure pre-built, instead of the public IsDangling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree this is O(n^n) which should not be done, basically the code does a list images then filters (which calls isDangling() on each image which then should not in turn call list again all images for each image)

It should properly build a map of all images used in manifests and then pass that map around so the dangling filter can do a quick check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… Oops, not an infinite recursion, because the recursive calls don’t have SetListData set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right n^2 not n^n, my bad.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to comment the same above:

⚠️ listing all images is expensive, especially when doing that for potentially all images. That's why the code uses a layer tree to operate on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably faster to first compute a (reverse) mapping from manifest lists to images and just do look ups as Paul suggested.

return false, err
}

for _, image := range allImages {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the libimage/manifests code has, sadly?, been implemented here in c/common, in a way that makes it ~invisible to c/storage.


But, yes, this is not good at all. isDangling is called from ListImages. So:

  • With SetListData, this is an infinite recursion.
  • and, anyway, this is O(images^2) at least.

This needs to be implemented in a way where “all images” are only processed once (well, ideally it should be an O(1) query, but I don’t think we have that data), to build a data structure which allows isDangling to be executed in, at worst O(1) I/O if not O(1) CPU.

And then I suspect that removeRecursive might want to call this internal version, with the data structure pre-built, instead of the public IsDangling.

@mtrmac
Copy link
Contributor

mtrmac commented Mar 11, 2025

BTW I don’t understand the motivating situation (is there, you know, a reproducer?), I don’t much understand the libimage/manifests code (something to fix on my side, sure) — so the above is only pointing out specific issues, it’s not an endorsement of the approach in general.

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see a more verbose commit message explaining the reasoning. I think the current behavior follows the man pages and docs. I'd love to see tests here and for Podman as well (along with man page changes) if we go forward.

@rhatdan
Copy link
Member Author

rhatdan commented Mar 12, 2025

I planned on adding the tests to podman.

The easiest test is to do pseudo code.

cat "from scratch" > /tmp/Containerfile
id = $(podman build /tmp)
podman manifest create NAME $id
podman image prune --force
podman image exist $id
podman manifest rm NAME
podman image prune --force
podman image exist $id

Last line should fail.

@mtrmac
Copy link
Contributor

mtrmac commented Mar 12, 2025

I think the current behavior follows the man pages and docs.

It might well be the case that we want the “dangling” filter (exposed in the UI) unchanged but podman rm updated. I don’t know. And I also don’t know where that leaves the public IsDangling API, it might go either way, or maybe we should replace it with something better-defined.

@rhatdan rhatdan force-pushed the dangling branch 2 times, most recently from 189032d to 00b7490 Compare March 12, 2025 18:05
Copy link
Contributor

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick skim.

@@ -22,6 +23,8 @@ type layerTree struct {
// emptyImages do not have any top-layer so we cannot create a
// *layerNode for them.
emptyImages []*Image

manifestDigests map[digest.Digest]bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document keys/values. And if it is effectively a set, make the values struct{}.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you are after here. This is just a mapping of digests that are used in the manifest lists in containers/storage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mapping of what, to what?

Yes, you know that. I think I know that, now. Please write that down, to save future readers time so that they can decide just reading the data structure whether this data is relevant to them.

E.g. the keys are not the digests of all manifests of all images in the layer tree. Neither are the keys the digests of all manifest lists that exist in the storage (they are not digests of manifest lists at all ; also, for an ordinary podman pull, we typically store one per-platform manifest, and the parent manifest list, in c/storage, along with the pulled image, and those usually (?) have a top layer).

(And it turns out “to what” has the answer “always exactly true”, so it doesn’t need to be a boolean; a struct{} works just as well, when we are only after the presence of the key in the map.)

@@ -124,6 +128,15 @@ func (r *Runtime) newLayerTreeFromData(images []*Image, layers []storage.Layer)
topLayer := img.TopLayer()
if topLayer == "" {
tree.emptyImages = append(tree.emptyImages, img)
if manifestList, _ := img.IsManifestList(context.TODO()); manifestList {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to do this for every layer tree? Some (e.g. for Image.Parent) are one-time and one-purpose only.

(I don’t know whether this would be an option to layerTree, or a separate data structure parallel to layerTree.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want @vrothberg to comment on this. We can develop a different structure and process it separately or rename this structure to a different name. I read the tree as a place to cache processed data.

Copy link
Member

@vrothberg vrothberg Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment below:

Let's make sure to benchmark on a system with 100,200,400,800 images and if there is a noticeable difference.

I think it makes sense to only compute this data when needed and cache it. Using the layerTree for that purpose looks attractive as it's plumbed to cache/reset data. But the manifest digests could be queried via a function that either computes it fresh or returns the cached map?

@rhatdan rhatdan force-pushed the dangling branch 2 times, most recently from d534dbd to 4cdd485 Compare March 12, 2025 18:20
@rhatdan
Copy link
Member Author

rhatdan commented Mar 12, 2025

@nalind @Luap99 @mheon @mtrmac PTANL

I hopefully took all of your feedback. I will not open a [WIP] PR against podman to use this PR and write tests.

rhatdan added a commit to rhatdan/podman that referenced this pull request Mar 12, 2025
Vendor in latest containers/common

Currently if you create an un tagged image and add it to a manifest
list, podman image prune will remove the image, and leave you with
a broken manifest list. This PR removes the image from dangling in
this situation and prevents the pruning.

We have seen this trigger issues with RamaLama which is creating
manifest lists in this manner, and then users pruning the images.

Verifing: containers/common#2360

Signed-off-by: Daniel J Walsh <[email protected]>
@rhatdan
Copy link
Member Author

rhatdan commented Mar 12, 2025

containers/podman#25562

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit worried about the performance impacts. The tango for computing and operating on the layer tree addressed some serious bottlenecks for image operations.

The manifest-list checks may have a negative impact but I am unable to guess how much.

Let's make sure to benchmark on a system with 100,200,400,800 images and if there is a noticeable difference.

@@ -22,6 +23,8 @@ type layerTree struct {
// emptyImages do not have any top-layer so we cannot create a
// *layerNode for them.
emptyImages []*Image

manifestListDigests map[digest.Digest]bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhatdan, can you add a comment here describing map and how it should be used?

@@ -677,3 +678,12 @@ func (l *list) Instances() []digest.Digest {
}
return instances
}

func (l *list) Digests() []digest.Digest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment here as well to make sure callers know which digests are returned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok Adding comments, but I notice:

// Instances returns the list of image instances mentioned in this list.
func (l *list) Instances() []digest.Digest {
	instances := make([]digest.Digest, 0, len(l.oci.Manifests))
	for _, instance := range l.oci.Manifests {
		instances = append(instances, instance.Digest)
	}
	return instances
}// Digests returns list of all image digests referenced in the manifestList
func (l *list) Digests() []digest.Digest {
	digests := make([]digest.Digest, 0)	for i := range l.docker.Manifests {
		digests = append(digests, l.docker.Manifests[i].Digest)
	}
	return digests
}

I added the second function. Is there a difference between these two? IE a l.oci.Manifests and a l.docker.Manifests. Should my Digests function return both? Or should I just remove my DIgests and use Instances?

@@ -124,6 +128,15 @@ func (r *Runtime) newLayerTreeFromData(images []*Image, layers []storage.Layer)
topLayer := img.TopLayer()
if topLayer == "" {
tree.emptyImages = append(tree.emptyImages, img)
if manifestList, _ := img.IsManifestList(ctx); manifestList {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment why we're doing this here.

Currently when we run podman image prune or podman images --filter
dangling

It is pruning images that are in a local manifest. These images are
not dangling because they are currently in use by a named manifest list.

You can create this situation simply by doing

echo "from scratch" > /tmp/Containerfile
id=$(podman build /tmp)
podman manifest create test $id
podman image prune --force
podman image exists $id

Will return an error since the image was pruned.  Now the local manifest
test is broken.

Signed-off-by: Daniel J Walsh <[email protected]>
@rhatdan
Copy link
Member Author

rhatdan commented Mar 13, 2025

@vrothberg added comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants