-
Notifications
You must be signed in to change notification settings - Fork 219
Fix unexpected results when filtering images #2413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The imageMatchesReferenceFilter
logic LGTM, that makes sense. The design of the upper parts of the call stack needs a bit of cleanup.
3e00335
to
8dd85f6
Compare
libimage/filters.go
Outdated
if lookedUp != nil { | ||
if lookedUp.ID() == img.ID() { | ||
return true, nil | ||
resolvedName, _, _ := strings.Cut(resolvedName, "@") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this doing?
In the later loop, when we are making various imprecise matches, the return value is the full refString
. This seems to be dropping digests from the returned values. Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have no idea what this is doing. AFAICS LookupImage
can easily be a (precise) repo:tag or repo@digest match. And the range refs
loop below always collects the precise ref.String()
values that match. Why is this ignoring the digest part (if that is what it is) and returning all tags (if that is what it is)?
And, even apart from the again, HasPrefix
just looks implausible. Why should example.com/a
match example.com/ab
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only want to return names that match the name if the name@digest
reference format is used.
So, for example, for an image with the following names:
quay.io/libpod/alpine:latest
quay.io/libpod/al:latest
quay.io/libpod/al:tag
I expect the result for filtering with the al@digest
reference, these names:
quay.io/libpod/al:latest
quay.io/libpod/al:tag
Usage HasPrefix
is wrong. I fixed that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK… It’s very unclear to me that a filter for reference=name@digest
should return name:tag
(and I’ll not even comment on the ignoreme
thing, I guess that’s what we deserve), but it’s a conversation we can have, or maybe a non-negotiable backward compatibility constraint.
Where do short names, and hostname-less paths, come in? For ordinary tags we have the 1)2)3) candidates for matching, and that doesn’t happen here.
I think it’s a fair point to recognize that if value contains a digest, we probably go through this code path and not the later path.Match
code, so we only truly need to handle digests on this path (is that the case???). But I still think that the two should be as consistent as possible.
Also consider inputs like alpine@digest
matching all four of quay.io/libpod/alpine@digest
, quay.io/libpod/alpine:latest
, example.com/mirror/alpine@digest
, example.com/mirror/alpine@latest
. I don’t know whether the output should contain tags but I think it definitely should contain both host names, because that happens for tags in the 3) case.
IOW, conceptually, I think we should first decide whether the image matches, and then build resolvedNames
, considering each entry of NamesReference
, individually. And if that does need a digest special case, oh well, so be it.
(I don’t immediately know whether the code should be a “match” loop + ”build names” loop, or whether we should preserve the current 2 matching situations, with a “match a name to input” helper. Maybe the latter.)
Uh… what about *pine@digest
? … Let’s pretend I didn’t ask, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also consider inputs like
alpine@digest
matching all four ofquay.io/libpod/alpine@digest
,quay.io/libpod/alpine:latest
,example.com/mirror/alpine@digest
,example.com/mirror/alpine@latest
. I don’t know whether the output should contain tags but I think it definitely should contain both host names, because that happens for tags in the 3) case.
Oh, I hadn't thought of that case at all. I think for the alpine@digest
case, filterReferences()
should return all names and thus keep the original behaviour. I checked the Podman documentation, and the reference
filter accepts the pattern of an image reference <image-name>[:<tag>]
. Is it okay to do that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
“all” as in “not filter at all”, or “the ones that match alpine
”?
I don’t know what are the constraints here. Podman documentation is definitely relevant, but also Docker’s API seems to be.
And Docker code, from a very quick check, seems to possibly match the input against repo@digest values. (If a reference.Named
value matches a "reference="
filter, it is assigned to either RepoDigests
or to RepoTags
in the output. That’s for the traditional implementation; the containerd one is doing something else with RepoDigests
, I can’t investigate the details now.) I’d also need to investigate further whether this always includes any exiting digest of an image, or only digests that were used to explicitly pull repo@digest
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
“all” as in “not filter at all”, or “the ones that match
alpine
”?
Not filtered at all.
f644901
to
c1d4722
Compare
PTAL @mtrmac |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Honny1, Luap99 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
libimage/filters.go
Outdated
if lookedUp != nil { | ||
if lookedUp.ID() == img.ID() { | ||
return true, nil | ||
resolvedName, _, _ := strings.Cut(resolvedName, "@") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have no idea what this is doing. AFAICS LookupImage
can easily be a (precise) repo:tag or repo@digest match. And the range refs
loop below always collects the precise ref.String()
values that match. Why is this ignoring the digest part (if that is what it is) and returning all tags (if that is what it is)?
And, even apart from the again, HasPrefix
just looks implausible. Why should example.com/a
match example.com/ab
?
3d1f37e
to
289f353
Compare
@mtrmac PTAL, I have implemented your feedback. |
73e55be
to
ba9f9e1
Compare
@mtrmac PTAL |
Just to keep this moving a bit — where I struggle is: what do we tell users / document? “ As a first impression, that seems too complex a story. Is there some way of thinking / expressing this that would likely make sense to users? E.g. one possible UI philosophy might be “Deduplication is a purely internal thing that is invisible in semantics. The handling of a specific image, and image name, will always be the same, whether or not the physical image is deduplicated with something else or not. So, filters work on names only” (That would be consistent, and elegant, but the UI exposes image IDs and sometimes motivates users to use them; and I suspect, with all the heuristics in Another might be “deduplication exists and shows up in the UI. The What is the philosophy with the PR as proposed? “digest inputs are invalid here anyway, so we just do something minimally plausible?” |
#2413 (comment) points at the documentation of the |
That’s not what we do now: For an deduplicated image tagged with (A, B), a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I’m reviewing the code under the assumption that “if the input is a repo@digest, we will not filter the names on the output” is the desired design. As discussed above, I’m not at all convinced that’s a reasonable UI.
OTOH, a contrary point: Triangulating between that and Maybe a structure:
But that’s a guess, and probably not worth blindly rushing to implement. What does Docker (traditional, and containerd) do? |
I was comparing Podman and Docker. And it seems, after your arguments, that Podman's behavior is more of an undocumented feature than a bug. I'm not sure that trying to copy Docker's behavior for the Docker only accepts this Containerd uses its own filter syntax, which I am unfortunately unable to find in the documentation. So I was not able to compare the results even with Containerd. Docker example:
|
Podman prints the same list of all image names for all imputes that were given to Docker. |
A lot of the last-resort
I see the examples as accepting wildcards. But Docker does not have the concept of short name search, short names always refer to In this respect, we must always differ somehow. And we have the case of matching against repo without the host name … I can’t tell whether that is an intentional idea or a buggy hack to try to better support similarity to short name search.
(I meant Docker with a containerd backend — there are two fairly different underlying implementations of images in the Docker daemon. https://docs.docker.com/engine/storage/containerd/ is the UI, I think.)
I think the current code is clearly problematic / inconsistent. So far I don’t think we have arrived at a consistent alternative (notably the desire of containers/podman#25725 to, essentially, work on image names, and not on deduplicated images, conflicts with the I think it’s clear enough we should not alter the list of image IDs printed by That would mean giving up for Podman 5… and either documenting things, or… just… saying “it’s complicated, best to use precise image names, we are not making any promises for substrings and wildcards”. And maybe trying for Podman 6, although I don’t know about prioritizing this. @Luap99 |
… Are you saying that Docker has the same behavior that containers/podman#25725 is complaining about? I was assuming that the 5 tags in the samples are point at the same deduplicate image, in which case the test cases do show references being filtered. Is the test using 5 different images? |
No, I meant to say that I also tested Podman with the same inputs I used for the Docker examples. And the result was always the same list of 5 image tags. The test does not use 5 different images. |
I used a fully qualified image name, and Podman listed all the image tags/names.
I think we could say that if a fully qualified image name is used, the output should only show the corresponding names. And in the case of short names, wildcards, and digests. Podman should display all the names of the image. If we agree on this behavior, I'm not sure how to check if the user input is a fully qualified image name, and if we should wait for Podman 6 to make this behavior change. |
That could work. I’d include fully-qualified names with digests in the first case: “if the input 100% unambiguously specifes precisely one image, it is a query; if it is any more ambiguous, it is a search”. But users with such a precise input can use
Then
That’s still a topic — I’m leaning towards Podman 6 but that’s a very weak opinion. |
I've updated the behavior so that the output only shows the corresponding names if a fully qualified image name is used, including I'm not sure about this case:
What should be the correct output of this command?
Displays all names.
Displays all tags that match the image name. (PR implementation)
WDYT? @mtrmac |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should be the correct output [repo@digest]?
Yeah… I think the PR’s implementation is reasonable. In effect, we have 3 separate behaviors:
- fqRepo:tag = print only that repo:tag
- fqRepo@digest = print all matching tags (if any), and possibly the digest if it is in
image.Names
- anything else (short names, wildcards) = no filtering.
and we’d have to document them as such (with a caveat that this might change in Podman 6??).
Procedurally, if we go one of these routes and don’t give up entirely: could you, split the filters_test.go
refactor that shows matched names as a separate first commit, please? It’s not just cleaner as a matter of formal tidiness, it would mean that a second commit, actually changing behavior and updating the tests, would be explicit and self-documenting about the changes.
namesThatMatch = append(namesThatMatch, name) | ||
} | ||
} | ||
img.setEphemeralNames(namesThatMatch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not work correctly if wantedReferenceMatches
contains 2 different, both matching, values.
Or, maybe, that should not count as our “precise query” special case? I think that would be surprising.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the right thing to do is. The OR
behavior is used for wantedReferenceMatches
. On the one hand, when you use OR
you don't care about both matches, only the first match. But on the other hand, you would want to display both results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I think it is fine to move ahead with this without a 6.0.
There might be a risk here that someone expects multiple names but I don't see why.
Overall since this changes the behaviour it might be best to to make a test PR against podman with this change and make sure all tests still pass first before merging. If it turns out this breaks a lot of our tests than maybe it is not a good change for a minor release
Test PR: containers/podman#26344 |
Signed-off-by: Jan Rodák <[email protected]>
The filtered image contains all Names of image. This causes the podman to list images that are not expected. Fixes: containers/podman#25725 Fixes: https://issues.redhat.com/browse/RUN-2726 Signed-off-by: Jan Rodák <[email protected]>
Code LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code, implementing a single-image filter, LGTM — but, still, #2413 (comment) .
Alternatives:
- Only do the name filtering for the special case of a single filter in
wantedReferenceMatches
(and emptyunwantedReferenceMatches
?). It is a special case anyway, so let’s narrow down the case where we are changing behavior. - Treat each
reference=
separately, and build a set of accepted image names. (I.e. if anything inwantedReferenceMatches
is a wildcard or similarly generic, include all names; if allwantedReferenceMatches
elements are fully-qualified!NameOnly
names, include only matches for those names - Something else?
I think it would be useful to think what the end-user documentation of the behavior will be (write a draft?). We can implement any of these behaviors, but documenting them in a way that can be understood seems harder than that.
Personally, I weakly lean towards the first option (with an explanation similar to #2413 (comment) “it is a query” vs. “it is a search”) — that would change the effect of podman images $singleInput
, but any users doing something more complex with --filter
would not see a change.
BTW I think Podman should eventually get a man page update documenting this.
@@ -25,6 +25,7 @@ type compiledFilters map[string][]filterFunc | |||
|
|||
// Apply the specified filters. All filters of each key must apply. | |||
// tree must be provided if compileImageFilters indicated it is necessary. | |||
// WARNING: Application of referenceFilter sets the image names to matched names, but this only affects the values in memory, they are not written to storage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Absolutely non-blocking: referenceFilter
does not exist — but this is clear enough. Two instances.)
@@ -290,15 +292,31 @@ func filterReferences(r *Runtime, wantedReferenceMatches, unwantedReferenceMatch | |||
return !unwantedMatched, nil | |||
} | |||
|
|||
// Go through the wanted matches | |||
// If an image matches the wanted filter but it also matches the unwanted | |||
// filter, don't add it to the output | |||
for _, value := range wantedReferenceMatches { | |||
matches, err := imageMatchesReferenceFilter(r, img, value) | |||
if err != nil { | |||
return false, err | |||
} | |||
if matches && !unwantedMatched { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking: It’s a bit untidy to have the success case hidden nested in a loop like this, and interrupting the flow of reading the loop.
If nothing else, a
matchedReference := ""
for _, value := range wantedReferenceMatches {
if /* matches */ { matchedReference = value; break }
}
if matchedReference == "" { return false }
/* the new filtering logic */
would be a tiny bit structurally clearer — but decide what behavior we want for the multiple-wantedReferenceMatches
case first.
(An earlier version of this PR had simplified the unwanted…
matching; it would be nice to get that back while we still remember thinking that through. *shrug* I can do that afterwards.)
// more sense for the user to see only the corresponding names in the output, not all the names of the deduplicated | ||
// image; therefore, we make the corresponding names available to the caller by overwriting the actual image names | ||
// with the corresponding names when the reference filter matches and the reference is a fully qualified image name | ||
// (i.e., contains a tag or digest, not just a bare repository name). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This might eventually need another update, depending on what we decide.)
The filtered image contains all Names of image. This causes the podman to list images that are not expected.
For example:
If an image
box:latest
is taged asexample:latest
and then images are filtered withreference=example
,box:latest
andexample:latest
will be output because the image has multiple names.Fixes: containers/podman#25725
Fixes: https://issues.redhat.com/browse/RUN-2726