-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OBSOLETE] Add an OCI Image Layout command to the ociutil tool, and a corresponding Bazel rule. #58
Conversation
0333db3
to
1f2aaaf
Compare
The spec at https://github.com/opencontainers/image-spec/blob/main/image-layout.md describes an OCI Image Layout directory structure. This commit updates the ociutil tool with a new command to produce such directories based on an input layout.
fcda066
to
0f02cd2
Compare
This commit creates a bazel rule that produces an OCI Image Format directory based on a provided OCI Image index. The OCI Image Layout is a standardized OCI format described in https://github.com/opencontainers/image-spec/blob/main/image-layout.md.
0f02cd2
to
df23a10
Compare
In order for this rule to work correctly in its current state, the | ||
following flags must be provided to bazel: | ||
--incompatible_strict_action_env=false | ||
--spawn_strategy=local | ||
|
||
The incompatible_strict_action_env flag is required because in order to | ||
access the registry, a credential helper executable (named | ||
docker-credential-<SOMETHING>) must be available so that ocitool can | ||
execute it. The incompatible_strict_action_env flag makes the system | ||
PATH available to bazel rules. | ||
|
||
The spawn_strategy flag must be set to local because currently, | ||
oci_image_index is only declaring the new JSON files it creates as | ||
outputs; it's not declaring any manifests or layers from the images as | ||
outputs. By default, Bazel only permits rules to access specifically | ||
declared outputs of the rule's direct dependencies. In order for this | ||
rule to access the transitive set of outputs of all dependencies, we | ||
must disable bazel's sandboxing by setting spawn_strategy=local. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These requirements are fairly big impediments to using this rule, since users would either have to invoke it in a separate bazel instance or we'd have to investigate using bazel transitions to switch these flags off for those specific targets.
I completely understand why you want oci_image_layout to pull as part of bzl build
, since you want to use it as part of a bazel test rule that scans the local image. However, downloading from the network during the bazel build phase is considered bad practice: this is meant to happen in repository rules, and that's why you're having to jump through hoops when you do it in a build rule. Having a repository rule (say oci_pull_image_layout) would make sense from that perspective, but would make usage fairly awkward, and you would not be able to depend on oci_push
targets.
The more I look at this, the more I think that it's appropriate to model this as something that is executed not as part of a bzl test
dependency, but as a bzl run
target i.e. you want something that:
- can refer to the output of the oci_push rule, so that can be fed to the...
- ...call of the oci_image_layout tool, which can use it to prime the download of the entire image (which you could point at a temporary location, and make it smart enough to avoid downloading things if they were already there), which is then fed to...
- ...the analysis tools.
This means that instead of saying bzl test <target-that-depends-on-oci-image-layout>
you'd have to use a special target that you invoked with bzl run
; this is quite a specialized usage, so that should be okay? Certainly, in its current form, a test that uses the rule as it stands will fail unless flags are specified that violate hermeticity, which means you can't run it as part of a wildcard bzl test //...
.
I'm sorry not to be able to approve this immediately; does any of the above sound reasonable, or are these solutions you've already considered and dismissed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further thoughts:
- The requirement here is to make the entire built image available locally for security scanning
- The built image consists of:
- the base image (which
oci_pull
currently elides, replacing it with a reference) - the layers on top of the base image (which are built locally by
rules_oci
)
- the base image (which
- Thus, to create a complete local image, we need a variant of the
oci_pull
repository rule that pulls all the remote layers; this could then be referred to in the provider output of theoci_image
rule, and used byoci_image_layout
.
This moves all network access back into repository rules, without too much awkwardness (since you already have an oci_pull
for the base image).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for the thorough review; I really appreciate it. Since I'm doing this to create a test rule that depends on it, I'd very much like the test target to be hermetically executable with bzl test
. I'd also like to avoid pushing the image to a registry first; the idea being that images should be able to pass this test before we push them.
Your second idea seems very doable to me; create a variant of oci_pull that pulls down all necessary base blobs. The oci_image_layout would then be able to access the output of the oci_pull. It shouldn't be too difficult to make providers in oci_image_layout that access the oci_pull outputs. This seems a very clean way of doing it that matches the way bezel repository rules are intended to be used.
One question that arises is, with this approach, should I create a new rule (e.g. oci_pull_all_local or some similar name), or modify the attributes passed to oci_pull to parameterize it. e.g. have a boolean that describes whether to pull all the blobs locally. I'd be interested to hear your thoughts on this.
I'm also not completely clear yet how an oci_image_format rule that consumed an oci_image_index could declare the necessary oci_pull targets, because the base image details are part of oci_image, but don't seem to be outputs of oci_image_index. I'm thinking we might need to expand the outputs of oci_image_index in this case so that it contains details of the base images.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question that arises is, with this approach, should I create a new rule (e.g. oci_pull_all_local or some similar name), or modify the attributes passed to oci_pull to parameterize it. e.g. have a boolean that describes whether to pull all the blobs locally. I'd be interested to hear your thoughts on this.
I think we should have a separate repository_rule
, oci_pull_all
. That should ensure that the expense of downloading all the layers is only borne by the test; building/testing other targets should not trigger the download. You can DRY things up by having an oci_pull
macro with a flag that determines whether to invoke oci_pull_all
in addition to the oci_pull
rule (although that's probably icing on the cake).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Circling back on this, if we create an oci_pull_all rule (or have a parameterized macro with a boolean that specifies to do pull_all as well as pull), we would need a way for oci_image (and transitively, oci_image_index) to tell it to pull all. I'm thinking that we would need to add a new (optional) param to oci_image and oci_image index that would be passed down the dependency tree and eventually end up being passed to oci_pull (or used to decide to call oci_pull_all). Does this sound right to you?
It would also mean that any oci_image_index targets that we want to run my new test rule on would need to have this new parameter set. Ideally, the test would somehow pass it in, but other uses of oci_image_index would not do so. Unfortunately, I can't think off the top of my head how to achieve that. Any suggestions welcome :).
If what I've suggested sounds sane to you, I'll look at implementing it this week. Thanks again for your help with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL;DR: I think the only place you need an extra parameter is in oci_image
; everything else can be handled by passing the extra information through via providers.
Could we:
- make oci_image take a new, optional
pulled_base
attr, that would be used to pass in the target created byoci_pull_all
- pass that information out in a new provider, say
OCIPulledImageInfo
, that would only be present ifpulled_base
was set - oci_image_index (and downstream rules) could then look for that provider on its manifests inputs and pass it on (
OCIPulledImageInfo
would have to be more than just a simple pulled_base to handle the multiplepulled_base
s on its input; you'd need to consider what information is needed by your newoci_image_layout
rule, and make it trivially available in the new provider)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it, and that sounds like a good approach. It will mean that oci_image_layout won't work unless all the base images that it transitively depends upon have pulled_base set... but I can't think of any way around that with the way bazel works.
Interestingly, Google's oci_pull (https://github.com/bazel-contrib/rules_oci/blob/main/oci/private/pull.bzl) seems to do want we want for this; it even keeps it all in OCI Image Layout format. They have a todo in there to avoid eager download of layers, but the current code downloads everything. I wonder if we could reuse it. We can discuss that in the meeting I've organized.
The spec at https://github.com/opencontainers/image-spec/blob/main/image-layout.md describes an OCI Image Layout directory structure. This PR updates the ociutil tool with a new command to produce such directories based on an input oci_image_layout. It adds an oci_image_format Bazel rule that can be used to produce such directories.