Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote downloader caches requests without digests #23932

Open
mislavmandaricaxilis opened this issue Oct 10, 2024 · 3 comments
Open

Remote downloader caches requests without digests #23932

mislavmandaricaxilis opened this issue Oct 10, 2024 · 3 comments
Labels
team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug untriaged

Comments

@mislavmandaricaxilis
Copy link

Description of the bug:

We use rules_oci to build OCI images. After some time we noticed our builds are failing when using experimental_remote_downloader and after some digging we found out what the problem is.

When downloading manifests, rules_oci first fetches a temporary token here. This token is then used for fetching the manifest here. If we configure bazel to use experimental_remote_downloader, all download requests will be routed through the remote asset api, which will cache the response for all requests and return subsequent responses from its cache. There is also an additional flag which we do use called experimental_remote_downloader_local_fallback, which will retry the request with the local downloader when any error is returned from the remote downloader.

What happens is the following:

  • Request for getting a token is successfully executed by the remote downloader and is cached, effectively making the same token being reused by all subsequent requests
  • Request for getting the manifest uses the cached token returned by the remote downloader
  • Since the token is usually short lived, requests for downloading the manifest start failing fairly quickly with 401 Unauthorized
  • These requests are retried locally because of the experimental_remote_downloader_local_fallback flag, but the request that gets retried also uses the cached token, which means it also fails
  • Whole build fails
  • Disabling experimental_remote_downloader makes build pass again

When fetching a token, rules_oci doesn't set the digest, with the expectation that it will not be cached as it is documented in DownloadManager:

If the checksum and path to the repository cache is specified, attempt to load the file from the {@link RepositoryCache}. If it doesn't exist, proceed to download the file and load it into the cache prior to returning the value.

The unfortunate thing is that this only applies to the repository cache, not to the remote downloader. So even through omitting the digest will skip the local repository cache, it will not instruct the remote downloader to also not cache the request, resulting in an unexpected behaviour - repeating the same request to get the token will always result in returning the old cached token from the remote downloader.

Proposed solution:
Looking at the grpc downloader implementation of the remote downloader, it doesn't seem like it's doing anything with digests. So basically it's irrelevant if digests are used or not, all remote calls which are not cached by the local repository cache, land on the remote downloader api, which will cache them.
There is a field in the remote asset fetch api which allows callers to mark content as stale, but apparently bazel's grpc downloader is not setting it.

The proposal is to set oldest_content_accepted to time.Now() if digest is not passed into the downloader. If you'd agree with this proposal, I can open a PR with this change.

Which category does this issue belong to?

External Dependency

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

MODULE.bazel

bazel_dep(name = "rules_oci", version = "2.0.0")
oci = use_extension("@rules_oci//oci:extensions.bzl", "oci")
oci.pull(
    name = "debian12_slim",
    digest = "sha256:804194b909ef23fb995d9412c9378fb3505fe2427b70f3cc425339e48a828fca",
    image = "debian",
    platforms = [
        "linux/amd64",
        "linux/arm64/v8",
    ],
)
use_repo(
    oci,
    "debian12_slim",
    "debian12_slim_linux_amd64",
    "debian12_slim_linux_arm64_v8",
)

.bazelrc

common --experimental_remote_downloader=grpc://buildbarn-remote-asset-api
common --experimental_remote_downloader_local_fallback

BUILD.bazel

oci_image(
    name = name,
    base = "@debian12_slim",
    visibility = ["//visibility:public"],
)

Steps:

  1. bazel build //... -> success
  2. Wait for temporary token to expire (I think 1h is a default)
  3. bazel build //... -> success
  4. Change digest for debian12_slim
  5. bazel build //... -> failure

Which operating system are you running Bazel on?

macOS and Linux

What is the output of bazel info release?

release 7.3.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

Link to Slack thread discussing this with @thesayyn, maintainer of rules_oci from Aspect: https://bazelbuild.slack.com/archives/C04281DTLH0/p1724410800783819

@github-actions github-actions bot added the team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. label Oct 10, 2024
@fmeum
Copy link
Collaborator

fmeum commented Oct 10, 2024

That's a very reasonable suggestion (thanks for providing all the context!). The gRPC remote downloader is effectively ownerless right now, but I can pre-review a PR.

@mislavmandaricaxilis
Copy link
Author

Awesome! I'll open a draft PR sometime next week and we can discuss it in more detail there if needed.

@mislavmandaricaxilis
Copy link
Author

@fmeum here is a draft PR for this issue: #23970

When its OK with you, I'll remove it from draft so it can trigger all the tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug untriaged
Projects
None yet
Development

No branches or pull requests

5 participants