
When crates.io gives 429, cargo should back off and retry later #13530

Open
ijackson opened this issue Mar 4, 2024 · 7 comments
Labels
A-interacts-with-crates.io Area: interaction with registries A-networking Area: networking issues, curl, etc. A-registries Area: registries C-bug Category: bug Command-publish S-needs-team-input Status: Needs input from team on whether/how to proceed.

Comments

@ijackson
Contributor

ijackson commented Mar 4, 2024

Problem

Our workspace contains 46 cargo packages. (Because cargo insists that each crate must be a separate package, and we want to split up crates for code sanity and compilation time reasons.)

This means that in our recent release, our on-duty release technician hit the rate limit. This aborted publication of the workspace, requiring manual retries and wrangling.

Steps

Have a workspace with more than 30 (the current burst rate limit) crates. Try to publish it by publishing each crate, in topo order, with cargo publish (using some automated tool).

Possible Solution(s)

cargo should handle a 429 response by backing off and retrying, using an exponential backoff algorithm.
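
A minimal sketch of what such a loop could look like, assuming a hypothetical `publish_once` closure that returns the HTTP status of one publish attempt; cargo's real publish path goes through curl and its own infrastructure, so this is only an illustration of the idea, not a patch:

```rust
use std::thread::sleep;
use std::time::Duration;

/// Illustrative only: retry on HTTP 429 with exponential backoff and a cap on
/// the number of attempts. `publish_once` stands in for one publish request.
fn publish_with_backoff(mut publish_once: impl FnMut() -> u16) -> Result<(), String> {
    let mut delay = Duration::from_secs(1);
    for attempt in 1..=6 {
        match publish_once() {
            200 => return Ok(()),
            429 => {
                eprintln!(
                    "warning: rate limited by the registry (attempt {attempt}); retrying in {}s",
                    delay.as_secs()
                );
                sleep(delay);
                delay *= 2; // 1s, 2s, 4s, 8s, 16s, then give up
            }
            other => return Err(format!("unexpected HTTP status {other}")),
        }
    }
    Err("still rate limited after several retries; giving up".to_string())
}
```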

In rust-lang/crates.io#1643 the crates.io team report already having raised the rate limit. In the error message from crates.io they suggest emailing help@ to ask for a rate limit increase. Such a workflow is IMO undesirable, especially as Rust gets more adoption.

Notes

I don't think increasing the rate limit (globally, or on request) is the right fix. If 429 is a hard error there is a tension between preventing misuse, and not breaking large projects' releases. But this tension can be abolished by handling 429 gracefully.

#13397 would probably have assisted the recovery from this situation (and also the local disk space problem our release technician also ran into).

See also: rust-lang/crates.io#3229 (requesting docs), #6714 (requesting better error message display).

Version

> cargo version --verbose
cargo 1.76.0 (c84b36747 2024-01-18)
release: 1.76.0
commit-hash: c84b367471a2db61d2c2c6aab605b14130b8a31b
commit-date: 2024-01-18
host: x86_64-unknown-linux-gnu
libgit2: 1.7.1 (sys:0.18.1 vendored)
libcurl: 8.5.0-DEV (sys:0.4.70+curl-8.5.0 vendored ssl:OpenSSL/1.1.1w)
ssl: OpenSSL 1.1.1w  11 Sep 2023
os: Arch Linux Rolling Release [64-bit]

(edited to fix ticket links)

@ijackson ijackson added C-bug Category: bug S-triage Status: This issue is waiting on initial triage. labels Mar 4, 2024
@epage epage added A-registries Area: registries A-networking Area: networking issues, curl, etc. Command-publish labels Mar 4, 2024
@epage
Contributor

epage commented Mar 4, 2024

People will be more likely to hit this with #1169 (since we'd likely move forward on that without the batch publish on crates.io's side)

cargo release tried to detect rate limitation situations and warn users about them so they can break down the publish into smaller steps.

As for strategies to deal with this, I'd want input from crates.io to know what fits with their intent of the rate limit.

Ideas brought up

* Back off and retry
* [Batch uploading](https://github.com/rust-lang/crates.io/issues/1643#issuecomment-1120665466)

(Because cargo insists that each crate must be a separate package, and we want to split up crates for code sanity and compilation time reasons.)

Technically, packages can contain multiple crates but only one lib crate. See rust-lang/rfcs#3452 for a proposal for a way to explicitly vendor dependencies on publish.

@epage epage added the A-interacts-with-crates.io Area: interaction with registries label Mar 4, 2024
@ijackson
Contributor Author

ijackson commented Mar 4, 2024

Ideas brought up

* Back off and retry

* [Batch uploading](https://github.com/rust-lang/crates.io/issues/1643#issuecomment-1120665466)

👍

ISTM that batch uploading is nontrivial. Not only is it a substantial protocol change, but it possibly adds coherency demands to the crates.io system, which may be difficult to fulfil in an ACID way.

I'm guessing that a backoff and retry strategy is likely to be relatively simple. The only question is whether to apply it only to publish (where we know that we want rate limits low enough that reasonable non-abusive use cases can reach them), or all operations.

I think applying it to all operations risks exacerbating operational problems from wayward automation. I don't know if we have non-abusive operations which risk hitting rate limits. (Last week I ran cargo owner add for the same 46 crates and that went smoothly.)

Retrying on 429 only on publish is a conservative choice which would solve the real-world operational problem.

@Eh2406
Contributor

Eh2406 commented Mar 5, 2024

Retrying on 429 only on publish is a conservative choice which would solve the real-world operational problem.

That critically depends on what the rate limit is intended to accomplish. If the point of the rate limit is to make sure there is a personal connection between crates.io and its power users, then any automated fix is just circumventing it. Similarly, if the expensive part of the operation is receiving and processing the publish request, then an acceptable retry strategy is just automating the DDoS they were trying to avoid. We should talk to the crates.io team before making technical changes.

It could be that the best compromise here is that cargo has a retry strategy that is ridiculously slow. For example, it gets a 429 and prints out a message saying "you're being rate limited; please talk to the registry about acceptable use in the future, but for now we are going to retry your request after a one-minute delay." This reduces the chance of a user intentionally relying on this behaviour, because it's so painfully slow, but it also does not break automation that assumed that when "cargo publish" finished, the crate was published.
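
A tiny sketch of that deliberately slow variant, under the same assumption of a hypothetical `try_publish` attempt function; the warning text and the fixed one-minute delay are illustrative, not an agreed design:

```rust
use std::thread::sleep;
use std::time::Duration;

/// Illustrative only: on 429, warn loudly and wait a fixed minute before
/// retrying, so scripted publishes still complete but the behaviour is too
/// painful to rely on deliberately.
fn publish_slowly(mut try_publish: impl FnMut() -> u16) -> Result<(), String> {
    loop {
        match try_publish() {
            200 => return Ok(()),
            429 => {
                eprintln!(
                    "warning: you're being rate limited; please talk to the registry about \
                     acceptable use in the future. Retrying after a one-minute delay..."
                );
                sleep(Duration::from_secs(60));
            }
            other => return Err(format!("unexpected HTTP status {other}")),
        }
    }
}
```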

@epage epage added S-needs-team-input Status: Needs input from team on whether/how to proceed. and removed S-triage Status: This issue is waiting on initial triage. labels Mar 7, 2024
@ijackson
Contributor Author

(This just happened to me again. We have 55 packages now. It was less troublesome this time round because, after the discouraging response to #13397, we wrote a Python script to publish idempotently.)
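
For reference, a sketch of the idempotency check such a script can make (ours is in Python; this is a Rust rendering), assuming the crates.io API endpoint GET /api/v1/crates/{name}/{version} and the `ureq` crate for HTTP; the helper names and User-Agent string are illustrative:

```rust
use std::process::Command;

/// Returns true if `name@version` is already on crates.io, so the publish
/// step can be skipped on a re-run.
fn already_published(name: &str, version: &str) -> Result<bool, ureq::Error> {
    let url = format!("https://crates.io/api/v1/crates/{name}/{version}");
    // crates.io asks API clients to identify themselves via User-Agent.
    match ureq::get(&url).set("User-Agent", "example-publish-helper").call() {
        Ok(_) => Ok(true),                             // 200: this version exists
        Err(ureq::Error::Status(404, _)) => Ok(false), // crate or version unknown
        Err(e) => Err(e),
    }
}

/// Publish one package unless it is already on the registry.
fn publish_if_needed(name: &str, version: &str) -> Result<(), Box<dyn std::error::Error>> {
    if already_published(name, version)? {
        eprintln!("{name} {version} already published; skipping");
        return Ok(());
    }
    let status = Command::new("cargo").args(["publish", "-p", name]).status()?;
    if !status.success() {
        return Err(format!("cargo publish failed for {name}").into());
    }
    Ok(())
}
```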

It could be that the best compromise here is that cargo has a retry strategy that is ridiculously slow. For example, it gets a 429 and prints out a message saying "you're being rate limited; please talk to the registry about acceptable use in the future, but for now we are going to retry your request after a one-minute delay."

This would meet our needs very nicely. Publication of our 55-package workspace takes a fair while in any case.

@ehuss
Contributor

ehuss commented Dec 10, 2024

The cargo team discussed this today, but didn't have any specific conclusions. Some notes:

  • It would be good to have more discussion with the crates.io team about what to do here. Rustin volunteered to bring this up.
  • There were concerns over how a delay would work, or what would happen if the user tried to cancel it. The delays can be substantial (1 minute, 10 minutes, 24 hours in various cases). Repeatedly hitting the rate limit with a lot of crates could take tens of minutes or hours to finish. There was some desire that the user should contact crates.io to raise their limit instead.
  • @epage mentioned his release tool cargo-release will try to pre-emptively detect the limit, and require an opt-in to circumvent it.
  • We discussed the possibility of crates.io offering an API to query the user's current limit.
    • @ehuss feels this could be useful, though it could be complex since there isn't a single or simple rate limit, and the structure of the rate limits could change over time. Care would need to be exercised to have something that can retain backwards compatibility, while still giving an accurate response to "if I publish X crates, will I be blocked?".
  • Josh mentioned the possibility of not waiting, but instead telling the user what command to run to resume the rest of the uploads when using cargo publish --workspace (I think? not sure if I captured that correctly).
  • Some of us agree that "cargo publish multiple packages at once" (#1169, atomic publish) would be useful, though @epage had concerns about that in general. (There are also complications with using Heroku, among other things.) There are also some variations, like supporting staged publishing (see "New publish API", crates-io-cargo-teams#82), that could support this use case.

@Rustin170506
Member

We discussed it in today’s crates.io weekly meeting. First, I raised a few questions:

  1. Why do we have restrictions on publishing new crates?
    Mainly to prevent a large number of spam crates from being created in a short period of time.
  2. Why do we have restrictions on publishing new versions of crates?
    Mainly to prevent users from continuously publishing new versions in a short time. The publish API is very important and costly for crates.io. Additionally, when it only had a git index, the operation cost was high. In the past, there were users who tried to exploit this by publishing new versions continuously to occupy the “Just Updated” position on crates.io's webpage.
  3. Is it possible for crates.io to provide an API that allows users or Cargo to check how much bucket quota they have left?
    crates.io might be hesitant to introduce such an API, as there are already two different levels of rate-limit settings for publishing. Using a fixed API to retrieve this information might limit the evolution of crates.io's rate-limiting mechanisms, for example, changing from a per-user limit to a per-crate limit. This would likely require modifications to the new API.
  4. Does crates.io care about whether Cargo does automatic retries or backoff?
    crates.io itself doesn't particularly care whether the retries are performed by real users or Cargo. However, the current refill interval for the limit budget might be quite long, and having Cargo wait for such a long time might not be a good UI/UX design. But crates.io already has a retry header, so in theory, automatic retries by Cargo could be implemented (see the sketch after this list).
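
A sketch of what honouring that header could look like on the cargo side, assuming the 429 response carries `Retry-After` in delta-seconds form; the function names are illustrative, and the HTTP-date form of the header is ignored here:

```rust
use std::thread::sleep;
use std::time::Duration;

/// Illustrative only: compute a wait from a `Retry-After: <seconds>` header,
/// falling back to one minute if the header is missing or not delta-seconds.
fn delay_from_retry_after(retry_after: Option<&str>) -> Duration {
    retry_after
        .and_then(|v| v.trim().parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or(Duration::from_secs(60))
}

/// On a 429, tell the user how long the registry asked us to wait, then sleep.
fn handle_rate_limited(retry_after: Option<&str>) {
    let delay = delay_from_retry_after(retry_after);
    eprintln!(
        "warning: the registry rate-limited this request; retrying in {}s",
        delay.as_secs()
    );
    sleep(delay);
}
```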

During these discussions, crates.io also proposed two potential solutions:

  1. For new version publishing, the current restriction is per user, which is why we encounter issues with workspace publishing failures. crates.io might consider changing the restriction from per-user to per-crate for new versions. This should solve most of the problems.
  2. crates.io could consider not only returning the retry interval in the response but also include the remaining budget from the last successful publish request. This would help clients decide whether the next crate can still be published. However, since this is a per-user limit, this value might not always be valid. For example, if a user is publishing different workspaces at the same time, it might not be fully accurate.

@Turbo87
Member

Turbo87 commented Dec 14, 2024

We should talk to the crates.io team

probably easiest to ping @rust-lang/crates-io on such issues. I hadn't seen this issue at all before Rustin mentioned it in our team call 😅
