Skip to content

PEP 694: Abstract file upload mechanisms #4431

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

ewdurbin
Copy link
Member

@ewdurbin ewdurbin commented May 21, 2025

This attempts to defer the implementation details of getting the bits of a given artifact from the client to the server.

The primary motivation here is to decouple this PEP from resumable/multi-part uploads, provide flexibility to implementers of PEP 694, and allow for new upload mechanisms without a PEP cycle.


📚 Documentation preview 📚: https://pep-previews--4431.org.readthedocs.build/

@ewdurbin
Copy link
Member Author

I spent some time to sketch this out as a Finite State Machine in pypi/warehouse#18174, and will be using what I learned to refine this a bit!

@ewdurbin ewdurbin marked this pull request as ready for review May 28, 2025 14:13
@ewdurbin ewdurbin requested review from dstufft and warsaw as code owners May 28, 2025 14:13
@ewdurbin
Copy link
Member Author

@dstufft @warsaw I think this is ready for a proper review.

@warsaw
Copy link
Member

warsaw commented May 28, 2025

I'm planning on taking a closer look later today.

Copy link
Member

@dstufft dstufft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple comments, but overall the changes look good to me.

I just wanted to note a few things I came across when reading the entire PEP (with these changes incorporated-- and probably a number of these came from my original PEP so that's on me :D ).

  • The PEP specifies that the endpoint for PyPI will be https://upload.pypi.org/2.0. I would probably remove that from the PEP and let PyPI decide what it's endpoint will be (in particular the /2.0 part is somewhat confusing given the PEP also uses conneg).
  • The PEP calls out the inability to parallelize or resume an upload as a problem to be solved, and then later states the PEP solves all the identified problems. The TUS based approach didn't really solve parallelization to begin with, and the removal of TUS means the PEP also doesn't solve resuming an uploading. I think that's fine, but we should update the wording.
  • The content type handling is kind of wonky I think. Much like PEP 691 the client can use the Accept header to request a particular content type from the server, and the server includes the full version number in the meta.api-version key in the response. However, the requests appear to only be using the meta.api-version key. I think ideally we want to have requests using a correct Content-Type for their request (and the latest wouldn't be supported here), and have the server use that for handling the request data. We probably want to also explicitly require that the meta.api-version matches the Content-Type for major version.

@ewdurbin
Copy link
Member Author

ewdurbin commented May 30, 2025

  • The PEP calls out the inability to parallelize or resume an upload as a problem to be solved, and then later states the PEP solves all the identified problems. The TUS based approach didn't really solve parallelization to begin with, and the removal of TUS means the PEP also doesn't solve resuming an uploading. I think that's fine, but we should update the wording.

While this PEP would no longer directly implement resumable or parallel uploads, it does solve the problem of how to address them. Individual file-upload sessions may occur in parallel if a server chooses to implement a mechanism that can support it, and similarly resumable uploads can be implemented as a mechanism. I'll clarify it in the "The new upload API...", 05d7fc2

@ewdurbin
Copy link
Member Author

ewdurbin commented May 30, 2025

Realization while specifying http-post-application-octet-stream: we need to support PEP 740 style attestations... tagging @woodruffw for thoughts :) See: 2ef077c

@ewdurbin
Copy link
Member Author

  • The content type handling is kind of wonky I think. Much like PEP 691 the client can use the Accept header to request a particular content type from the server, and the server includes the full version number in the meta.api-version key in the response. However, the requests appear to only be using the meta.api-version key. I think ideally we want to have requests using a correct Content-Type for their request (and the latest wouldn't be supported here), and have the server use that for handling the request data. We probably want to also explicitly require that the meta.api-version matches the Content-Type for major version.

See: 924a27d

@ewdurbin
Copy link
Member Author

  • The PEP specifies that the endpoint for PyPI will be https://upload.pypi.org/2.0. I would probably remove that from the PEP and let PyPI decide what its endpoint will be (in particular the /2.0 part is somewhat confusing given the PEP also uses conneg).

See: f8469cf

Co-authored-by: Donald Stufft <[email protected]>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm bouncing between reading the PR preview and the diff. I'll capture some thoughts here based on text not touched in the PR (which I don't think GH gives me the UI to add inline comments to).

  • Instead of "a standard API" I think we're now talking about "an extensible API" with some standard behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've pretty much run out of gas. I think some of my comments may not make a ton of sense since I reviewed it sequentially rather than reading the whole thing and then composing my feedback. Apologies for that.

Overall, I think this is a really good simplification, and I really like the direction it's going in. I know I have a lot of musings, feedback, thoughts, comments, and suggestions sprinkled throughout, and I hope they're moderately helpful.

If it would be helpful, I can try to edit the PR locally and push changes, or I could branch your PR and push a new branch/PR, or we can just try to make it all work here. Happy to also chat about it separately!

@warsaw
Copy link
Member

warsaw commented Jun 10, 2025

While this PEP would no longer directly implement resumable or parallel uploads, it does solve the problem of how to address them. Individual file-upload sessions may occur in parallel if a server chooses to implement a mechanism that can support it, and similarly resumable uploads can be implemented as a mechanism. I'll clarify it in the "The new upload API...",

Alternative to inline suggestions:

The new upload API proposed in this PEP provides ways to solve all of these problems, either
directly or through an extensible approach, allowing servers to implement features such as resumable
and parallel uploads.  This PEP proposes better error reporting, a more robust release testing
experience, and atomic and simultaneous publishing of all release artifacts.

@konstin
Copy link
Contributor

konstin commented Jun 11, 2025

Could you add some sentences about forward compatibility? I'm thinking e.g. about declaring all value enums and dicts as non-exhaustive, so we can add new info both on the server and client side without it being a breaking change. This allows let's say adding variant information (from the WheelNext project) to the POST upload field, or having additionally statuses in the future. (This is something I'd expect implementers to do anyway, but spelling it out helps).

New required mechanisms MUST NOT be added [...] without an update to the major version.

Is it intentional that this means that we can't have an Upload API 2.1 that requires servers that say they're version 2.1 to implement let's say an S3 endpoint? This would be compatible with 2.0 clients since they can still use regular application/octet-stream POST.

Upload authentication is also not standardized.

As a client implementer, it would be very valuable to have a standard way to pass a token. This wouldn't be about how to obtain the token or its lifetime and semantic, and it would also leave room for non-token auth, but it would solve the current user confusion about different servers using different usernames for their API tokens (we regularly get questions about that in uv).

For the session status, do I read it correctly that if the server says an upload is pending, the client has to poll until it is completed, there is no mechanism intended for subscribing to server updates?

nit: Instead of valid-for, what about a valid-until with a timestamp?

@ewdurbin ewdurbin requested a review from warsaw June 11, 2025 10:56
@ewdurbin
Copy link
Member Author

Could you add some sentences about forward compatibility? I'm thinking e.g. about declaring all value enums and dicts as non-exhaustive, so we can add new info both on the server and client side without it being a breaking change. This allows let's say adding variant information (from the WheelNext project) to the POST upload field, or having additionally statuses in the future. (This is something I'd expect implementers to do anyway, but spelling it out helps).

I'm uncertain about this. I believe that the specific example provided is addressed by the fact that variants would already be encoded into the filename, and I assume (though I'm not fully up to date) in the METADATA file of the wheel.

Additional statuses that implementors might need would generally be part of their File Upload Mechanism, which can already document and include its own methods for retrieving detailed status in its responses.

New required mechanisms MUST NOT be added [...] without an update to the major version.

Is it intentional that this means that we can't have an Upload API 2.1 that requires servers that say they're version 2.1 to implement let's say an S3 endpoint? This would be compatible with 2.0 clients since they can still use regular application/octet-stream POST.

I consider this intentional. Requiring the addition of a mechanism is a large enough change that implementors will have real work to do. I don't forsee that we will ever require mechanisms that can't be "baked in" to an arbitrary server. Requiring S3-compatible protocols of arbitrary servers... no. While requiring implementation of a standard like tus is likely for a future major version.

Upload authentication is also not standardized.

As a client implementer, it would be very valuable to have a standard way to pass a token. This wouldn't be about how to obtain the token or its lifetime and semantic, and it would also leave room for non-token auth, but it would solve the current user confusion about different servers using different usernames for their API tokens (we regularly get questions about that in uv).

Understandable, but ultimately I think that enforcing authentication methods through PEP would likely cause trouble . With PEP authoring hat on, I think it would hamstring this PEPs progress. With my PyPI maintainer hat on, I think it would unnecessarily impede future progress on evolving PyPI's authentication mechanisms, Macaroons are already showing some unforeseen consequences and we desperately need to address authentication for more general APIs.

For the session status, do I read it correctly that if the server says an upload is pending, the client has to poll until it is completed, there is no mechanism intended for subscribing to server updates?

nit: Instead of valid-for, what about a valid-until with a timestamp?

I think that is reasonable, though I don't specifically prefer one option to the other.

@ewdurbin
Copy link
Member Author

For the session status, do I read it correctly that if the server says an upload is pending, the client has to poll until it is completed, there is no mechanism intended for subscribing to server updates?

Ope, missed this. Correct there is no method for subscription in this model.

@konstin
Copy link
Contributor

konstin commented Jun 11, 2025

Understandable, but ultimately I think that enforcing authentication methods through PEP would likely cause trouble . With PEP authoring hat on, I think it would hamstring this PEPs progress. With my PyPI maintainer hat on, I think it would unnecessarily impede future progress on evolving PyPI's authentication mechanisms, Macaroons are already showing some unforeseen consequences and we desperately need to address authentication for more general APIs.

To me, a lack of auth is a major hurdle to writing a client with Upload API 2.0: Without a way to pass any sorts of credentials, it's impossible to write a client that works universally the way twine and uv publish can currently talk to any registry; You can't write a CLI interface, as we don't know that the CLI arguments would be.

@ewdurbin
Copy link
Member Author

To me, a lack of auth is a major hurdle to writing a client with Upload API 2.0: Without a way to pass any sorts of credentials, it's impossible to write a client that works universally the way twine and uv publish can currently talk to any registry; You can't write a CLI interface, as we don't know that the CLI arguments would be.

Again, this is very understandable, but it is likely to mean that this PEP would bake in HTTP Basic Auth with __token__ username, and a generic "token" in as the password. Maybe this is fine, but I certainly don't prefer it as a constraint on PyPI's implementation long term.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants