-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
PEP 694: Abstract file upload mechanisms #4431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I spent some time to sketch this out as a Finite State Machine in pypi/warehouse#18174, and will be using what I learned to refine this a bit! |
I'm planning on taking a closer look later today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple comments, but overall the changes look good to me.
I just wanted to note a few things I came across when reading the entire PEP (with these changes incorporated-- and probably a number of these came from my original PEP so that's on me :D ).
- The PEP specifies that the endpoint for PyPI will be
https://upload.pypi.org/2.0
. I would probably remove that from the PEP and let PyPI decide what it's endpoint will be (in particular the/2.0
part is somewhat confusing given the PEP also uses conneg). - The PEP calls out the inability to parallelize or resume an upload as a problem to be solved, and then later states the PEP solves all the identified problems. The TUS based approach didn't really solve parallelization to begin with, and the removal of TUS means the PEP also doesn't solve resuming an uploading. I think that's fine, but we should update the wording.
- The content type handling is kind of wonky I think. Much like PEP 691 the client can use the
Accept
header to request a particular content type from the server, and the server includes the full version number in themeta.api-version
key in the response. However, the requests appear to only be using themeta.api-version
key. I think ideally we want to have requests using a correctContent-Type
for their request (and thelatest
wouldn't be supported here), and have the server use that for handling the request data. We probably want to also explicitly require that themeta.api-version
matches theContent-Type
for major version.
While this PEP would no longer directly implement resumable or parallel uploads, it does solve the problem of how to address them. Individual file-upload sessions may occur in parallel if a server chooses to implement a mechanism that can support it, and similarly resumable uploads can be implemented as a mechanism. I'll clarify it in the "The new upload API...", 05d7fc2 |
Realization while specifying |
See: 924a27d |
See: f8469cf |
Co-authored-by: Donald Stufft <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm bouncing between reading the PR preview and the diff. I'll capture some thoughts here based on text not touched in the PR (which I don't think GH gives me the UI to add inline comments to).
- Instead of "a standard API" I think we're now talking about "an extensible API" with some standard behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've pretty much run out of gas. I think some of my comments may not make a ton of sense since I reviewed it sequentially rather than reading the whole thing and then composing my feedback. Apologies for that.
Overall, I think this is a really good simplification, and I really like the direction it's going in. I know I have a lot of musings, feedback, thoughts, comments, and suggestions sprinkled throughout, and I hope they're moderately helpful.
If it would be helpful, I can try to edit the PR locally and push changes, or I could branch your PR and push a new branch/PR, or we can just try to make it all work here. Happy to also chat about it separately!
Alternative to inline suggestions:
|
Could you add some sentences about forward compatibility? I'm thinking e.g. about declaring all value enums and dicts as non-exhaustive, so we can add new info both on the server and client side without it being a breaking change. This allows let's say adding variant information (from the WheelNext project) to the
Is it intentional that this means that we can't have an Upload API 2.1 that requires servers that say they're version 2.1 to implement let's say an S3 endpoint? This would be compatible with 2.0 clients since they can still use regular application/octet-stream POST.
As a client implementer, it would be very valuable to have a standard way to pass a token. This wouldn't be about how to obtain the token or its lifetime and semantic, and it would also leave room for non-token auth, but it would solve the current user confusion about different servers using different usernames for their API tokens (we regularly get questions about that in uv). For the session status, do I read it correctly that if the server says an upload is pending, the client has to poll until it is completed, there is no mechanism intended for subscribing to server updates? nit: Instead of |
I'm uncertain about this. I believe that the specific example provided is addressed by the fact that variants would already be encoded into the filename, and I assume (though I'm not fully up to date) in the METADATA file of the wheel. Additional statuses that implementors might need would generally be part of their File Upload Mechanism, which can already document and include its own methods for retrieving detailed status in its responses.
I consider this intentional. Requiring the addition of a mechanism is a large enough change that implementors will have real work to do. I don't forsee that we will ever require mechanisms that can't be "baked in" to an arbitrary server. Requiring S3-compatible protocols of arbitrary servers... no. While requiring implementation of a standard like tus is likely for a future major version.
Understandable, but ultimately I think that enforcing authentication methods through PEP would likely cause trouble . With PEP authoring hat on, I think it would hamstring this PEPs progress. With my PyPI maintainer hat on, I think it would unnecessarily impede future progress on evolving PyPI's authentication mechanisms, Macaroons are already showing some unforeseen consequences and we desperately need to address authentication for more general APIs.
I think that is reasonable, though I don't specifically prefer one option to the other. |
Ope, missed this. Correct there is no method for subscription in this model. |
To me, a lack of auth is a major hurdle to writing a client with Upload API 2.0: Without a way to pass any sorts of credentials, it's impossible to write a client that works universally the way |
Again, this is very understandable, but it is likely to mean that this PEP would bake in HTTP Basic Auth with |
This attempts to defer the implementation details of getting the bits of a given artifact from the client to the server.
The primary motivation here is to decouple this PEP from resumable/multi-part uploads, provide flexibility to implementers of PEP 694, and allow for new upload mechanisms without a PEP cycle.
📚 Documentation preview 📚: https://pep-previews--4431.org.readthedocs.build/