@martindurant we recently addressed reading from s3, so you can now use `_cp_file` to copy files from s3 to the local fs (or to another fs that supports writes), but we can't write to s3 using this function.
I believe the issue is that `S3AsyncStreamedFile` doesn't implement a `write` function, so it falls back to the parent class's `write`, which then fails because `self.closed` is not handled properly.
Do you know why `write` isn't implemented? Could we just use `_pipe_file`, something like this:
```python
async def write(self, data):
    if self.mode not in {"wb"}:
        raise ValueError("File not in write mode")
    # send the whole payload in one call, then track size/position locally
    await self.fs._pipe_file(self.path, data)
    if self.size is None:
        self.size = len(data)
    else:
        self.size += len(data)
    self.loc += len(data)
    return len(data)
```
Maybe check that `put_object` returns a 200, add and manage a chunksize setting, buffer the data and do a single `_pipe_file`, or turn everything into a multipart upload? What do you think? I can put a PR together if it would help. I'm interested in getting the `fsspec.generic` `cp` and `rsync` functionality fully supported for s3.
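For reference, here is a rough sketch of the buffered/multipart idea. This is not existing s3fs code: `client` stands for an aiobotocore S3 client, and the class and parameter names are made up for illustration.

```python
class BufferedS3Writer:
    """Sketch: buffer writes, flush full parts via S3 multipart upload."""

    def __init__(self, client, bucket, key, chunksize=5 * 2**20):
        self.client, self.bucket, self.key = client, bucket, key
        self.chunksize = chunksize   # S3 parts must be >= 5 MiB, except the last
        self.buffer = bytearray()
        self.parts = []              # PartNumber/ETag pairs for the final commit
        self.upload_id = None

    async def write(self, data: bytes) -> int:
        self.buffer.extend(data)
        while len(self.buffer) >= self.chunksize:
            await self._flush_part(self.buffer[: self.chunksize])
            del self.buffer[: self.chunksize]
        return len(data)

    async def _flush_part(self, body):
        if self.upload_id is None:
            resp = await self.client.create_multipart_upload(
                Bucket=self.bucket, Key=self.key)
            self.upload_id = resp["UploadId"]
        num = len(self.parts) + 1
        resp = await self.client.upload_part(
            Bucket=self.bucket, Key=self.key, PartNumber=num,
            UploadId=self.upload_id, Body=bytes(body))
        self.parts.append({"PartNumber": num, "ETag": resp["ETag"]})

    async def close(self):
        if self.upload_id is None:
            # small file: a single put_object (what _pipe_file does) is enough
            await self.client.put_object(
                Bucket=self.bucket, Key=self.key, Body=bytes(self.buffer))
            return
        if self.buffer:
            await self._flush_part(self.buffer)
        await self.client.complete_multipart_upload(
            Bucket=self.bucket, Key=self.key, UploadId=self.upload_id,
            MultipartUpload={"Parts": self.parts})
```

The 5 MiB floor matters here because S3 rejects multipart parts smaller than that, except for the final one.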
When starting this, I had hoped that actual streaming uploads were possible. However, aiohttp and derivatives use a pull model on upload rather than push: the caller must supply a file-like which supports async reading (as opposed to an async write method we can call). It may be possible to work with this model, but it would be painful.
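To make the pull model concrete, here is a minimal sketch (all names hypothetical) of the kind of adapter that would be needed: writes are pushed into a bounded queue, and an async generator, which aiohttp accepts as a request body, pulls them back out.

```python
import asyncio

class QueuedUploadBody:
    """Push side: call write()/finish(); pull side: iterate chunks()."""

    def __init__(self):
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=4)

    async def write(self, data: bytes):
        await self._queue.put(data)   # bounded queue gives backpressure

    async def finish(self):
        await self._queue.put(None)   # sentinel: no more data

    async def chunks(self):
        while True:
            data = await self._queue.get()
            if data is None:
                return
            yield data

# Hypothetical usage: pass body.chunks() as the `data` argument of an
# aiohttp PUT, while a separate task awaits body.write(...) / body.finish().
```

The pain is in the plumbing: the writer and the HTTP request have to run as separate tasks, and a failure on either side has to cancel the other.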
The other option is essentially to replicate the logic of the blocking code and expose async versions of the calls there (`initiate_upload`, `fetch_range` and `upload_chunk`), which would limit the sizes of the individual calls as allowed by S3's rules.
`pipe_file` would not be useful on its own, since it is a call to send a whole file in one go. It may be async, but there is no way to pause mid-way to read more data, nor a way to repeatedly call it for the same file (it would overwrite each time).
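To illustrate the overwrite behaviour (bucket name is a placeholder; assumes credentials with write access and s3fs's async session setup):

```python
import asyncio
import s3fs

async def main():
    fs = s3fs.S3FileSystem(asynchronous=True)
    session = await fs.set_session()
    await fs._pipe_file("my-bucket/streamed.bin", b"first")
    await fs._pipe_file("my-bucket/streamed.bin", b"second")  # replaces "first"
    assert await fs._cat_file("my-bucket/streamed.bin") == b"second"
    await session.close()

asyncio.run(main())
```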