

support for fsspec.generic writes #748

Open
mukhery opened this issue Jun 11, 2023 · 1 comment

mukhery commented Jun 11, 2023

@martindurant we recently addressed reading from s3, so you can now use _cp_file to copy files from s3 to the local fs (or to another fs that supports writes), but we can't write to s3 using this function.

I believe the issue is that S3AsyncStreamedFile doesn't implement the write function, so it falls back to the parent class's write, which then fails because self.closed is not handled properly.

Do you know why write isn't implemented? Could we just use _pipe_file, something like this:

    async def write(self, data):
        if self.mode not in {"wb"}:
            raise ValueError("File not in write mode")
        await self.fs._pipe_file(self.path, data)
        if self.size is None:
            self.size = len(data)
        else:
            self.size += len(data)
        self.loc += len(data)
        return len(data)

Maybe check that put_object returns a 200, add and manage a chunksize setting, and either buffer the data and do a single _pipe_file or turn everything into a multipart upload? What do you think? I can put a PR together if it would help. I'm interested in getting the fsspec.generic cp and rsync functionality fully supported for s3.
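
For the buffering route, here is a rough sketch of the kind of thing I mean, covering only the "buffer and do a single _pipe_file on close" variant (the class and attribute names are placeholders, not existing s3fs code):

    import io

    class BufferedStreamedWriter:
        """Sketch only: accumulate writes in memory and send everything
        with a single _pipe_file call on close."""

        def __init__(self, fs, path):
            self.fs = fs  # an async S3FileSystem instance
            self.path = path
            self.buffer = io.BytesIO()
            self.closed = False

        async def write(self, data):
            if self.closed:
                raise ValueError("I/O operation on closed file")
            self.buffer.write(data)
            return len(data)

        async def close(self):
            if not self.closed:
                # single-shot upload; a multipart version would instead flush
                # a part whenever the buffer passes the chunksize threshold
                await self.fs._pipe_file(self.path, self.buffer.getvalue())
                self.closed = True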

@martindurant (Member) commented

When starting this, I had hoped that actual streaming uploads were possible. However, aiohttp and its derivatives use a pull model on upload rather than push: the caller must supply a file-like object which supports async reading (as opposed to exposing an async write method we can call). It may be possible to work with this model, but it would be painful.
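
To illustrate the pull model: something like the adapter below would be needed to bridge push-style writes to a file-like object that the HTTP layer can read from. This is a sketch only, with made-up names; none of it exists in s3fs today.

    import asyncio

    class QueueReader:
        """Bridge push-style writes to a pull-style async read():
        the producer awaits write()/close(), the uploader awaits read()."""

        def __init__(self):
            self.queue = asyncio.Queue()
            self._pending = b""
            self._eof = False

        async def write(self, data):
            # push side: called by whoever produces the data
            await self.queue.put(data)
            return len(data)

        async def close(self):
            # signal end-of-stream to the reading side
            await self.queue.put(None)

        async def read(self, n=-1):
            # pull side: an aiohttp-style consumer calls this repeatedly
            while not self._eof and (n < 0 or len(self._pending) < n):
                chunk = await self.queue.get()
                if chunk is None:
                    self._eof = True
                else:
                    self._pending += chunk
            if n < 0:
                out, self._pending = self._pending, b""
            else:
                out, self._pending = self._pending[:n], self._pending[n:]
            return out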

The other option is essentially to replicate the logic in the blocking code and expose the async calls there (in initiate_upload, fetch_range and upload_chunk), which would limit the sizes of the individual calls as allowed by S3 rules.
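
A rough sketch of that second option, using the standard S3 multipart-upload calls directly through an aiobotocore client; the class layout is illustrative only, not how the existing blocking file is structured:

    class MultipartWriter:
        """Sketch: buffer incoming data and flush it as multipart parts."""

        def __init__(self, s3, bucket, key, chunksize=5 * 2**20):
            self.s3, self.bucket, self.key = s3, bucket, key
            self.chunksize = chunksize  # S3 parts must be >= 5 MiB (except the last)
            self.buffer = b""
            self.parts = []
            self.upload_id = None

        async def write(self, data):
            self.buffer += data
            while len(self.buffer) >= self.chunksize:
                await self._upload_part(self.buffer[: self.chunksize])
                self.buffer = self.buffer[self.chunksize :]
            return len(data)

        async def _upload_part(self, chunk):
            if self.upload_id is None:
                resp = await self.s3.create_multipart_upload(
                    Bucket=self.bucket, Key=self.key)
                self.upload_id = resp["UploadId"]
            part_number = len(self.parts) + 1
            resp = await self.s3.upload_part(
                Bucket=self.bucket, Key=self.key, UploadId=self.upload_id,
                PartNumber=part_number, Body=chunk)
            self.parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})

        async def close(self):
            if self.upload_id is None:
                # small file: nothing ever hit the chunk threshold, so a
                # single put_object is enough
                await self.s3.put_object(
                    Bucket=self.bucket, Key=self.key, Body=self.buffer)
            else:
                if self.buffer:
                    await self._upload_part(self.buffer)  # final short part
                await self.s3.complete_multipart_upload(
                    Bucket=self.bucket, Key=self.key, UploadId=self.upload_id,
                    MultipartUpload={"Parts": self.parts})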

pipe_file would not be useful on its own, since it is a call to send a whole file in one go. It may be async, but there is no way to pause mid-way to read more data, nor a way to repeatedly call it for the same file (each call would overwrite the previous content).
