Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in S3FileSystem._get_file results in corrupted downloads #743

Open
merrysailor opened this issue May 30, 2023 · 1 comment
Open

Bug in S3FileSystem._get_file results in corrupted downloads #743

merrysailor opened this issue May 30, 2023 · 1 comment

Comments

@merrysailor
Copy link

Hi,

There are two bugs in S3FileSystem._get_file. Attaching a patch diff. I replicated the issue and tested the fix by locally modifying the method to randomly raise a retry-able exception.

diff --git a/s3fs/core.py b/s3fs/core.py
index 889fdb0..64c6501 100644
--- a/s3fs/core.py
+++ b/s3fs/core.py
@@ -1165,7 +1165,7 @@ class S3FileSystem(AsyncFileSystem):
                 Bucket=bucket,
                 Key=key,
                 **version_id_kw(version_id or vers),
-                **self.req_kw,
+                **kw,
             )
             return resp["Body"], resp.get("ContentLength", None)

@@ -1196,10 +1196,10 @@ class S3FileSystem(AsyncFileSystem):
                         # in a failure.
                         # Examples:
                         # Read 1 byte -> failure, retry with read_range=0, byte range should be 0-
-                        # Read 1 byte, success. Read 1 byte: failure. Retry with read_range=2, byte-range should be 2-
-                        # Read 1 bytes, success. Read 1 bytes: success. Read 1 byte, failure. Retry with read_range=3,
+                        # Read 1 byte, success. Read 1 byte: failure. Retry with read_range=1, byte-range should be 1-
+                        # Read 1 bytes, success. Read 1 bytes: success. Read 1 byte, failure. Retry with read_range=2,
                         # byte-range should be 3-.
-                        body, _ = await _open_file(bytes_read + 1)
+                        body, _ = await _open_file(bytes_read)
                         continue

                     if not chunk:
@@ -1213,6 +1213,8 @@ class S3FileSystem(AsyncFileSystem):
             except Exception:
                 pass

+        assert bytes_read == content_length
+
     async def _info(self, path, bucket=None, key=None, refresh=False, version_id=None):
         path = self._strip_protocol(path)
         bucket, key, path_version_id = self.split_path(path)
@martindurant
Copy link
Member

martindurant commented Jun 1, 2023

Any chance you can make this into a PR, including the raising of the retriable errors via a mock?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants