-
-
Notifications
You must be signed in to change notification settings - Fork 331
Support async FSMap objects in zarr.open #2774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I recall there being issues with fsmap before, but I confess I don't really know what an fsmap is -- can someone explain what an fsmap is, how it differs from an fsspec filesystem, and why people would use one over the other? I have a vague feeling that it could be useful to have a |
FSMap was created specifically for the needs of Zarr, and it could have been essentially the same as the v2 FSStore, but was much quicker to get out and working within dask/fsspec. FSMap is a dict-compatible interface (mutable-mapping) on top of a FS instance, which zarr worked with since forever and ignores some FS functionality like the file-like API. To make it work with v3 might be complex, because zarr makes async calls, and the FSMap interface is blocking, even if the underlying FS is async. That means that there will be sync() within sync(), which might still work as zarr maintains its own event loop in a thread separate from fsspec's. |
To be clear: this PR does not use the mapper, but constructs a normal store from the mapper's attributes. I support this path. |
thanks for the questions and answers, Davis and Martin! IIUC there are three cases that we'd need to account for:
|
The instance has all the arguments it was made with as attributes, so you can make a new instance with asynchronous=True from those.
Is there a reason to bother doing this? |
I was thinking based on https://filesystem-spec.readthedocs.io/en/latest/async.html#limitations that LocalStore may be faster since it was designed around providing an async interface. |
IMO it's simpler if the path is always fsmap -> fsspecstore |
I doubt it. The disk is not really async (at least with the standard syscalls python uses), so none of the async code should make any difference for local reads at all. |
I ran into #2808 when trying to wrap sync filesystems. My preference for this PR would be to bring the codecov up to 100% and request that this be considered for review, with the plan that the conversion of sync instances to async and the wrapping of sync filesystems could be handled separately. My reasoning is that some people could use this current implementation and I'm not sure how quickly I'll be able to find time to add the other missing features. Update: I added conversion of sync instances to async |
I think this is ready for a final review, as I addressed all the comments and increased the test coverage. Would you have time for a review before the next release @d-v-b @martindurant? Thanks for your support with the PR |
Co-authored-by: Davis Bennett <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks max, this looks great!
@d-v-b what was your fix the last time you ran into the upstream fsspec <-> s3fs incompatibility issue? |
note that the problem went away without me doing anything, presumably due to some activity over in fsspec? |
Interesting, I don't even see the commit hash from the attempted install in the fsspec main branch's log. I also don't have permissions to rerun the failing job, so I'll just leave this PR as complete despite the failing test. |
@martindurant thanks for your earlier review of this PR. I just wanted to check if you want to take another look before it get merged? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some minor questions about imports and re-assignment of object attributes, but otherwise this looks good
import fsspec | ||
from packaging.version import parse as parse_version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we import these here?
return fs | ||
if fs.async_impl: | ||
# Convert sync instance of an async fs to an async instance | ||
import json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we declare this import at the top of the file?
FsspecStore | ||
""" | ||
if not fs_map.fs.async_impl or not fs_map.fs.asynchronous: | ||
fs_map.fs = _make_async(fs_map.fs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we avoid mutating the fs_map
object here? It looks like we only need the fs
and root
attributes, so lets pull those off into their own variables.
Addresses #2706
TODO:
docs/user-guide/*.rst
changes/