Skip to content

Support async FSMap objects in zarr.open #2774

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 46 commits into
base: main
Choose a base branch
from

Conversation

maxrjones
Copy link
Member

@maxrjones maxrjones commented Jan 28, 2025

Addresses #2706

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

Sorry, something went wrong.

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Jan 28, 2025
@d-v-b
Copy link
Contributor

d-v-b commented Jan 28, 2025

I recall there being issues with fsmap before, but I confess I don't really know what an fsmap is -- can someone explain what an fsmap is, how it differs from an fsspec filesystem, and why people would use one over the other?

cc @martindurant

I have a vague feeling that it could be useful to have a Store class that wraps generic MutableMapping instances, and maybe fsmaps could go there, but that requires me knowing more about the user context for fsmap.

@martindurant
Copy link
Member

FSMap was created specifically for the needs of Zarr, and it could have been essentially the same as the v2 FSStore, but was much quicker to get out and working within dask/fsspec.

FSMap is a dict-compatible interface (mutable-mapping) on top of a FS instance, which zarr worked with since forever and ignores some FS functionality like the file-like API.

To make it work with v3 might be complex, because zarr makes async calls, and the FSMap interface is blocking, even if the underlying FS is async. That means that there will be sync() within sync(), which might still work as zarr maintains its own event loop in a thread separate from fsspec's.

@martindurant
Copy link
Member

To be clear: this PR does not use the mapper, but constructs a normal store from the mapper's attributes. I support this path.

@maxrjones
Copy link
Member Author

thanks for the questions and answers, Davis and Martin!

IIUC there are three cases that we'd need to account for:

  1. FSMap wraps an async instance of an async-compatible filesystem
    Solution - as implemented in this PR, extract the wrapped filesystem and use it to open an FsspecStore
  2. FSMap wraps a non-async instance of a non-async-compatible filesystem
    Solution option 1 - extract the wrapped filesystem and wrap in AsyncFileSystemWrapper to open an FsspecStore as in Wrap sync fs for xarray.to_zarr #2533
    Solution option 2 - if it's a "file" protocol, extract the wrapped filesystem and open a LocalStore rather than FsspecStore, for other protocols, wrap in AsyncFileSystemWrapper to open an FsspecStore
  3. FSMap wraps a non-async instance of an async-compatibile filesystem
    Solution - this is the case I'm not sure about. Is there a way to convert from a sync instance to an async instance without needing to wrap it in AsyncFileSystemWrapper? @martindurant could you please offer guidance here?

@martindurant
Copy link
Member

Is there a way to convert from a sync instance to an async instance without needing to wrap it in AsyncFileSystemWrapper

The instance has all the arguments it was made with as attributes, so you can make a new instance with asynchronous=True from those.

if it's a "file" protocol, extract the wrapped filesystem and open a LocalStore rather than FsspecStore

Is there a reason to bother doing this?

@maxrjones
Copy link
Member Author

maxrjones commented Jan 28, 2025

if it's a "file" protocol, extract the wrapped filesystem and open a LocalStore rather than FsspecStore

Is there a reason to bother doing this?

I was thinking based on https://filesystem-spec.readthedocs.io/en/latest/async.html#limitations that LocalStore may be faster since it was designed around providing an async interface.

@d-v-b
Copy link
Contributor

d-v-b commented Jan 28, 2025

IMO it's simpler if the path is always fsmap -> fsspecstore

@martindurant
Copy link
Member

LocalStore may be faster since it was designed around providing an async interface.

I doubt it. The disk is not really async (at least with the standard syscalls python uses), so none of the async code should make any difference for local reads at all.

@maxrjones
Copy link
Member Author

maxrjones commented Feb 7, 2025

I ran into #2808 when trying to wrap sync filesystems. My preference for this PR would be to bring the codecov up to 100% and request that this be considered for review, with the plan that the conversion of sync instances to async and the wrapping of sync filesystems could be handled separately. My reasoning is that some people could use this current implementation and I'm not sure how quickly I'll be able to find time to add the other missing features.

Update: I added conversion of sync instances to async

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@maxrjones maxrjones changed the title WIP: Support fsspec mutable mapping objects in zarr.open Support async FSMap objects in zarr.open Feb 13, 2025
@maxrjones maxrjones marked this pull request as ready for review February 13, 2025 01:44
maxrjones and others added 4 commits June 2, 2025 19:59
@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Jun 3, 2025
@maxrjones maxrjones mentioned this pull request Jun 3, 2025
@maxrjones
Copy link
Member Author

maxrjones commented Jun 3, 2025

I think this is ready for a final review, as I addressed all the comments and increased the test coverage. Would you have time for a review before the next release @d-v-b @martindurant? Thanks for your support with the PR

maxrjones and others added 3 commits June 3, 2025 11:40

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Co-authored-by: Davis Bennett <[email protected]>
Copy link
Contributor

@d-v-b d-v-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks max, this looks great!

@maxrjones
Copy link
Member Author

@d-v-b what was your fix the last time you ran into the upstream fsspec <-> s3fs incompatibility issue?

@d-v-b
Copy link
Contributor

d-v-b commented Jun 3, 2025

see #3091, in that pr @dstansby speculated that one of the git commits in fsspec was tagged, which prevented the version from being indicated as a dev version. Maybe we should re-open that PR...

@d-v-b
Copy link
Contributor

d-v-b commented Jun 3, 2025

note that the problem went away without me doing anything, presumably due to some activity over in fsspec?

@maxrjones
Copy link
Member Author

note that the problem went away without me doing anything, presumably due to some activity over in fsspec?

Interesting, I don't even see the commit hash from the attempted install in the fsspec main branch's log. I also don't have permissions to rerun the failing job, so I'll just leave this PR as complete despite the failing test.

@maxrjones
Copy link
Member Author

@martindurant thanks for your earlier review of this PR. I just wanted to check if you want to take another look before it get merged?

maxrjones and others added 3 commits June 5, 2025 14:58

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Copy link
Contributor

@d-v-b d-v-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor questions about imports and re-assignment of object attributes, but otherwise this looks good

Comment on lines +44 to +45
import fsspec
from packaging.version import parse as parse_version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we import these here?

return fs
if fs.async_impl:
# Convert sync instance of an async fs to an async instance
import json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we declare this import at the top of the file?

FsspecStore
"""
if not fs_map.fs.async_impl or not fs_map.fs.asynchronous:
fs_map.fs = _make_async(fs_map.fs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we avoid mutating the fs_map object here? It looks like we only need the fs and root attributes, so lets pull those off into their own variables.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants