-
Notifications
You must be signed in to change notification settings - Fork 277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add fsid implementation #698
Closed
+11
−0
Closed
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @jwerbin , glad to see that someone else has a use case for this 🙂 What do you use it for, btw?
Also, for s3, fsid should not be just "s3" (maybe you are looking for fs.protocol if this was enough?). It should instead take into account endpoint as well. The idea for fsid is to be able to identify the filesystem across different runs and s3fs is really determined by the endpoint used (at the very least). fsspec/filesystem_spec#773
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@efiop
Sorry to say that I don't have specific use case for this but I like it as an idea.
I got tripped up because I have a use case for the S3Filesystem where I wanted to wrap all the methods of an instance.
When I do the object introspection I ended up triggering S3FileSystem.fsid().
Which threw. So I figured I would come implement it.
Happy to add the endpoint.
I am not 100% sure on how to add it correctly.
Tell me if this seems reasonable.
if S3FileSystem._s3 object exists we should be able to get the endpoint as follows
endpoint = repr(self._s3._client._endpoint)
If none then either return "" or "Session not set"
Then return something like
"s3" + (f"::{endpoint}" if endpoint else ""
For your specified use in DVC is that enough to capture the specific location?
Or I should it just be forced to try to connect when trying to access the fsid?
Will this create some odd behavior as the object connects and disconnects over its' lifetime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jwerbin Not all methods should be implemented though. You will find multiple cases where filesystems don't implement some methods or properties, it is normal.
Looks like you don't need to implement
fsid
, but rather need to account for NotImplemented error in your code.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@efiop Ah this might be my confusion on conventions. I thought methods that raise NotImplemented errors in a base class (
AbstractFileSystem
) was a pythonic way to indicate that all derived classes should implement this method.I am happy to proceed if you think the fsid approach that I proposed for the s3fs seems reasonable to you and is helpful for the community.
"s3" + (f"::{endpoint}" if endpoint else ""
I have already built a work around into my code in the near term.
So I can drop it if you don't think it will add value to s3fs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jwerbin It should be a little bit more involved then just adding the endpoint (per spec discussed in the original ticket above). Since you don't really need it, let's just close this PR then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something like:
should be enough.
My point was that if you don't need it - probably no reason to implement it right now, because it means that it will be dead code for some time. If you are keen on contributing anyways, feel free to reopen a PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For s3 specifically, I would suggest that whether an instance is anonymous or not, and maybe the value of profile, if given, are also contenders for being included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martindurant But that goes against the idea of fssid of being able to identify the fs regardless of creds. If the endpoint is the same, it shouldn't matter if you are
alice
orbob
,fsid
should be the same.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is based on anecdotal usage: users sometimes do have different instances around for s3 with different profiles (say one for reading from a restricted bucket and another for writing elsewhere) even with the same end-point. Similarly, some find it useful to have an anon instance around for checking how what they have written works for non-authenticated users. This latter case is common for kerchunk users, where the data archive being scanned is public, but the references that point at it need to be written to s3 and made public (or requester-pays).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martindurant I see your point about the use case. It seems like it is a little different from fsid though. To summarize
Use cases for 2 actually need to be creds agnostic, because multiple people can be working on the same project that uses the same storage but each user has its own creds. I think we've touched on this during fsid discussions before (e.g. fsspec/filesystem_spec#773 (comment)). Maybe you are talking about something like
session_id
described in the last link?