Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix get_dataset_path() in fs_utils.py #663

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions petastorm/fs_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
import pyarrow
import six
from six.moves.urllib.parse import urlparse
import platform

from petastorm.gcsfs_helpers.gcsfs_wrapper import GCSFSWrapper
from petastorm.hdfs.namenode import HdfsNamenodeResolver, HdfsConnector
Expand All @@ -33,6 +34,9 @@
# s3/gs/gcs filesystem expects paths of the form `bucket/path`
return parsed_url.netloc + parsed_url.path

if parsed_url.scheme.lower() in ['file'] and "Windows" in platform.system():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if you could add a unit test verifying correctness of this function in petastorm/tests/test_fs_utils.py. You would need to mock the value returned by platform.system() and then I think the test logic should be straight forward.

If for some reason you can not do this, we'd be able to land this and I can help with the test later.

return parsed_url.path[1:]

Check warning on line 38 in petastorm/fs_utils.py

View check run for this annotation

Codecov / codecov/patch

petastorm/fs_utils.py#L38

Added line #L38 was not covered by tests
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a chance that this fix would break for file urls on network location (as per this article). Consider the example of the following WIndows url: file://hostname/path/to/the%20file.txt


return parsed_url.path


Expand Down