-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
open_groups
for zarr backends
#9469
Conversation
cc @TomNicholas, @flamingbear, @owenlittlejohns, @keewis and @shoyer ! |
roundtrip_dict = open_groups(filepath, engine="zarr") | ||
roundtrip_dt = DataTree.from_dict(roundtrip_dict) | ||
|
||
assert open_datatree(filepath, engine="zarr").identical(roundtrip_dt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.identical
is going to be different to check if the coordinates are defined on disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically see #9473.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay so from looking at this and testing this locally I don't think this will affect these tests.
xarray/backends/zarr.py
Outdated
group_name = str(NodePath(path_group)) | ||
groups_dict[group_name] = group_ds | ||
|
||
return groups_dict | ||
|
||
|
||
def _iter_zarr_groups(root, parent="/"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try to add type hints (though I don't know what the type of root
is)
def _iter_zarr_groups(root, parent="/"): | |
def _iter_zarr_groups(root: ZarrGroup?, parent: str = "/") -> Iterable[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what the appropriate TypeHint would be here that wouldn't break mypy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a play with this. I traced root
back and found it is indeed a zarr.Group
, so I think the type hint above works (along with adding from zarr import Group as ZarrGroup
in the if TYPE_CHECKING
block at the top of the file).
The bit that was causing an issue the assignment of a NodePath
object to parent
, which is a str
from the function signature. (Apologies if I'm just playing catch up here from what you guys have been saying all along)
I could make mypy
happy if I did the following in the function:
def _iter_zarr_groups(root: ZarrGroup, parent: str = "/") -> Iterable[str]:
from xarray.core.treenode import NodePath
parent_nodepath = NodePath(parent)
yield str(parent_nodepath)
for path, group in root.groups():
gpath = parent_nodepath / path
yield str(gpath)
yield from _iter_zarr_groups(group, parent=gpath)
Maybe that's an option? (Unless I'm missing a trick with this breaking the recursion?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
(No need for from xarray.core.treenode import NodePath
to live inside the function though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Owen! That did the trick!
with pytest.raises(ValueError): | ||
open_datatree(filepath) | ||
open_datatree(unaligned_datatree_nc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should definitely use
pytest.raises(ValueError, match=...):
to match a much more specific error. We are expecting this test to fail because of an alignment error, not any other type of ValueError
.
(Maybe we should create a new exception AlignmentError(ValueError)
, given that we do already have MergeError(ValueError)
? @shoyer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about something like this but in the treenode.py
module?
class AlignmentError(ValueError):
"""Raised when a child node has coordinates that are not aligned with its parents."""
If we do this I think we might have to change datatree.py
to raise AlignmentError
in
`_check_alignment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alignment is a core xarray concept separate of datatree. My suggestion was to change the ValueErrors
raised in alignment.py
to become AlignmentErrors
, then datatree.py
would automatically throw AlignmentError
when _check_alignment
is called (because it calls xr.align
internally).
This suggestion is outside the scope of this PR though - you should go ahead with just using match
to match a specific ValueError
and then we can maybe do AlignmentError
in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in the last commit 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @eni-awowale !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Nice work!
I looked at the type hint stuff, and might have an answer 🤞.
xarray/backends/zarr.py
Outdated
group_name = str(NodePath(path_group)) | ||
groups_dict[group_name] = group_ds | ||
|
||
return groups_dict | ||
|
||
|
||
def _iter_zarr_groups(root, parent="/"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a play with this. I traced root
back and found it is indeed a zarr.Group
, so I think the type hint above works (along with adding from zarr import Group as ZarrGroup
in the if TYPE_CHECKING
block at the top of the file).
The bit that was causing an issue the assignment of a NodePath
object to parent
, which is a str
from the function signature. (Apologies if I'm just playing catch up here from what you guys have been saying all along)
I could make mypy
happy if I did the following in the function:
def _iter_zarr_groups(root: ZarrGroup, parent: str = "/") -> Iterable[str]:
from xarray.core.treenode import NodePath
parent_nodepath = NodePath(parent)
yield str(parent_nodepath)
for path, group in root.groups():
gpath = parent_nodepath / path
yield str(gpath)
yield from _iter_zarr_groups(group, parent=gpath)
Maybe that's an option? (Unless I'm missing a trick with this breaking the recursion?)
subgroup1_var[:] = np.array([[0.1, 0.2]]) | ||
with pytest.raises( | ||
ValueError, | ||
match=( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! (I know there's already been a conversation on using the match
kwarg, but wanted to 👍. This is definitely a good way to go to make sure the right ValueError
is being raised.)
+ ".*" | ||
), | ||
): | ||
open_datatree(unaligned_datatree_nc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the order of merging to expect, but if @flamingbear merges #9033 first, then we can clean up the imports and use xr.open_datatree
. If it's the other way around, then it'll probably be best for one of us to take a quick skim over the tests and tidy them up in a separate PR.
My guess is that this PR will merge before #9033, so we'll likely end up with a separate tiny PR for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tidying up these imports isn't a big deal, and can be left for another PR
* main: (26 commits) Forbid modifying names of DataTree objects with parents (pydata#9494) DAS-2155 - Merge datatree documentation into main docs. (pydata#9033) Make illegal path-like variable names when constructing a DataTree from a Dataset (pydata#9378) Ensure TreeNode doesn't copy in-place (pydata#9482) `open_groups` for zarr backends (pydata#9469) Update pyproject.toml (pydata#9484) New whatsnew section (pydata#9483) Release notes for v2024.09.0 (pydata#9480) Fix `DataTree.coords.__setitem__` by adding `DataTreeCoordinates` class (pydata#9451) Rename DataTree's "ds" and "data" to "dataset" (pydata#9476) Update DataTree repr to indicate inheritance (pydata#9470) Bump pypa/gh-action-pypi-publish in the actions group (pydata#9460) Repo checker (pydata#9450) Add days_in_year and decimal_year to dt accessor (pydata#9105) remove parent argument from DataTree.__init__ (pydata#9465) Fix inheritance in DataTree.copy() (pydata#9457) Implement `DataTree.__delitem__` (pydata#9453) Add ASV for datatree.from_dict (pydata#9459) Make the first argument in DataTree.from_dict positional only (pydata#9446) Fix typos across the code, doc and comments (pydata#9443) ...
* main: Opt out of floor division for float dtype time encoding (pydata#9497) fixed formatting for whats-new (pydata#9493) Forbid modifying names of DataTree objects with parents (pydata#9494) DAS-2155 - Merge datatree documentation into main docs. (pydata#9033) Make illegal path-like variable names when constructing a DataTree from a Dataset (pydata#9378) Ensure TreeNode doesn't copy in-place (pydata#9482) `open_groups` for zarr backends (pydata#9469) Update pyproject.toml (pydata#9484) New whatsnew section (pydata#9483)
re.escape( | ||
"group '/Group1/subgroup1' is not aligned with its parents:\nGroup:\n" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this could be just an r
string? Something like
re.escape( | |
"group '/Group1/subgroup1' is not aligned with its parents:\nGroup:\n" | |
) | |
r"group '/Group1/subgroup1' is not aligned with its parents:[\n]Group:[\n].*" |
* open groups zarr initial commit * added tests * Added requested changes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * TypeHint for zarr groups * update for parent --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Part of #8572
whats-new.rst
api.rst