-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding open_groups to BackendEntryPointEngine, NetCDF4BackendEntrypoint, and H5netcdfBackendEntrypoint #9243
Changes from all commits
33ee4a9
8186d86
6b63704
ef01edc
08e230a
e01b0fb
9e1984c
33a71ac
b10fa10
565ffb1
ce607e6
eaba908
b4b9822
9b9c1e7
225489d
5fe8e96
3b9c418
222279c
4c65b0a
8c81a87
d2c74d6
f206408
5d34920
f72f3d2
2f92b5c
175e287
6319678
1466147
0e3c946
1ffeae5
abd4981
d44bf98
b2cf9b4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -132,6 +132,7 @@ def _iter_nc_groups(root, parent="/"): | |||
from xarray.core.treenode import NodePath | ||||
|
||||
parent = NodePath(parent) | ||||
yield str(parent) | ||||
for path, group in root.groups.items(): | ||||
gpath = parent / path | ||||
yield str(gpath) | ||||
|
@@ -535,6 +536,22 @@ def open_datatree( | |||
|
||||
raise NotImplementedError() | ||||
|
||||
def open_groups_as_dict( | ||||
self, | ||||
filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore, | ||||
**kwargs: Any, | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe we should not make the same mistake as with
Suggested change
If the abstract method supports any kwargs, so must all subclass implementations, which is not what we want. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @headtr1ck from my understanding I think the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmmm, Not sure. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds good we can revisit this on another PR. |
||||
) -> dict[str, Dataset]: | ||||
""" | ||||
Opens a dictionary mapping from group names to Datasets. | ||||
|
||||
Called by :py:func:`~xarray.open_groups`. | ||||
This function exists to provide a universal way to open all groups in a file, | ||||
before applying any additional consistency checks or requirements necessary | ||||
to create a `DataTree` object (typically done using :py:meth:`~xarray.DataTree.from_dict`). | ||||
""" | ||||
|
||||
raise NotImplementedError() | ||||
|
||||
|
||||
# mapping of engine name to (module name, BackendEntrypoint Class) | ||||
BACKEND_ENTRYPOINTS: dict[str, tuple[str | None, type[BackendEntrypoint]]] = {} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -448,9 +448,36 @@ def open_datatree( | |
driver_kwds=None, | ||
**kwargs, | ||
) -> DataTree: | ||
from xarray.backends.api import open_dataset | ||
from xarray.backends.common import _iter_nc_groups | ||
|
||
from xarray.core.datatree import DataTree | ||
|
||
groups_dict = self.open_groups_as_dict(filename_or_obj, **kwargs) | ||
|
||
return DataTree.from_dict(groups_dict) | ||
|
||
def open_groups_as_dict( | ||
self, | ||
filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore, | ||
*, | ||
mask_and_scale=True, | ||
decode_times=True, | ||
concat_characters=True, | ||
decode_coords=True, | ||
drop_variables: str | Iterable[str] | None = None, | ||
use_cftime=None, | ||
decode_timedelta=None, | ||
format=None, | ||
group: str | Iterable[str] | Callable | None = None, | ||
lock=None, | ||
eni-awowale marked this conversation as resolved.
Show resolved
Hide resolved
|
||
invalid_netcdf=None, | ||
phony_dims=None, | ||
decode_vlen_strings=True, | ||
driver=None, | ||
driver_kwds=None, | ||
Comment on lines
+472
to
+476
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These shouldn't be here right? They should all fall under `**kwargs`` There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or maybe they should be in the specific backend but not in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, so these were added from PR #9199 for adding back the backend specific keyword arguments. I pulled this into my branch after it was merged to main. But they are not in |
||
**kwargs, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should be obsolete as well, when you remove it from the abstract method. |
||
) -> dict[str, Dataset]: | ||
|
||
from xarray.backends.common import _iter_nc_groups | ||
from xarray.core.treenode import NodePath | ||
from xarray.core.utils import close_on_error | ||
|
||
|
@@ -466,19 +493,19 @@ def open_datatree( | |
driver=driver, | ||
driver_kwds=driver_kwds, | ||
) | ||
# Check for a group and make it a parent if it exists | ||
if group: | ||
parent = NodePath("/") / NodePath(group) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @eni-awowale this is how you should join paths |
||
else: | ||
parent = NodePath("/") | ||
|
||
manager = store._manager | ||
ds = open_dataset(store, **kwargs) | ||
tree_root = DataTree.from_dict({str(parent): ds}) | ||
groups_dict = {} | ||
for path_group in _iter_nc_groups(store.ds, parent=parent): | ||
group_store = H5NetCDFStore(manager, group=path_group, **kwargs) | ||
store_entrypoint = StoreBackendEntrypoint() | ||
with close_on_error(group_store): | ||
ds = store_entrypoint.open_dataset( | ||
group_ds = store_entrypoint.open_dataset( | ||
group_store, | ||
mask_and_scale=mask_and_scale, | ||
decode_times=decode_times, | ||
|
@@ -488,14 +515,11 @@ def open_datatree( | |
use_cftime=use_cftime, | ||
decode_timedelta=decode_timedelta, | ||
) | ||
new_node: DataTree = DataTree(name=NodePath(path_group).name, data=ds) | ||
tree_root._set_item( | ||
path_group, | ||
new_node, | ||
allow_overwrite=False, | ||
new_nodes_along_path=True, | ||
) | ||
return tree_root | ||
|
||
group_name = str(NodePath(path_group)) | ||
groups_dict[group_name] = group_ds | ||
|
||
return groups_dict | ||
|
||
|
||
BACKEND_ENTRYPOINTS["h5netcdf"] = ("h5netcdf", H5netcdfBackendEntrypoint) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -688,9 +688,34 @@ def open_datatree( | |
autoclose=False, | ||
**kwargs, | ||
) -> DataTree: | ||
from xarray.backends.api import open_dataset | ||
from xarray.backends.common import _iter_nc_groups | ||
|
||
from xarray.core.datatree import DataTree | ||
|
||
groups_dict = self.open_groups_as_dict(filename_or_obj, **kwargs) | ||
|
||
return DataTree.from_dict(groups_dict) | ||
|
||
def open_groups_as_dict( | ||
self, | ||
filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore, | ||
*, | ||
mask_and_scale=True, | ||
decode_times=True, | ||
concat_characters=True, | ||
decode_coords=True, | ||
drop_variables: str | Iterable[str] | None = None, | ||
use_cftime=None, | ||
decode_timedelta=None, | ||
group: str | Iterable[str] | Callable | None = None, | ||
format="NETCDF4", | ||
clobber=True, | ||
diskless=False, | ||
persist=False, | ||
lock=None, | ||
autoclose=False, | ||
**kwargs, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same here. |
||
) -> dict[str, Dataset]: | ||
from xarray.backends.common import _iter_nc_groups | ||
from xarray.core.treenode import NodePath | ||
|
||
filename_or_obj = _normalize_path(filename_or_obj) | ||
|
@@ -704,19 +729,20 @@ def open_datatree( | |
lock=lock, | ||
autoclose=autoclose, | ||
) | ||
|
||
# Check for a group and make it a parent if it exists | ||
if group: | ||
parent = NodePath("/") / NodePath(group) | ||
else: | ||
parent = NodePath("/") | ||
|
||
manager = store._manager | ||
ds = open_dataset(store, **kwargs) | ||
tree_root = DataTree.from_dict({str(parent): ds}) | ||
groups_dict = {} | ||
for path_group in _iter_nc_groups(store.ds, parent=parent): | ||
group_store = NetCDF4DataStore(manager, group=path_group, **kwargs) | ||
store_entrypoint = StoreBackendEntrypoint() | ||
with close_on_error(group_store): | ||
ds = store_entrypoint.open_dataset( | ||
group_ds = store_entrypoint.open_dataset( | ||
group_store, | ||
mask_and_scale=mask_and_scale, | ||
decode_times=decode_times, | ||
|
@@ -726,14 +752,10 @@ def open_datatree( | |
use_cftime=use_cftime, | ||
decode_timedelta=decode_timedelta, | ||
) | ||
new_node: DataTree = DataTree(name=NodePath(path_group).name, data=ds) | ||
tree_root._set_item( | ||
path_group, | ||
new_node, | ||
allow_overwrite=False, | ||
new_nodes_along_path=True, | ||
) | ||
return tree_root | ||
group_name = str(NodePath(path_group)) | ||
groups_dict[group_name] = group_ds | ||
|
||
return groups_dict | ||
|
||
|
||
BACKEND_ENTRYPOINTS["netcdf4"] = ("netCDF4", NetCDF4BackendEntrypoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have a default implementation here that calls
open_groups
, i.e.The idea being that then backend developers don't actually have to implement
open_datatree
if they have implementedopen_groups
...This was sort of discussed here (@keewis) #7437 (comment), but this seems like an rabbit hole that should be left for a future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really, I was arguing that given any one of
open_dataarray
,open_dataset
andopen_datatree
allows us to provide (somewhat inefficient) default implementations for the others. However,open_groups
has a much closer relationship toopen_datatree
, so I think having a default implementation foropen_datatree
is fine (we just need to make sure that a backend that provides neitheropen_groups
noropen_datatree
doesn't complain aboutopen_groups
not existing if you calledopen_datatree
).So yeah, this might become a rabbit hole.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I see. That seems related, but also like a totally optional convenience feature that we should defer to later.