Adding open_groups to BackendEntryPointEngine, NetCDF4BackendEntrypoint, and H5netcdfBackendEntrypoint #9243
Conversation
I know we touched on this a bit during our meeting on Tuesday, but I think tests for … Let me know if I have that right and what you all think.
Sounds good @eni-awowale! You may even be able to use basically the same groups test for multiple backends by adding it to …
Having an included file to test against is a good idea, we just want it to be small.
Where are these currently? We don't want to add files to the main repo, preferring to put them here: https://github.com/pydata/xarray-data
So, instead of adding the same type of tests to … @TomNicholas
Okay, great, there might already be some sample files in there with groups that I can use!
I didn't even know those existed... @max-sixty I know you have thought about example datasets for testing purposes - do you have an opinion on whether new files for testing should go in that directory or the separate repository?
We want tests to be runnable without a network connection. If we need new files for testing, which I agree would be a good idea in this case, please add them here. Just keep them as small as possible (the current test files are all under 10 KB in size, most under 1 KB).
xarray-data is for sample/tutorial datasets, which can be much larger (enough data to make an interesting plot) than what we use for test data.
…reate netcdf4 file, on the fly
invalid_netcdf=None,
phony_dims=None,
decode_vlen_strings=True,
driver=None,
driver_kwds=None,
These shouldn't be here, right? They should all fall under `**kwargs`.
Or maybe they should be in the specific backend but not in `common.py`?
Yeah, so these were added from PR #9199 for adding back the backend-specific keyword arguments. I pulled this into my branch after it was merged to main. They are not listed individually in `common.py`; there they are consolidated as `**kwargs`.
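For illustration, here is a minimal sketch of that consolidation pattern under assumed names (the classes and keywords below are hypothetical, not xarray's actual entrypoints): the generic entrypoint only sees backend-specific options through `**kwargs`, while the concrete backend spells them out.

```python
# Hypothetical sketch of the pattern described above; class names and
# keyword arguments are illustrative, not xarray's actual backend classes.


class GenericEntrypoint:
    def open_datatree(self, filename_or_obj, **kwargs):
        # Backend-specific options (driver, phony_dims, ...) travel inside
        # **kwargs here and are never listed individually.
        raise NotImplementedError


class H5StyleEntrypoint(GenericEntrypoint):
    def open_datatree(self, filename_or_obj, *, driver=None, phony_dims=None, **kwargs):
        # The concrete backend names the keyword arguments it understands.
        return {"driver": driver, "phony_dims": phony_dims}


print(H5StyleEntrypoint().open_datatree("file.nc", driver="core"))
```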
@@ -466,19 +494,23 @@ def open_datatree(
driver=driver,
driver_kwds=driver_kwds,
)
# Check for a group and make it a parent if it exists
if group:
    parent = NodePath("/") / NodePath(group)
@eni-awowale this is how you should join paths
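For reference, this is the joining behavior that line relies on, sketched with `pathlib.PurePosixPath`; xarray's internal `NodePath` is a pure-path style class, so the `/` operator composes group paths the same way (a sketch, not the PR's code).

```python
from pathlib import PurePosixPath

# Joining a root path with a group path, as in the diff above
# (NodePath("/") / NodePath(group)); PurePosixPath shows the same behavior.
group = "child/grandchild"
parent = PurePosixPath("/") / PurePosixPath(group)
print(parent)        # /child/grandchild
print(parent.parts)  # ('/', 'child', 'grandchild')
```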
@@ -837,6 +837,43 @@ def open_datatree(
return backend.open_datatree(filename_or_obj, **kwargs)
We could have a default implementation here that calls `open_groups`, i.e. replace the line above with:

groups_dict = backend.open_groups(filename_or_obj, **kwargs)
return DataTree.from_dict(groups_dict)
The idea being that then backend developers don't actually have to implement `open_datatree` if they have implemented `open_groups`...
This was sort of discussed here (@keewis) #7437 (comment), but this seems like a rabbit hole that should be left for a future PR.
Not really, I was arguing that having any one of `open_dataarray`, `open_dataset`, and `open_datatree` allows us to provide (somewhat inefficient) default implementations for the others. However, `open_groups` has a much closer relationship to `open_datatree`, so I think having a default implementation for `open_datatree` is fine (we just need to make sure that a backend that provides neither `open_groups` nor `open_datatree` doesn't complain about `open_groups` not existing if you call `open_datatree`).
So yeah, this might become a rabbit hole.
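As a rough sketch of that default-implementation idea (class name, method bodies, and the import path are assumptions, not the code in this PR): the base class could build the tree from the groups dict and phrase its error in terms of `open_datatree`, so a backend that implements neither method still gets a sensible message.

```python
# Rough sketch only; names and the import location are assumptions,
# not xarray's actual backend base class.


class BackendEntrypointSketch:
    def open_groups_as_dict(self, filename_or_obj, **kwargs):
        # Backends that support groups override this to return a mapping
        # like {"/": Dataset, "/child": Dataset, ...}.
        raise NotImplementedError(
            f"{type(self).__name__} does not implement open_datatree/open_groups"
        )

    def open_datatree(self, filename_or_obj, **kwargs):
        # Default implementation: build the tree from the groups dict, so a
        # backend only has to implement open_groups_as_dict.  The error
        # raised above mentions open_datatree, so a backend implementing
        # neither method still gets a sensible message from this call.
        from xarray.core.datatree import DataTree  # assumed import path

        groups_dict = self.open_groups_as_dict(filename_or_obj, **kwargs)
        return DataTree.from_dict(groups_dict)
```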
Okay I see. That seems related, but also like a totally optional convenience feature that we should defer to later.
Hi folks! I think this PR is a couple of commits away from merging, but there are a couple of things to sort out.
Edit:
Maybe the typing crowd can help? cc @max-sixty, @Illviljan, @headtr1ck
It's this type of problem: https://stackoverflow.com/questions/73603289/why-doesnt-parameter-type-dictstr-unionstr-int-accept-value-of-type-di
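For context, a minimal sketch of the invariance issue behind that link (function names are made up): `dict` is invariant in its value type, so mypy rejects a `dict[str, str]` where a `dict[str, str | int]` is expected, while the read-only `Mapping` is covariant in its value type and accepts it.

```python
from __future__ import annotations

from collections.abc import Mapping


def takes_dict(d: dict[str, str | int]) -> None: ...
def takes_mapping(d: Mapping[str, str | int]) -> None: ...


d: dict[str, str] = {"a": "b"}
takes_dict(d)     # mypy error: incompatible type (dict is invariant)
takes_mapping(d)  # OK: Mapping is covariant in its value type
```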
xarray/core/datatree.py (outdated)
d_cast = cast(dict, d)
root_data = d_cast.pop("/", None)
Mypy is correct here: a Mapping does not include a `.pop`, and ignoring the typing errors doesn't solve the bug. xarray uses Mappings frequently, and for example `xr.core.utils.FrozenDict(dict(a=3)).pop("a", None)` fails, so it's a real issue.
Either explicitly convert it to a dict, i.e. `d_cast = dict(d)`, or refactor the code to not use the `.pop`, since I'm not so sure it's needed.
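A small sketch of the point, using `types.MappingProxyType` as a stand-in for a read-only mapping in the spirit of `FrozenDict` (the function names are hypothetical): `.pop` is not part of the `Mapping` interface and fails at runtime on read-only mappings, while copying into a real `dict` first works.

```python
from collections.abc import Mapping
from types import MappingProxyType


def pop_root_buggy(d: Mapping[str, int]):
    # mypy error: "Mapping[str, int]" has no attribute "pop"; also an
    # AttributeError at runtime for read-only mappings such as mappingproxy.
    return d.pop("/", None)


def pop_root_fixed(d: Mapping[str, int]):
    d_copy = dict(d)  # explicit copy gives a real dict, which has .pop
    root = d_copy.pop("/", None)
    return root, d_copy


print(pop_root_fixed(MappingProxyType({"/": 1, "/child": 2})))
# (1, {'/child': 2})
```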
Thanks, I updated it to explicitly convert the type to a dict. The mypy 3.9 tests are still passing, but the CI mypy check seems to be returning the same error as before explicitly converting it to a dict.
@TomNicholas and @keewis, the issue we just moved over, #9336, seems to be related to the latest batch of test failures.
Yay, all checks are passing! Does someone want to give this a quick look before merging?
def open_groups_as_dict(
    self,
    filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
    **kwargs: Any,
Maybe we should not make the same mistake as with `open_dataset` and prevent Liskov errors by dropping the `**kwargs: Any,` line here. If the abstract method supports any kwargs, so must all subclass implementations, which is not what we want.
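A sketch of the Liskov point with hypothetical, simplified classes: once the base method accepts `**kwargs: Any`, mypy flags any override that drops them, because the override no longer accepts every call the base signature promises to accept.

```python
from __future__ import annotations

from typing import Any


class BaseEntrypoint:
    def open_groups_as_dict(self, filename_or_obj: str, **kwargs: Any) -> dict:
        raise NotImplementedError


class MyBackend(BaseEntrypoint):
    # mypy: Signature of "open_groups_as_dict" incompatible with supertype
    # "BaseEntrypoint" -- the override would have to keep **kwargs to be a
    # valid substitute for the base class.
    def open_groups_as_dict(self, filename_or_obj: str, *, group: str | None = None) -> dict:
        return {}
```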
@headtr1ck from my understanding, I think the `**kwargs` were added back to fix this issue: #9135
Hmmm, not sure.
But since this is the same problem in all other backend methods, I'm fine with leaving it as it is (and possibly changing it in a future PR altogether).
Sounds good, we can revisit this in another PR.
decode_vlen_strings=True,
driver=None,
driver_kwds=None,
**kwargs,
This should be obsolete as well, when you remove it from the abstract method.
persist=False,
lock=None,
autoclose=False,
**kwargs,
Same here.
whats-new.rst
api.rst
@TomNicholas, @shoyer, @owenlittlejohns, and @flamingbear