DataTree.to_dict() method does not behave as expected #9611
Comments
Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
Thanks for this thoughtful issue!
In the case of the root node, you're right that the extra dict entry is redundant. But in general that's not the case: imagine a tree with an empty leaf node. If we dropped that node from the dictionary and then called DataTree.from_dict(), the leaf would be lost. However, I suppose we could have all empty "intermediate nodes" not appear in the dict, as they get automatically reconstructed... I'm not sure that's a good idea either, though: just having one dict entry per node always is a lot simpler, even though it does look weird in your case.
Basically, in order to have the round-tripping property assert tree == DataTree.from_dict(tree.to_dict()), the rule has to be either that every node gets a dict entry, or that only empty intermediate nodes are omitted; otherwise some empty nodes will be lost.
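The round-tripping argument can be checked with a toy model. The sketch below is not xarray's implementation: it represents a tree as a plain dict of absolute paths, uses `None` as a stand-in for an empty Dataset, and all function names (`from_dict`, `to_dict_every_node`, and so on) are hypothetical. It only assumes that building from a dict recreates missing ancestor nodes as empty nodes, as described above.

```python
from pathlib import PurePosixPath

EMPTY = None  # toy stand-in for an empty Dataset


def from_dict(d):
    """Build a toy tree {absolute_path: data}; missing ancestors become empty nodes."""
    tree = {"/": EMPTY}
    for key, data in d.items():
        path = PurePosixPath("/") / key.lstrip("/")
        for ancestor in list(path.parents):
            tree.setdefault(str(ancestor), EMPTY)
        tree[str(path)] = data
    return tree


def is_leaf(tree, path):
    """A node is a leaf if no other node has it as its parent."""
    return not any(q != path and str(PurePosixPath(q).parent) == path for q in tree)


def to_dict_every_node(tree):
    """Rule A: one dict entry per node, empty or not."""
    return dict(tree)


def to_dict_skip_empty_intermediate(tree):
    """Rule B: omit empty nodes only when they have children."""
    return {p: v for p, v in tree.items() if v is not EMPTY or is_leaf(tree, p)}


def to_dict_skip_all_empty(tree):
    """Rule C: omit every empty node, including empty leaves."""
    return {p: v for p, v in tree.items() if v is not EMPTY}


# "/" and "/a" are empty intermediate nodes; "/a/empty_leaf" is an empty leaf.
tree = from_dict({"/a/data": [1, 2, 3], "/a/empty_leaf": EMPTY})

assert from_dict(to_dict_every_node(tree)) == tree               # round-trips
assert from_dict(to_dict_skip_empty_intermediate(tree)) == tree  # round-trips
assert from_dict(to_dict_skip_all_empty(tree)) != tree           # empty leaf is lost
```

In this model, only the rule that drops every empty node breaks the round trip, because nothing in the remaining keys implies that the empty leaf ever existed.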
Yes, I see that removing empty leaf nodes is problematic. I also realized that two different dictionaries can lead to the same DataTree:

    import xarray as xr
    from xarray import DataTree

    # Relative paths, no explicit root entry
    dict1 = {"set1": xr.Dataset({"var1": xr.DataArray([1, 2, 3], dims="time")}),
             "set2": xr.Dataset({"var1": xr.DataArray([7, 8, 9], dims="time")})}
    # Absolute paths, with an explicit empty root
    dict2 = {"/": xr.Dataset(),
             "/set1": xr.Dataset({"var1": xr.DataArray([1, 2, 3], dims="time")}),
             "/set2": xr.Dataset({"var1": xr.DataArray([7, 8, 9], dims="time")})}
    dt1 = DataTree.from_dict(dict1)
    dt2 = DataTree.from_dict(dict2)
    xr.testing.assert_identical(dt1, dt2)  # both dicts build the same tree

Therefore, the round-trip property cannot hold universally... The reason why the creation of empty datasets was a problem for me in the first place is that it makes it harder to apply my functionality that worked on dictionaries of …
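One possible workaround for dict-based code is to normalize the output of to_dict() before using it. The helper below is a hypothetical sketch, not part of xarray: `strip_empty_nodes` and its `is_empty` parameter are names made up here, and plain dicts stand in for Datasets so the example runs without xarray. The default predicate relies on `len(value) == 0`; for a real Dataset a stricter check (for example one that also inspects coordinates and attributes) may be more appropriate.

```python
def strip_empty_nodes(node_dict, is_empty=lambda value: len(value) == 0):
    """Return a copy of a path -> data mapping without empty nodes.

    Keys are made relative to the root by stripping slashes; a non-empty
    root keeps the key "/".
    """
    cleaned = {}
    for path, value in node_dict.items():
        if is_empty(value):
            continue  # skip nodes that carry no data
        cleaned[path.strip("/") or "/"] = value
    return cleaned


# Plain dicts stand in for Datasets in this sketch.
dict1 = {"set1": {"var1": [1, 2, 3]}, "set2": {"var1": [7, 8, 9]}}
dict2 = {"/": {}, "/set1": {"var1": [1, 2, 3]}, "/set2": {"var1": [7, 8, 9]}}

assert strip_empty_nodes(dict2) == dict1
assert strip_empty_nodes(dict1) == dict1
```

Note that this also drops empty leaf nodes, so the result is convenient for dict-based processing but is not a safe input for rebuilding the original tree.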
What happened?
I am working with DataTree and find it very useful! However, I think I found a bug in the .to_dict() method.
When I build a DataTree from a dict with DataTree.from_dict() and then want to get the dict again with DataTree.to_dict(), the resulting dict differs from the original one.
What did you expect to happen?
I expected the two dicts to be the same. Instead, the root node receives a dict entry with an empty Dataset. I argue that empty nodes should not appear in the dict.
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
No response
Anything else we need to know?
No response
Environment
:488: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 16 from C header, got 96 from PyObject
INSTALLED VERSIONS
commit: None
python: 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:54:21) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2024.9.0
pandas: 2.2.0
numpy: 1.26.4
scipy: 1.12.0
netCDF4: 1.7.1
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: None
dask: 2024.2.0
distributed: 2024.2.0
matplotlib: 3.8.3
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2024.2.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.1.0
pip: 24.0
conda: None
pytest: 8.0.1
mypy: None
IPython: 8.21.0
sphinx: 7.2.6