to_netcdf / open_dataset is not idempotent #4512

@MVivien

Description

What happened:
I created a Dataset from a DataArray whose name is the same as its only dimension and which has no coordinate. When I save this Dataset to netcdf and open the file again as a Dataset, the reopened Dataset has no data variables: the original data variable has become a coordinate.

What you expected to happen:
I would expect the to_netcdf / open_dataset round trip to be idempotent, i.e. to return a Dataset identical to the one I saved.

Minimal Complete Verifiable Example:

import xarray as xr

da = xr.DataArray(
    [1, 2, 3, 4],
    dims=['lat'],
    name='lat'
)
ds = da.to_dataset()

ds.to_netcdf('bug.nc')
ds2 = xr.open_dataset('bug.nc')

print(ds)
print(ds2)

Output

<xarray.Dataset>
Dimensions:  (lat: 4)
Dimensions without coordinates: lat
Data variables:
    lat      (lat) int64 1 2 3 4

<xarray.Dataset>
Dimensions:  (lat: 4)
Coordinates:
  * lat      (lat) int64 1 2 3 4
Data variables:
    *empty*
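The loss can be reproduced without netcdf or xarray in a toy model. This sketch is hypothetical and much simpler than xarray's real encoding (which, for example, also records non-dimension coordinates in a `coordinates` attribute). It assumes only the netCDF coordinate-variable convention, under which a one-dimensional variable with the same name as its dimension is a coordinate variable. Because the file keeps all variables in one namespace, the hypothetical `from_file` can only apply that name-based rule when reading, so `lat` comes back as a coordinate:

```python
# Toy model (hypothetical, not xarray's implementation) of why a data variable
# named after its own dimension does not survive a netcdf round trip.

def to_file(data_vars, coords):
    """Flatten a 'dataset' into a single namespace, as a netcdf file does.

    Both arguments map a variable name to (dims, values). The returned
    mapping keeps no record of which argument a variable came from.
    """
    return {**coords, **data_vars}


def from_file(variables):
    """Split variables back out using the coordinate-variable rule:
    a 1-D variable whose name equals its only dimension is a coordinate."""
    data_vars, coords = {}, {}
    for name, (dims, values) in variables.items():
        if dims == (name,):
            coords[name] = (dims, values)
        else:
            data_vars[name] = (dims, values)
    return data_vars, coords


# Same layout as the example above: data variable 'lat' on dimension 'lat'.
data_vars = {"lat": (("lat",), [1, 2, 3, 4])}
coords = {}

restored_vars, restored_coords = from_file(to_file(data_vars, coords))
print(restored_vars)    # {}
print(restored_coords)  # {'lat': (('lat',), [1, 2, 3, 4])}
```

A variable whose name differs from its dimension (say `temp` on `lat`) passes through this model unchanged; only the name collision triggers the reclassification.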

Anything else we need to know?:

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.12 |Anaconda, Inc.| (default, Sep 8 2020, 17:50:39)
[GCC Clang 10.0.0 ]
python-bits: 64
OS: Darwin
OS-release: 19.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.1
pandas: 1.1.3
numpy: 1.19.2
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.4
iris: None
bottleneck: None
dask: 2.30.0
distributed: None
matplotlib: 3.1.3
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 50.3.0.post20201006
pip: 20.2.3
conda: None
pytest: 6.1.0
IPython: 5.8.0
sphinx: None

dcherian (Contributor) commented on Oct 15, 2020

This is the same bug as in #4108 (see the comment there).

<xarray.Dataset>
Dimensions:  (lat: 4)
Dimensions without coordinates: lat
Data variables:
    lat      (lat) int64 1 2 3 4

If I understand correctly (IIUC), a Dataset like this is outside xarray's data model: a variable with the same name as a dimension is treated as a coordinate variable (i.e. an indexed dimension), not as a data variable.
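Given that rule, one way to catch the problem before saving is to look for data variables whose name is also one of their dimensions. A minimal sketch, assuming the variables are described as a plain mapping of name to dimension tuple (with xarray this could be built as `{name: da.dims for name, da in ds.data_vars.items()}`); the helper name `dimension_named_data_vars` is hypothetical:

```python
def dimension_named_data_vars(data_vars):
    """Return the sorted names of data variables whose name is also one of
    their dimensions.

    data_vars maps a variable name to its tuple of dimension names
    (a hypothetical stand-in for inspecting a Dataset's data variables).
    Such variables would be read back as coordinates, not data variables.
    """
    return sorted(name for name, dims in data_vars.items() if name in dims)


# Layout from the issue, plus an unaffected variable for contrast.
print(dimension_named_data_vars({"lat": ("lat",), "temp": ("lat",)}))  # ['lat']
```

Renaming a flagged variable before writing, for example with xarray's `Dataset.rename_vars`, should avoid the collision, at the cost of storing the data under a different name.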

Participants: @dcherian, @keewis, @MVivien

to_netcdf / open_dataset is not idempotent · Issue #4512 · pydata/xarray