Description
What happened?
I've made NetCDF4 files using h5netcdf where some datasets are strings, which get encoded as np.bytes_. This means they show as {SCALAR} when viewed with h5ls.
If I load them with xarray and save them again using engine='h5netcdf', they become arrays of single bytes. I don't see a way to save them with the same data type that h5netcdf writes.
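To make the difference between the two layouts concrete, here is a small NumPy-only sketch. It is not xarray's actual code path, just an illustration of how one fixed-width 16-byte string (dtype S16) relates to an array of 16 single bytes (dtype S1); the names fixed and chars are made up for the example.

```python
import numpy as np

# A one-element array holding a fixed-width 16-byte string (dtype S16).
fixed = np.array([b"test this string"], dtype="S16")

# Reinterpret the same 16-byte buffer as 16 one-byte strings (dtype S1).
# This mirrors the shape of the data after the round trip, not how
# xarray produces it internally.
chars = fixed.view("S1")

print(fixed.dtype, fixed.shape)
print(chars.dtype, chars.shape)
print(chars)
```

Both arrays carry the same bytes; only the dtype and shape differ, which is why the rewritten file is readable but no longer shows up as a single scalar string.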
What did you expect to happen?
I expected to be able to save the HDF5 file with to_netcdf(engine='h5netcdf') and get the same result as saving with h5netcdf directly.
Perhaps this calls for an option to load the data without running np.asarray on the np.bytes_ object.
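For reference, this is what np.asarray does to an np.bytes_ scalar, using NumPy only. It is a minimal sketch of the conversion mentioned above, not a claim about where xarray performs it; the variable names are illustrative.

```python
import numpy as np

# The scalar type that h5netcdf hands back for a scalar string dataset.
scalar = np.bytes_(b"test this string")

# np.asarray wraps the scalar in a zero-dimensional ndarray of dtype S16.
arr = np.asarray(scalar)

print(type(scalar).__name__, scalar.dtype)
print(type(arr).__name__, arr.shape, arr.dtype)

# Indexing a 0-d array with an empty tuple returns the scalar again.
restored = arr[()]
print(type(restored).__name__, restored)
```

The value and the S16 dtype survive the conversion, so the information needed to write a scalar string is still available at this point.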
Minimal Complete Verifiable Example
import xarray as xr
import numpy as np
import h5netcdf

# Write a scalar string variable directly with h5netcdf.
with h5netcdf.File("test-np-bytes.nc", "w") as hf:
    hf.create_variable(name="data", data=np.bytes_("test this string"))

# Read it back with h5netcdf.
with h5netcdf.File("test-np-bytes.nc") as hf:
    print("Data as originally loaded by h5netcdf")
    data = hf["data"][()]
    print(data.dtype)
    print(data)
    print(repr(data))
    print()

# Read the same file with xarray.
with xr.open_dataset("test-np-bytes.nc", engine="h5netcdf") as ds:
    print("Data as loaded by xarray")
    data = ds.data.values
    print(data.dtype)
    print(data)
    print(repr(data))
    print()

# Round-trip the file through xarray.
with xr.open_dataset("test-np-bytes.nc", engine="h5netcdf") as ds:
    print("Running to_netcdf...")
    ds.to_netcdf("rewritten.nc", engine="h5netcdf")

# Inspect the rewritten file with h5netcdf.
with h5netcdf.File("rewritten.nc") as hf:
    print()
    print("New file, loaded by h5netcdf")
    data = hf["data"][()]
    print(data.dtype)
    print(data)
    print(repr(data))
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Data as originally loaded by h5netcdf
|S16
b'test this string'
np.bytes_(b'test this string')
Data as loaded by xarray
|S16
np.bytes_(b'test this string')
array(b'test this string', dtype='|S16')
Running to_netcdf...
New file, loaded by h5netcdf
|S1
[b't' b'e' b's' b't' b' ' b't' b'h' b'i' b's' b' ' b's' b't' b'r' b'i'
b'n' b'g']
array([b't', b'e', b's', b't', b' ', b't', b'h', b'i', b's', b' ', b's',
       b't', b'r', b'i', b'n', b'g'], dtype='|S1')
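As a reader-side stopgap, the single-byte array in the rewritten file can be joined back into one string with NumPy. This is only a sketch under the assumption that the array is a contiguous 1-D S1 array like the one printed above; it does not change what to_netcdf writes, and the sample array here is built by hand instead of being read from rewritten.nc.

```python
import numpy as np

# Stand-in for the S1 array read from the rewritten file.
data = np.frombuffer(b"test this string", dtype="S1")

# Option 1: concatenate the raw bytes into a plain Python bytes object.
joined = data.tobytes()

# Option 2: reinterpret the buffer as one fixed-width string of the same length.
as_fixed = data.view(f"S{data.size}")

print(joined)
print(as_fixed.dtype, as_fixed[0])
```

Note that tobytes keeps any trailing null padding, while the fixed-width view strips it when an element is read.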
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.8 | packaged by conda-forge | (main, Dec 5 2024, 14:19:53) [Clang 18.1.8 ]
python-bits: 64
OS: Darwin
OS-release: 24.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2025.4.0
pandas: 2.2.3
numpy: 2.0.2
scipy: 1.15.1
netCDF4: 1.7.2
pydap: None
h5netcdf: 1.6.1
h5py: 3.12.1
zarr: 3.0.6
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.4.1
distributed: 2025.4.1
matplotlib: 3.10.0
cartopy: 0.24.0
seaborn: 0.13.2
numbagg: None
fsspec: 2025.2.0
cupy: None
pint: 0.24.4
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.8.0
pip: 24.3.1
conda: 25.1.1
pytest: 8.3.4
mypy: None
IPython: 8.17.2
sphinx: None