np.bytes_ scalar datasets in NetCDF4 are converted to arrays of bytes #10389

Open
@scottstanie

What happened?

I've made NetCDF4 files using h5netcdf where some datasets are strings, which get encoded as np.bytes_. They show up as {SCALAR} when viewed with h5ls.

If I load such a file with xarray and save it again using engine='h5netcdf', the strings become arrays of single bytes. I don't see a way to save them with the same data type that h5netcdf produces.

What did you expect to happen?

I expected to be able to save the HDF5 file with to_netcdf(engine='h5netcdf') and get the same result as saving with h5netcdf directly.
Perhaps this would need an option to load the data without running np.asarray on the np.bytes_ object.
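
For context, here is a minimal NumPy-only sketch of what np.asarray does to an np.bytes_ scalar. It does not touch xarray or h5netcdf, and whether xarray applies np.asarray at exactly this point is an assumption on my part; the variable names are illustrative only.

```python
import numpy as np

# A scalar like the one h5netcdf returns in this issue's example.
scalar = np.bytes_(b"test this string")

# np.asarray wraps the scalar in a zero-dimensional array of the same dtype.
wrapped = np.asarray(scalar)

print(type(scalar).__name__, scalar.dtype)
print(type(wrapped).__name__, wrapped.shape, wrapped.dtype)

# Indexing with an empty tuple recovers an np.bytes_ scalar from the 0-d array.
recovered = wrapped[()]
print(type(recovered).__name__, recovered == scalar)
```

The wrapping itself keeps the 16-byte string dtype, so the change to single bytes would have to happen later, when the array is written.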

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np
import h5netcdf

with h5netcdf.File("test-np-bytes.nc", "w") as hf:
    hf.create_variable(name="data", data=np.bytes_("test this string"))

with h5netcdf.File("test-np-bytes.nc") as hf:
    print("Data as originally loaded by h5netcdf")
    data = hf["data"][()]
    print(data.dtype)
    print(data)
    print(repr(data))
    print()

with xr.open_dataset("test-np-bytes.nc", engine="h5netcdf") as ds:
    print("Data as loaded by xarray")
    data = ds.data.values
    print(data.dtype)
    print(data)
    print(repr(data))
    print()

with xr.open_dataset("test-np-bytes.nc", engine="h5netcdf") as ds:
    print("Running to_netcdf...")
    ds.to_netcdf("rewritten.nc", engine="h5netcdf")

with h5netcdf.File("rewritten.nc") as hf:
    print()
    print("New file, loaded by h5netcdf")
    data = hf["data"][()]
    print(data.dtype)
    print(data)
    print(repr(data))

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Data as originally loaded by h5netcdf
|S16
b'test this string'
np.bytes_(b'test this string')

Data as loaded by xarray
|S16
np.bytes_(b'test this string')
array(b'test this string', dtype='|S16')

Running to_netcdf...

New file, loaded by h5netcdf
|S1
[b't' b'e' b's' b't' b' ' b't' b'h' b'i' b's' b' ' b's' b't' b'r' b'i'
 b'n' b'g']
array([b't', b'e', b's', b't', b' ', b't', b'h', b'i', b's', b' ', b's',
       b't', b'r', b'i', b'n', b'g'], dtype='|S1')
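
The |S1 layout in the rewritten file can be related to the original |S16 value with plain NumPy views. This is only a sketch of how the two layouts correspond, not xarray's actual encoding code, and the variable names are hypothetical. The joining step could serve as a stopgap when reading the rewritten file.

```python
import numpy as np

# Zero-dimensional array holding one 16-byte string, as loaded by xarray.
original = np.array(b"test this string")

# Splitting: reinterpret the single 16-byte item as 16 one-byte items.
chars = original.reshape(1).view("S1")
print(chars.shape, chars.dtype)

# Joining: reinterpret the one-byte items as a single fixed-width string
# and take the only element to get a scalar back.
joined = np.ascontiguousarray(chars).view(f"S{chars.size}")[0]
print(repr(joined))
```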

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.8 | packaged by conda-forge | (main, Dec 5 2024, 14:19:53) [Clang 18.1.8 ]
python-bits: 64
OS: Darwin
OS-release: 24.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2025.4.0
pandas: 2.2.3
numpy: 2.0.2
scipy: 1.15.1
netCDF4: 1.7.2
pydap: None
h5netcdf: 1.6.1
h5py: 3.12.1
zarr: 3.0.6
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.4.1
distributed: 2025.4.1
matplotlib: 3.10.0
cartopy: 0.24.0
seaborn: 0.13.2
numbagg: None
fsspec: 2025.2.0
cupy: None
pint: 0.24.4
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.8.0
pip: 24.3.1
conda: 25.1.1
pytest: 8.3.4
mypy: None
IPython: 8.17.2
sphinx: None
