I'd like to propose that concat_characters default to False for open_{dataset,zarr,dataarray}. I'm not sure how often this affects anyone, since working with individual character arrays is probably rare, but it's a particularly bad default in genetics. We often represent individual variations as single characters, and the order of the characters matters. The concatenation is destructive: we can't invert it when one of the characters is an empty string (which often corresponds to a deletion at a base pair location).
I also find it to be confusing behavior (e.g. #4405) since no other arrays are automatically transformed like this when deserialized.
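To make the information loss concrete, here is a minimal sketch using plain NumPy and a Python join. It is only an illustration of the problem as described above, not xarray's actual decoding code, and the array contents and the helper name `concat_last_axis` are made up for the example.

```python
import numpy as np

# Two samples, three base-pair positions each; "" stands for a deletion.
# The deletion sits at a different position in each sample.
alleles = np.array([["A", "", "T"], ["", "A", "T"]], dtype="S1")


def concat_last_axis(chars):
    """Join the characters along the trailing axis into one string per row."""
    return np.array([b"".join(row) for row in chars])


joined = concat_last_axis(alleles)

# Both rows collapse to the same string, so the position of the
# deletion can no longer be recovered from the concatenated result.
print(joined[0] == joined[1])
```

Once the rows are joined, nothing distinguishes a deletion at the first position from one at the second, which is why the transformation cannot be inverted.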
Activity
shoyer commented on Sep 23, 2020
I agree that there is no good reason to use concat_characters for zarr, which supports normal fixed-width string datatypes.
For netCDF, we do need concat_characters for the "NC_CHAR" dtype, which is used to store strings in lieu of a true fixed-width string dtype. It's ugly, but otherwise we won't be able to round-trip string dtype arrays from xarray into netCDF3 files. This note from NetCDF.jl does a nice job of explaining.
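As a rough illustration of the round trip described here, the sketch below uses only NumPy to mimic how fixed-width strings can be laid out as a character array with an extra trailing dimension and then concatenated back. It does not touch a netCDF file and is not xarray's own code; the sample values are invented for the example.

```python
import numpy as np

# Fixed-width byte strings, as xarray might hold them in memory.
names = np.array([b"abc", b"de"], dtype="S3")

# "Encode": split each string into single characters along a new
# trailing axis, which is how a character-typed variable is laid out.
chars = names.reshape(names.shape + (1,)).view("S1")
print(chars.shape)  # one row per string, one column per character

# "Decode": concatenate the trailing axis back into fixed-width strings.
restored = np.ascontiguousarray(chars).view("S3").reshape(names.shape)
print((restored == names).all())
```

Without the concatenation step on read, the array would come back with the extra character dimension instead of the original string dtype.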
dcherian commented on Sep 23, 2020
We could make the default None in open_data* and set True/False appropriately for netCDF/Zarr backends? Technically we would need to warn for a couple of releases before changing the default in open_zarr, but maybe no one cares too much?
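One way this suggestion might look, as a hedged sketch only: the function `resolve_concat_characters`, the `_BACKEND_DEFAULTS` table, and the `in_transition` flag are all hypothetical names for illustration and are not part of xarray's API.

```python
import warnings

# Hypothetical per-backend defaults (an assumption, not xarray's config).
_BACKEND_DEFAULTS = {"netcdf": True, "zarr": False}


def resolve_concat_characters(concat_characters=None, backend="netcdf",
                              in_transition=False):
    """Turn a None default into a concrete True/False for the given backend."""
    # An explicit user choice always wins.
    if concat_characters is not None:
        return bool(concat_characters)
    # During a deprecation period, keep the old zarr behavior but warn.
    if backend == "zarr" and in_transition:
        warnings.warn(
            "The default for concat_characters with zarr will change "
            "to False; pass it explicitly to silence this warning.",
            FutureWarning,
            stacklevel=2,
        )
        return True
    return _BACKEND_DEFAULTS[backend]


print(resolve_concat_characters(backend="netcdf"))
print(resolve_concat_characters(backend="zarr"))
```

The `in_transition` branch corresponds to the "warn for a couple of releases" step; dropping it would amount to changing the zarr default immediately.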