Change default for concat_characters to False in open_* functions #4452

@eric-czech

Description

I'd like to propose that concat_characters default to False for open_{dataset,zarr,dataarray}. I'm not sure how often this affects anyone, since working with individual character arrays is probably rare, but it's a particularly bad default in genetics. We often represent individual variations as single characters, and the concatenation is destructive: we can't invert it when one of the characters is an empty string (which often corresponds to a deletion at a base pair location), and the order of the characters matters.
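To illustrate the kind of information loss described above, here is a minimal sketch in plain Python. It assumes the concatenation behaves like a simple string join along the last dimension; `join_chars` is a hypothetical helper for illustration, not xarray's actual decoding code.

```python
def join_chars(chars):
    """Hypothetical helper: join single-character strings into one string."""
    return "".join(chars)


# Alleles at one site, in order; "" marks a deletion (assumed encoding).
ref_then_deletion = ["A", ""]
deletion_then_ref = ["", "A"]

joined_a = join_chars(ref_then_deletion)
joined_b = join_chars(deletion_then_ref)

# Both orderings collapse to the same joined value, so the original
# positions cannot be recovered from the joined string alone.
collision = joined_a == joined_b
print(joined_a, joined_b, collision)
```

Under this simplified model, two different allele orderings become indistinguishable after concatenation, which is why the transformation cannot be inverted.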

I also find it to be confusing behavior (e.g. #4405) since no other arrays are automatically transformed like this when deserialized.

If I submit a PR for this, would anybody object?

Activity

shoyer (Member) commented on Sep 23, 2020

I agree that there is no good reason to use concat_characters for zarr, which supports normal fixed-width string datatypes.

For netCDF, we do need concat_characters for the "NC_CHAR" dtype, which is used to store strings in lieu of a true fixed-width string dtype. It's ugly, but otherwise we won't be able to round-trip string dtype arrays from xarray into netCDF3 files. This note from NetCDF.jl does a nice job of explaining.
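As a rough illustration of the round trip described above, the sketch below uses only NumPy to split fixed-width byte strings into a character array with an extra trailing dimension and then concatenate them back. The helper names `bytes_to_chars` and `chars_to_bytes` are assumptions made for this example; this is not presented as the code xarray or any netCDF library actually runs.

```python
import numpy as np


def bytes_to_chars(arr):
    """Hypothetical encoder: fixed-width bytes -> single chars on a new last axis."""
    arr = np.ascontiguousarray(arr)
    return arr.reshape(arr.shape + (1,)).view("S1")


def chars_to_bytes(chars):
    """Hypothetical decoder: join the last axis of single chars into bytes."""
    chars = np.ascontiguousarray(chars)
    width = chars.shape[-1]
    return chars.view(f"S{width}").reshape(chars.shape[:-1])


strings = np.array([b"abc", b"de"], dtype="S3")

chars = bytes_to_chars(strings)      # character array, one byte per element
restored = chars_to_bytes(chars)     # concatenated back into fixed-width strings

print(chars.shape, restored.tolist())
```

Without the final concatenation step, a reader of the file would get back the two-dimensional character array rather than the original one-dimensional string array.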

dcherian (Contributor) commented on Sep 23, 2020

we could make the default None in open_data* and set True/False appropriately for netCDF/Zarr backends?

technically we would need to warn for a couple of releases before changing the default in open_zarr but maybe no one cares too much?
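One way to picture the "default None" idea is the sketch below. The function `resolve_concat_characters` and the engine strings are hypothetical and only meant to show the shape of the logic; nothing here is taken from xarray's code base.

```python
from typing import Optional


def resolve_concat_characters(concat_characters: Optional[bool], engine: str) -> bool:
    """Hypothetical helper: pick a backend-specific default when None is passed."""
    if concat_characters is not None:
        # An explicit user choice always wins.
        return concat_characters
    # Assumed policy: zarr has real fixed-width strings, so skip concatenation;
    # netCDF-style backends keep concatenating NC_CHAR arrays.
    return engine != "zarr"


zarr_default = resolve_concat_characters(None, "zarr")
netcdf_default = resolve_concat_characters(None, "netcdf4")
explicit_override = resolve_concat_characters(True, "zarr")
print(zarr_default, netcdf_default, explicit_override)
```

A deprecation period, as mentioned above, would presumably add a warning in the branch where the default is about to change; that part is left out of the sketch.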

Change default for concat_characters to False in open_* functions · Issue #4452 · pydata/xarray