What options do I have for <U# type data saving in zarr?
#10077
Comments
This is the easy question! It's a fixed-length UTF string, but I believe Zarr does encode it as UTF-8. For what it's worth, the future-proof way to create NumPy arrays of UTF-8 data is to use the UTF-8 string dtype (
You can write Zarr v2 files by passing Otherwise, Zarr v3 needs a way to silence these warnings. And perhaps an advocate to push through the Zarr standardization process :).
@d-v-b is working on expanding the dtype story in zarr3 now -- including fixed-length-strings. Expect an update within a month or so here.
If you mean by "not the default" that
xr.DataArray(np.array(
    ["hello", "world"],
    dtype=np.dtypes.StringDType,
)).to_zarr("test_utf8_strings.zarr")
And it miserably failed:
The above was obtained with Zarr v2. With Zarr v3, I got the same warnings as in the top level comment of this issue.
OK, that's comforting, thanks :).
Yes, this is how things currently work.
I agree, UTF-8 would be a much saner default! It's just a relatively new NumPy feature, and NumPy is very conservative about making breaking changes.
What is your issue?
So I tried out xarray today with zarr version 3.0.4, and encountered these scary warnings:
A MWE is:
Is <U5 a variable-length UTF-8 type? It shouldn't be... Also, what are my alternatives?
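On the question of whether <U5 is variable length, a quick check in NumPy itself may help. The snippet below is only a sketch, not the original MWE from this issue: it assumes NumPy is installed, the variable names are illustrative, and the StringDType part is guarded because that dtype only exists in NumPy 2.0 and later. It does not touch xarray or zarr at all.

```python
import numpy as np

# NumPy infers a fixed-width unicode dtype from the longest string in the input.
arr = np.array(["hello", "world"])
print(arr.dtype.kind)      # 'U' marks the fixed-width unicode dtype
print(arr.dtype.itemsize)  # bytes per element: 5 characters * 4 bytes each

# Shorter strings are padded to the same width, so every element occupies
# the same number of bytes regardless of its actual length.
padded = np.array(["hi", "hello"])
chars_per_item = padded.dtype.itemsize // 4

# Encoding to UTF-8 with np.char.encode yields fixed-width byte strings.
utf8 = np.char.encode(arr, "utf-8")
print(utf8.dtype.itemsize)  # bytes per element after UTF-8 encoding

# Variable-width UTF-8 strings need the newer StringDType (NumPy 2.0+ only).
if hasattr(np, "dtypes") and hasattr(np.dtypes, "StringDType"):
    var = np.array(["hello", "world"], dtype=np.dtypes.StringDType())
    print(var.dtype)
```

So the in-memory <U5 array is fixed width at 4 bytes per character; any UTF-8 encoding happens at the storage layer, which is separate from the NumPy dtype.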