Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to zarr 3 and main kerchunk #406

Merged
merged 24 commits into from
Jan 31, 2025

Conversation

jsignell
Copy link
Contributor

@jsignell jsignell commented Jan 30, 2025

Pulls commits from #378, #405 and #393

UPDATE: Ideas for #376 removed in 42414f0

  • Tests added
  • Tests passing
  • Full type hint coverage
  • Changes are documented in docs/releases.rst
  • New functions/methods are listed in api.rst
  • New functionality has documentation

The remaining failures on upstream can be broken into:

  • codecs-related:
    FAILED virtualizarr/tests/test_codecs.py::TestCodecs::test_manifest_array_zarr_v2_normalized - AssertionError: assert (BytesCodec(e...shuffle': 1})) == (Delta(codec_...shuffle': 1}))
    FAILED virtualizarr/tests/test_writers/test_icechunk.py::test_write_loadable_variable - TypeError: Group.create_array() got an unexpected keyword argument 'exists_ok'
    FAILED virtualizarr/tests/test_writers/test_zarr.py::test_zarr_v3_metadata_conformance - AssertionError: assert (True and 1 > 1)
  • probably setting up the in-memory icechunk store wrong - these are new tests that have never worked - removed in 42414f0
    FAILED virtualizarr/tests/test_integration.py::TestRoundtrip::test_roundtrip_no_concat[HDFVirtualBackend-roundtrip_as_in_memory_icechunk] - TypeError: Expected a BytesBytesCodec. Got <class 'numcodecs.zarr3.FixedScaleOffset'> instead.
    FAILED virtualizarr/tests/test_integration.py::TestRoundtrip::test_kerchunk_roundtrip_concat[False-time_vars0-HDFVirtualBackend-roundtrip_as_in_memory_icechunk] - TypeError: Expected a BytesBytesCodec. Got <class 'numcodecs.zarr3.FixedScaleOffset'> instead.
    FAILED virtualizarr/tests/test_integration.py::TestRoundtrip::test_kerchunk_roundtrip_concat[True-time_vars1-HDF5VirtualBackend-roundtrip_as_in_memory_icechunk] - TypeError: Group.create_array() got an unexpected keyword argument 'exists_ok'
    FAILED virtualizarr/tests/test_integration.py::TestRoundtrip::test_kerchunk_roundtrip_concat[True-time_vars1-HDFVirtualBackend-roundtrip_as_in_memory_icechunk] - TypeError: Group.create_array() got an unexpected keyword argument 'exists_ok'
    FAILED virtualizarr/tests/test_integration.py::TestRoundtrip::test_datetime64_dtype_fill_value[roundtrip_as_in_memory_icechunk] - KeyError: '<M8[ns]'

@TomNicholas
Copy link
Member

probably setting up the in-memory icechunk store wrong - these are new tests that have never worked

I don't know that this PR can be disentangled from the changes we need to get icechunk working with zarr v3...

This was referenced Jan 30, 2025
@jsignell
Copy link
Contributor Author

I don't know that this PR can be disentangled from the changes we need to get icechunk working with zarr v3...

I can take the new in-memory icechunk stuff and leave the place where it could slot in. Or this can just be used as a reference point. I'm not attached 😄

@TomNicholas TomNicholas added zarr-python Relevant to zarr-python upstream Kerchunk Relating to the kerchunk library / specification itself dependencies Updates a dependency Icechunk 🧊 Relates to Icechunk library / spec v3-migration Required for migration to Zarr-Python 3.0 labels Jan 30, 2025
@TomNicholas
Copy link
Member

So IIUC we can either:

  • Merge this now, break main but only for upstream, then fix the remaining failures in follow-up PRs. That's kind of what Planning explicit dependency on Zarr v3 #392 (comment) implied by separating the "pin zarr" and "fix results of pinning zarr" into separate steps.
  • Just try to fix the codec issues in this PR

@maxrjones
Copy link
Member

So IIUC we can either:

  • Merge this now, break main but only for upstream, then fix the remaining failures in follow-up PRs. That's kind of what Planning explicit dependency on Zarr v3 #392 (comment) implied by separating the "pin zarr" and "fix results of pinning zarr" into separate steps.
  • Just try to fix the codec issues in this PR

I'd slightly prefer if we merged this now because it seems to me like this PR really breaks the stalemate that we were in and would unblock other work. specifically for my work, I'd love to merge it into #407 to get the updates already implemented

@TomNicholas
Copy link
Member

I'd slightly prefer if we merged this now

I'm happy to merge this now if @jsignell is. We want to block everyone else as little as possible.

@jsignell
Copy link
Contributor Author

It looks like @abarciauskas-bgse might have fixed all the tests? 👏🏻

@@ -42,7 +42,7 @@ def test_zarr_v3_metadata_conformance(tmpdir, vds_with_manifest_arrays: Dataset)
assert isinstance(metadata["fill_value"], (bool, int, float, str, list))
assert (
isinstance(metadata["codecs"], list)
and len(metadata["codecs"]) > 1
and len(metadata["codecs"]) == 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abarciauskas-bgse did the expectation change so it is correct that there is one codec now?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is because of how serializer was broken out of _v3_codec_pipeline so these are 2 separate objects now. Today I will investigate if this is how it should be, but my guess is that this will become a part of the refactor to use the zarr-python array class instead of Virtualizarr's zarr.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I took a look at #175 and I agree that this is all going away. So it is probably not worth doing too much work around it.

@@ -198,6 +191,23 @@ def _v3_codec_pipeline(self) -> Any:

return codec_pipeline

def serializer(self) -> Any:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function could stand to get reviewed and tested. It looks like it was meant to be just a sketch of what a serializer could look like

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this passes the tests we can perhaps merge and deal with this in the follow-up PR that refactors ZArray away entirely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah exactly.

@jsignell
Copy link
Contributor Author

I just double checked #410 by merging it with this branch and the tests worked! (Although I did find 76dcfb7 🙊 )

@jsignell
Copy link
Contributor Author

I wasn't sure whether or not to add release notes, but feel free to push to this branch or just merge as you see fit.

@TomNicholas TomNicholas merged commit 83a6f10 into zarr-developers:main Jan 31, 2025
10 of 11 checks passed
@TomNicholas
Copy link
Member

We'll just add release notes in the follow-up that actually changes the kerchunk dependency to a released version

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Updates a dependency Icechunk 🧊 Relates to Icechunk library / spec Kerchunk Relating to the kerchunk library / specification itself v3-migration Required for migration to Zarr-Python 3.0 zarr-python Relevant to zarr-python upstream
Projects
Development

Successfully merging this pull request may close these issues.

4 participants