Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doc updates for 3.0 release #2568

Closed
wants to merge 28 commits into from

Conversation

jhamman
Copy link
Member

@jhamman jhamman commented Dec 18, 2024

This PR includes a fairly large reorganization of the Zarr-Python documentation in preparation for the 3.0 release. Major changes include:

  • new top level quickstart page
  • break up tutorial.rst. See user-guide/{parts}.rst
    • use ipython directives to run examples at doc build time
    • new user guide sections for advanced topics (async, extensions, etc.)
  • add new site section for developers (contributors guide, roadmap, etc.)

TODOs

  • Get feedback and decide on what to do with the docs in todo.rst. These are mostly features that have not been ported to v3.
  • I would like to migrate the content from Doc/v3 migration guide #2102 into user-guide/whatsnew_v3.rst.

Closes: #1765, #1767, #1770, #1798, #2550

docs/user-guide/todo.rst Outdated Show resolved Hide resolved
@@ -7,7 +7,7 @@ build:

sphinx:
configuration: docs/conf.py
fail_on_warning: true
fail_on_warning: false
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: revert this change before merging!

in the following ways:

1. Writing custom stores
2. Writing custom codecs
Copy link
Member Author

@jhamman jhamman Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@normanrz - could you suggest a short bit of text on how the codec system is meant to work. I will do the same for stores.

docs/index.rst Outdated Show resolved Hide resolved
@dstansby
Copy link
Contributor

This is amazing! But 🙏 please could this PR be split up into smaller bits where possible? That way it will be much easier to review

@dstansby
Copy link
Contributor

could this PR be split up into smaller bits where possible?

To add, I'd be happy to pull out independent bits (e.g., removing the license page) myself if that would be helpful?

@jhamman
Copy link
Member Author

jhamman commented Dec 18, 2024

could this PR be split up into smaller bits where possible?

This is possible but I think the size here is appropriate given the intent and scope of the work. I'm fine with #2570 but before I split things out any further, I'd like to get feedback on the structure of the new docs. I wrote very few new lines here, it is mostly a reshuffle of existing content.


To clarify, at this stage, I'm looking for "30%" feedback. What do you think of the new structure? Do you think this will make the docs easier to navigate, maintain, and/or extend? What other sections should we add (in follow up PRs)?

@jhamman jhamman changed the title docs: major refactor in preparation for the 3.0 release Doc updates for 3.0 release Dec 18, 2024
@dstansby
Copy link
Contributor

Had a look, and I'm 👍 👍 for splitting the tutorial up into sub-pages. I think I'm 👍 on renaming it to "guide" too. So I think at a high level this looks great!

Sorry for my negativity further up - I got a bit spooked by some of the content changes and didn't look past them to the tutorial rearrangement.

@dstansby dstansby added the documentation Improvements to the documentation label Dec 18, 2024
Copy link
Member

@maxrjones maxrjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wonderful @jhamman! I really like the new organization and finally understand vindex vs. oindex 😅

I left a few in-line suggestions and have some general suggestions below, but this would be good to go after a bit of cleanup even if you don't take any of them.

  • I'm not sure the comparisons to h5py add a lot of value. I would put them as Notes after rather than in the main part of the explanation, or just remove them to not assume familiarity with h5py.
  • I think using more subsections in groups would be helpful, similar to arrays (i.e., Creating groups, Persistent Groups)
  • I think Attributes would be better off as a separate section rather than nested under "working with groups" since it pertains to groups and arrays
  • Since storage is implicit in the arrays and groups sections, I recommend renaming "storage section" to "working with storage backends" or "working with storage"
  • I assume the TODOs in array compression are known issues?
  • I thought extensibility was a major reason for Zarr V3? I think it's worth being as comprehensive as possible in the motivations list of the migration guide since the major version bump will place work on users.
  • In the v3 migration guide, it would be extra nice if you could include a pointer to those who rely on the deprecated stores on what to do. Perhaps a link to the extension page?
  • It's worth being super clear up-front that the migration page isn't comprehensive. E.g., aren't there some small but meaningful changes like listdir() -> list_dir?
  • There are some existing references to RemoteStore that are beyond where GH supports in-line suggestions

zarr.load('data/example.zarr')

Please note that there are a number of other options for persistent array
storage, see the section on :ref:`tutorial_storage` below.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
storage, see the section on :ref:`tutorial_storage` below.
storage, see the :ref:`tutorial_storage` guide for more details.

Indexing with a mask array
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. Items can also be extracted by providing a Boolean mask. E.g.:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. Items can also be extracted by providing a Boolean mask. E.g.:
Items can also be extracted by providing a Boolean mask. E.g.:

This sentence wasn't being rendered in the docs

Comment on lines 90 to 91
The :class:`zarr.storage.FsspecStore` a in-memory store that allows for serialization of
Zarr data (metadata and chunks) to a dictionary.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The :class:`zarr.storage.FsspecStore` a in-memory store that allows for serialization of
Zarr data (metadata and chunks) to a dictionary.
The :class:`zarr.storage.MemoryStore` allows for serialization of Zarr data (metadata and chunks) to an im-memory dictionary.

>>> zarr.open(store=store)
<Group file://data/foo/bar>
store = zarr.storage.LocalStore("data/foo/bar", read_only=True)
zarr.open(store=store, mode='r')

Zip Store
~~~~~~~~~

The :class:`zarr.storage.ZipStore` stores the contents of a Zarr hierarchy in a single
Zip file. The `Zip Store specification_` is currently in draft form.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a link in the form of Zip Store specification <insert link>_

logical layout as the ``LocalStore``, except the store is assumed to be on a remote storage system
such as cloud object storage (e.g. AWS S3, Google Cloud Storage, Azure Blob Store). The
:class:`zarr.storage.RemoteStore` is backed by `Fsspec_` and can support any Fsspec backend
that implements the `AbstractFileSystem` API,
:class:`zarr.storage.FsspecStore` is backed by `Fsspec_` and can support any Fsspec backend
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:class:`zarr.storage.FsspecStore` is backed by `Fsspec_` and can support any Fsspec backend
:class:`zarr.storage.FsspecStore` is backed by `Fsspec <https://filesystem-spec.readthedocs.io/en/latest/>`_ and can support any Fsspec backend


The goals described above necessitated some breaking changes to the API (hence the
major version update), but we have attempted to maintain ~95% backwards compatibility
in the most widely used parts of the API. This in the :class:`zarr.Array` and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
in the most widely used parts of the API. This in the :class:`zarr.Array` and
in the most widely used parts of the API. This includes the :class:`zarr.Array` and

The Array class
~~~~~~~~~~~~~~~

1. Disallow direct construction - use :func:`zarr.open_array` or :func:`zarr.create_array`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. Disallow direct construction - use :func:`zarr.open_array` or :func:`zarr.create_array`
1. Disallow direct construction - use :func:`zarr.open_array` or :func:`zarr.Group.create_array`

The Group class
~~~~~~~~~~~~~~~

1. Disallow direct construction - use :func:`zarr.open_group` or :func:`zarr.create_group`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. Disallow direct construction - use :func:`zarr.open_group` or :func:`zarr.create_group`
1. Disallow direct construction - use :func:`zarr.open_group` or :func:`zarr.Group.create_group`


- The following options in the top-level API have not been ported to Zarr-Python 3 yet.
If these options are important to you, please open a
`GitHub issue <https://github.com/zarr-developers/zarr-python/issues/new>` describing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`GitHub issue <https://github.com/zarr-developers/zarr-python/issues/new>` describing
`GitHub issue <https://github.com/zarr-developers/zarr-python/issues/new>`_ describing

Migration Guide
---------------

The following sections provide details on the most important changes in Zarr-Python 3.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add that zarr_format=3 is the new default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing that is missing, is the new zarr.config mechanism

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some text for these items. Feel free to edit liberally.


.. ipython:: python

z1 = zarr.open('data/example.zarr', mode='w', shape=(10000, 10000), chunks=(1000, 1000), dtype='i4')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we replace all uses of zarr.open, zarr.array, zarr.group in favor of zarr.{create,open}_{array,group} once #2463 lands?


.. _tutorial_compress:

Compressors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a section for sharding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

top-level note on the differences between V2 and V3 would be good. (I'm sure this is on your todo list, just a reminder)

@dstansby
Copy link
Contributor

What's the plan with this PR - is it going to be split into several smaller PRs to make it easier to review?

@jhamman
Copy link
Member Author

jhamman commented Dec 28, 2024

@dstansby - I don't plan on splitting this up into multiple PRs. I'd like to merge this in more or less its current state then wrap up the TODOs and placeholder sections in smaller PRs to come after (I've already started a new branch integrating #2463 on top of this). Are there parts that you would like to review before we merge?

@dstansby
Copy link
Contributor

dstansby commented Dec 28, 2024

Yes, ideally I'd like to review all of it 😄 I know it's a little bit more effort to split it up, but it will make it much easier to get different parts reviewed and catch any mistakes and make improvements. At the moment, reviewing a +/- 2000 line diff is going to take ages. I'm happy to put in the effort to split it up if you don't mind?

@normanrz
Copy link
Member

This PR largely rewrites the user guide. For a review, I would just read the new user guide from top to bottom instead of looking at the diff. That should cover most of this PR.

@dstansby
Copy link
Contributor

I disagree - changes in this PR that could be independent PRs are:

  • Moving info to docs/about.rst
  • Using IPython for syntax highlighting
  • Changes/updates to the contributing guide
  • Moving pages to docs/developers
  • Adding a new quickstart page
  • Replacing the tutorial with new user guides
  • Adding a new "extending zarr" page
  • Updating the installation page
  • Adding a new "performance" page
  • Updating the "storage" user guide
  • Adding a new v3 migration guide
  • Adding a new v3 todos guide

By putting these altogether in one big PR it makes them much harder to review because the reviewer has to keep the context of all the changes in their head at once. If instead split up into multiple PRs:

  • the time taken to review each one should be much shorter than the total time needed to (properly) review this PR
  • it will be much easier to catch mistakes and make improvements on smaller PRs
  • it's much more likely multiple core devs will have time to review differnet smaller PRs, which goes with the spirit of Merge Only Changes You Understand ("Code doesn't merely have to work, but should be understood by multiple core developers.")

I very strongly believe this should be split up into multiple PRs, instead of being a large collection of multiple concurrent changes to the docs. Again, I'm happy to put in the effort to split this into multiple PRs and do the required rebasing.

@normanrz
Copy link
Member

While I usually also favor smaller PRs, I don't think that is the case for this docs PR. Splitting up the PR and reviewing in isolation won't necessarily result in a more cohesive documentation. Again, I think for an overhaul this large, I think it makes more sense to look at the new docs and review whether it is easy-to-understand, comprehensive and correct than to examine the diff. Essentially, you could ignore that old docs existed.

I think it should be @jhamman's call whether he wants to split up the PR.

import numpy as np

# Create a 2D Zarr array
z = zarr.zeros(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
z = zarr.zeros(
z = zarr.create_array(

Comment on lines +68 to +77
from numcodecs import Blosc

z = zarr.open(
"data/example-3.zarr",
mode="w", shape=(100, 100),
chunks=(10, 10), dtype="f4",
compressor=Blosc(cname="zstd", clevel=3, shuffle=Blosc.SHUFFLE),
zarr_format=2
)
z[:, :] = np.random.random((100, 100))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from numcodecs import Blosc
z = zarr.open(
"data/example-3.zarr",
mode="w", shape=(100, 100),
chunks=(10, 10), dtype="f4",
compressor=Blosc(cname="zstd", clevel=3, shuffle=Blosc.SHUFFLE),
zarr_format=2
)
z[:, :] = np.random.random((100, 100))
z = zarr.create_array(
"data/example-3.zarr",
mode="w", shape=(100, 100),
chunks=(10, 10), dtype="f4",
compressors=(zarr.codecs.BloscCodec(),),
)
z[:, :] = np.random.random((100, 100))

# Store the array in a ZIP file
store = zarr.storage.ZipStore("data/example-3.zip", mode='w')

z = zarr.open(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
z = zarr.open(
z = zarr.create_array(

# Open the ZipStore in read-only mode
store = zarr.storage.ZipStore("data/example-3.zip", read_only=True)

z = zarr.open(store, mode='r')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
z = zarr.open(store, mode='r')
z = zarr.open_array(store, mode='r')


import s3fs

z = zarr.open("s3://example-bucket/foo", mode="w", shape=(100, 100), chunks=(10, 10))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
z = zarr.open("s3://example-bucket/foo", mode="w", shape=(100, 100), chunks=(10, 10))
z = zarr.create_array("s3://example-bucket/foo", mode="w", shape=(100, 100), chunks=(10, 10))

root = zarr.open_group('data/group.zarr', mode='w')
root

z = root.zeros(name='foo/bar/baz', shape=(10000, 10000), chunks=(1000, 1000), dtype='i4')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
z = root.zeros(name='foo/bar/baz', shape=(10000, 10000), chunks=(1000, 1000), dtype='i4')
z = root.create_array(name='foo/bar/baz', shape=(10000, 10000), chunks=(1000, 1000), dtype='i4')


.. ipython:: python

root = zarr.group()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
root = zarr.group()
root = zarr.create_group()


import zarr

z1 = zarr.zeros((10000, 10000), chunks=(100, None), dtype='i4')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
z1 = zarr.zeros((10000, 10000), chunks=(100, None), dtype='i4')
z1 = zarr.create_array((10000, 10000), chunks=(100, None), dtype='i4')

A bunch of these in this file.


It is possible to configure how Zarr handles the storage of chunks that are "empty"
(i.e., every element in the chunk is equal to the array's fill value). When creating
an array with ``write_empty_chunks=False``, Zarr will check whether a chunk is empty before compression and storage. If a chunk is empty,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
an array with ``write_empty_chunks=False``, Zarr will check whether a chunk is empty before compression and storage. If a chunk is empty,
an array with the config ``array.write_empty_chunks=False``, Zarr will check whether a chunk is empty before compression and storage. If a chunk is empty,


.. warning::
As noted in the `3.0 Migration Guide <user-guide/v3_migration>`_, there are still a few
features that were present in Zarr-Python 2 that are not yet ported to Zarr-Python 3.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should state that some features might not be ported at all?

@dstansby
Copy link
Contributor

I think for an overhaul this large, I think it makes more sense to look at the new docs and review whether it is easy-to-understand, comprehensive and correct than to examine the diff. Essentially, you could ignore that old docs existed.

I would agree if this PR was only an entirely new set of docs, but it isn't, it's making a bunch of changes and also deleting a bunch of old docs. So we need to make sure that nothing is being lost, and that the changes are sensible.

Perhaps a good halfway house would be to move some of the changes into separate PRs to make it easier to see what's changed, and then have a large (but still smaller than this) PR that contains only deleting the old tutorial, and all the new guides that replace them?

Concretely, changes that I think would benefit from being their own PR:

  • Moving info to docs/about.rst (already done in Clean up getting started page #2566)
  • Moving pages to docs/developers
  • Using IPython for syntax highlighting
  • Changes/updates to the contributing guide
  • Updating the installation page

With a few of us available I think we could review/improve/merge thsese all pretty quickly, then leaving a much cleaner and easier to review PR here.

@jhamman
Copy link
Member Author

jhamman commented Dec 29, 2024

I'm going to close this PR and open up a string of smaller PRs. More soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements to the documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[v3] Documentation: Update developer docs
6 participants