-
-
Notifications
You must be signed in to change notification settings - Fork 301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Doc updates for 3.0 release #2568
Conversation
@@ -7,7 +7,7 @@ build: | |||
|
|||
sphinx: | |||
configuration: docs/conf.py | |||
fail_on_warning: true | |||
fail_on_warning: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: revert this change before merging!
docs/user-guide/extending.rst
Outdated
in the following ways: | ||
|
||
1. Writing custom stores | ||
2. Writing custom codecs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@normanrz - could you suggest a short bit of text on how the codec system is meant to work. I will do the same for stores.
This is amazing! But 🙏 please could this PR be split up into smaller bits where possible? That way it will be much easier to review |
To add, I'd be happy to pull out independent bits (e.g., removing the license page) myself if that would be helpful? |
This is possible but I think the size here is appropriate given the intent and scope of the work. I'm fine with #2570 but before I split things out any further, I'd like to get feedback on the structure of the new docs. I wrote very few new lines here, it is mostly a reshuffle of existing content. To clarify, at this stage, I'm looking for "30%" feedback. What do you think of the new structure? Do you think this will make the docs easier to navigate, maintain, and/or extend? What other sections should we add (in follow up PRs)? |
Had a look, and I'm 👍 👍 for splitting the tutorial up into sub-pages. I think I'm 👍 on renaming it to "guide" too. So I think at a high level this looks great! Sorry for my negativity further up - I got a bit spooked by some of the content changes and didn't look past them to the tutorial rearrangement. |
…into doc/3.0-updates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wonderful @jhamman! I really like the new organization and finally understand vindex vs. oindex 😅
I left a few in-line suggestions and have some general suggestions below, but this would be good to go after a bit of cleanup even if you don't take any of them.
- I'm not sure the comparisons to h5py add a lot of value. I would put them as Notes after rather than in the main part of the explanation, or just remove them to not assume familiarity with h5py.
- I think using more subsections in groups would be helpful, similar to arrays (i.e., Creating groups, Persistent Groups)
- I think Attributes would be better off as a separate section rather than nested under "working with groups" since it pertains to groups and arrays
- Since storage is implicit in the arrays and groups sections, I recommend renaming "storage section" to "working with storage backends" or "working with storage"
- I assume the TODOs in array compression are known issues?
- I thought extensibility was a major reason for Zarr V3? I think it's worth being as comprehensive as possible in the motivations list of the migration guide since the major version bump will place work on users.
- In the v3 migration guide, it would be extra nice if you could include a pointer to those who rely on the deprecated stores on what to do. Perhaps a link to the extension page?
- It's worth being super clear up-front that the migration page isn't comprehensive. E.g., aren't there some small but meaningful changes like listdir() -> list_dir?
- There are some existing references to
RemoteStore
that are beyond where GH supports in-line suggestions
docs/user-guide/arrays.rst
Outdated
zarr.load('data/example.zarr') | ||
|
||
Please note that there are a number of other options for persistent array | ||
storage, see the section on :ref:`tutorial_storage` below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
storage, see the section on :ref:`tutorial_storage` below. | |
storage, see the :ref:`tutorial_storage` guide for more details. |
docs/user-guide/arrays.rst
Outdated
Indexing with a mask array | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
.. Items can also be extracted by providing a Boolean mask. E.g.: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.. Items can also be extracted by providing a Boolean mask. E.g.: | |
Items can also be extracted by providing a Boolean mask. E.g.: |
This sentence wasn't being rendered in the docs
docs/user-guide/storage.rst
Outdated
The :class:`zarr.storage.FsspecStore` a in-memory store that allows for serialization of | ||
Zarr data (metadata and chunks) to a dictionary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The :class:`zarr.storage.FsspecStore` a in-memory store that allows for serialization of | |
Zarr data (metadata and chunks) to a dictionary. | |
The :class:`zarr.storage.MemoryStore` allows for serialization of Zarr data (metadata and chunks) to an im-memory dictionary. |
docs/user-guide/storage.rst
Outdated
>>> zarr.open(store=store) | ||
<Group file://data/foo/bar> | ||
store = zarr.storage.LocalStore("data/foo/bar", read_only=True) | ||
zarr.open(store=store, mode='r') | ||
|
||
Zip Store | ||
~~~~~~~~~ | ||
|
||
The :class:`zarr.storage.ZipStore` stores the contents of a Zarr hierarchy in a single | ||
Zip file. The `Zip Store specification_` is currently in draft form. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a link in the form of Zip Store specification <insert link>
_
docs/user-guide/storage.rst
Outdated
logical layout as the ``LocalStore``, except the store is assumed to be on a remote storage system | ||
such as cloud object storage (e.g. AWS S3, Google Cloud Storage, Azure Blob Store). The | ||
:class:`zarr.storage.RemoteStore` is backed by `Fsspec_` and can support any Fsspec backend | ||
that implements the `AbstractFileSystem` API, | ||
:class:`zarr.storage.FsspecStore` is backed by `Fsspec_` and can support any Fsspec backend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:class:`zarr.storage.FsspecStore` is backed by `Fsspec_` and can support any Fsspec backend | |
:class:`zarr.storage.FsspecStore` is backed by `Fsspec <https://filesystem-spec.readthedocs.io/en/latest/>`_ and can support any Fsspec backend |
|
||
The goals described above necessitated some breaking changes to the API (hence the | ||
major version update), but we have attempted to maintain ~95% backwards compatibility | ||
in the most widely used parts of the API. This in the :class:`zarr.Array` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the most widely used parts of the API. This in the :class:`zarr.Array` and | |
in the most widely used parts of the API. This includes the :class:`zarr.Array` and |
The Array class | ||
~~~~~~~~~~~~~~~ | ||
|
||
1. Disallow direct construction - use :func:`zarr.open_array` or :func:`zarr.create_array` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1. Disallow direct construction - use :func:`zarr.open_array` or :func:`zarr.create_array` | |
1. Disallow direct construction - use :func:`zarr.open_array` or :func:`zarr.Group.create_array` |
The Group class | ||
~~~~~~~~~~~~~~~ | ||
|
||
1. Disallow direct construction - use :func:`zarr.open_group` or :func:`zarr.create_group` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1. Disallow direct construction - use :func:`zarr.open_group` or :func:`zarr.create_group` | |
1. Disallow direct construction - use :func:`zarr.open_group` or :func:`zarr.Group.create_group` |
docs/user-guide/v3_migration.rst
Outdated
|
||
- The following options in the top-level API have not been ported to Zarr-Python 3 yet. | ||
If these options are important to you, please open a | ||
`GitHub issue <https://github.com/zarr-developers/zarr-python/issues/new>` describing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`GitHub issue <https://github.com/zarr-developers/zarr-python/issues/new>` describing | |
`GitHub issue <https://github.com/zarr-developers/zarr-python/issues/new>`_ describing |
Migration Guide | ||
--------------- | ||
|
||
The following sections provide details on the most important changes in Zarr-Python 3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add that zarr_format=3
is the new default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing that is missing, is the new zarr.config
mechanism
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some text for these items. Feel free to edit liberally.
docs/user-guide/arrays.rst
Outdated
|
||
.. ipython:: python | ||
|
||
z1 = zarr.open('data/example.zarr', mode='w', shape=(10000, 10000), chunks=(1000, 1000), dtype='i4') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we replace all uses of zarr.open
, zarr.array
, zarr.group
in favor of zarr.{create,open}_{array,group}
once #2463 lands?
|
||
.. _tutorial_compress: | ||
|
||
Compressors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add a section for sharding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
top-level note on the differences between V2 and V3 would be good. (I'm sure this is on your todo list, just a reminder)
…into doc/3.0-updates
…python into doc/3.0-updates
What's the plan with this PR - is it going to be split into several smaller PRs to make it easier to review? |
@dstansby - I don't plan on splitting this up into multiple PRs. I'd like to merge this in more or less its current state then wrap up the TODOs and placeholder sections in smaller PRs to come after (I've already started a new branch integrating #2463 on top of this). Are there parts that you would like to review before we merge? |
Yes, ideally I'd like to review all of it 😄 I know it's a little bit more effort to split it up, but it will make it much easier to get different parts reviewed and catch any mistakes and make improvements. At the moment, reviewing a +/- 2000 line diff is going to take ages. I'm happy to put in the effort to split it up if you don't mind? |
This PR largely rewrites the user guide. For a review, I would just read the new user guide from top to bottom instead of looking at the diff. That should cover most of this PR. |
I disagree - changes in this PR that could be independent PRs are:
By putting these altogether in one big PR it makes them much harder to review because the reviewer has to keep the context of all the changes in their head at once. If instead split up into multiple PRs:
I very strongly believe this should be split up into multiple PRs, instead of being a large collection of multiple concurrent changes to the docs. Again, I'm happy to put in the effort to split this into multiple PRs and do the required rebasing. |
While I usually also favor smaller PRs, I don't think that is the case for this docs PR. Splitting up the PR and reviewing in isolation won't necessarily result in a more cohesive documentation. Again, I think for an overhaul this large, I think it makes more sense to look at the new docs and review whether it is easy-to-understand, comprehensive and correct than to examine the diff. Essentially, you could ignore that old docs existed. I think it should be @jhamman's call whether he wants to split up the PR. |
import numpy as np | ||
|
||
# Create a 2D Zarr array | ||
z = zarr.zeros( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
z = zarr.zeros( | |
z = zarr.create_array( |
from numcodecs import Blosc | ||
|
||
z = zarr.open( | ||
"data/example-3.zarr", | ||
mode="w", shape=(100, 100), | ||
chunks=(10, 10), dtype="f4", | ||
compressor=Blosc(cname="zstd", clevel=3, shuffle=Blosc.SHUFFLE), | ||
zarr_format=2 | ||
) | ||
z[:, :] = np.random.random((100, 100)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from numcodecs import Blosc | |
z = zarr.open( | |
"data/example-3.zarr", | |
mode="w", shape=(100, 100), | |
chunks=(10, 10), dtype="f4", | |
compressor=Blosc(cname="zstd", clevel=3, shuffle=Blosc.SHUFFLE), | |
zarr_format=2 | |
) | |
z[:, :] = np.random.random((100, 100)) | |
z = zarr.create_array( | |
"data/example-3.zarr", | |
mode="w", shape=(100, 100), | |
chunks=(10, 10), dtype="f4", | |
compressors=(zarr.codecs.BloscCodec(),), | |
) | |
z[:, :] = np.random.random((100, 100)) |
# Store the array in a ZIP file | ||
store = zarr.storage.ZipStore("data/example-3.zip", mode='w') | ||
|
||
z = zarr.open( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
z = zarr.open( | |
z = zarr.create_array( |
# Open the ZipStore in read-only mode | ||
store = zarr.storage.ZipStore("data/example-3.zip", read_only=True) | ||
|
||
z = zarr.open(store, mode='r') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
z = zarr.open(store, mode='r') | |
z = zarr.open_array(store, mode='r') |
|
||
import s3fs | ||
|
||
z = zarr.open("s3://example-bucket/foo", mode="w", shape=(100, 100), chunks=(10, 10)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
z = zarr.open("s3://example-bucket/foo", mode="w", shape=(100, 100), chunks=(10, 10)) | |
z = zarr.create_array("s3://example-bucket/foo", mode="w", shape=(100, 100), chunks=(10, 10)) |
root = zarr.open_group('data/group.zarr', mode='w') | ||
root | ||
|
||
z = root.zeros(name='foo/bar/baz', shape=(10000, 10000), chunks=(1000, 1000), dtype='i4') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
z = root.zeros(name='foo/bar/baz', shape=(10000, 10000), chunks=(1000, 1000), dtype='i4') | |
z = root.create_array(name='foo/bar/baz', shape=(10000, 10000), chunks=(1000, 1000), dtype='i4') |
|
||
.. ipython:: python | ||
|
||
root = zarr.group() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
root = zarr.group() | |
root = zarr.create_group() |
|
||
import zarr | ||
|
||
z1 = zarr.zeros((10000, 10000), chunks=(100, None), dtype='i4') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
z1 = zarr.zeros((10000, 10000), chunks=(100, None), dtype='i4') | |
z1 = zarr.create_array((10000, 10000), chunks=(100, None), dtype='i4') |
A bunch of these in this file.
|
||
It is possible to configure how Zarr handles the storage of chunks that are "empty" | ||
(i.e., every element in the chunk is equal to the array's fill value). When creating | ||
an array with ``write_empty_chunks=False``, Zarr will check whether a chunk is empty before compression and storage. If a chunk is empty, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an array with ``write_empty_chunks=False``, Zarr will check whether a chunk is empty before compression and storage. If a chunk is empty, | |
an array with the config ``array.write_empty_chunks=False``, Zarr will check whether a chunk is empty before compression and storage. If a chunk is empty, |
|
||
.. warning:: | ||
As noted in the `3.0 Migration Guide <user-guide/v3_migration>`_, there are still a few | ||
features that were present in Zarr-Python 2 that are not yet ported to Zarr-Python 3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should state that some features might not be ported at all?
I would agree if this PR was only an entirely new set of docs, but it isn't, it's making a bunch of changes and also deleting a bunch of old docs. So we need to make sure that nothing is being lost, and that the changes are sensible. Perhaps a good halfway house would be to move some of the changes into separate PRs to make it easier to see what's changed, and then have a large (but still smaller than this) PR that contains only deleting the old tutorial, and all the new guides that replace them? Concretely, changes that I think would benefit from being their own PR:
With a few of us available I think we could review/improve/merge thsese all pretty quickly, then leaving a much cleaner and easier to review PR here. |
I'm going to close this PR and open up a string of smaller PRs. More soon. |
This PR includes a fairly large reorganization of the Zarr-Python documentation in preparation for the 3.0 release. Major changes include:
quickstart
pagetutorial.rst
. Seeuser-guide/{parts}.rst
TODOs
todo.rst
. These are mostly features that have not been ported to v3.user-guide/whatsnew_v3.rst
.Closes: #1765, #1767, #1770, #1798, #2550