@@ -4,7 +4,7 @@ Zarr storage specification version 2
 ====================================
 
 This document provides a technical specification of the protocol and format
-used for storing a Zarr array. The key words "MUST", "MUST NOT", "REQUIRED",
+used for storing Zarr arrays. The key words "MUST", "MUST NOT", "REQUIRED",
 "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
 "OPTIONAL" in this document are to be interpreted as described in `RFC 2119
 <https://www.ietf.org/rfc/rfc2119.txt>`_.
@@ -56,42 +56,47 @@ chunks
 dtype
     A string or list defining a valid data type for the array. See also
     the subsection below on data type encoding.
-compression
-    A string identifying the primary compression library used to compress
-    each chunk of the array.
-compression_opts
-    An integer, string or dictionary providing options to the primary
-    compression library.
+compressor
+    A JSON object identifying the primary compression codec and providing
+    configuration parameters, or ``null`` if no compressor is to be used.
+    The object MUST contain an ``"id"`` key identifying the codec to be used.
 fill_value
     A scalar value providing the default value to use for uninitialized
-    portions of the array.
+    portions of the array, or ``null`` if no fill_value is to be used.
 order
     Either "C" or "F", defining the layout of bytes within each chunk of the
     array. "C" means row-major order, i.e., the last dimension varies fastest;
     "F" means column-major order, i.e., the first dimension varies fastest.
+filters
+    A list of JSON objects providing codec configurations, or ``null`` if no
+    filters are to be applied. Each codec configuration object MUST contain an
+    ``"id"`` key identifying the codec to be used.
 
 Other keys MUST NOT be present within the metadata object.
 
 For example, the JSON object below defines a 2-dimensional array of 64-bit
 little-endian floating point numbers with 10000 rows and 10000 columns, divided
 into chunks of 1000 rows and 1000 columns (so there will be 100 chunks in total
 arranged in a 10 by 10 grid). Within each chunk the data are laid out in C
-contiguous order, and each chunk is compressed using the Blosc compression
-library::
+contiguous order. Each chunk is encoded using a delta filter and compressed
+using the Blosc compression library prior to storage::
 
     {
         "chunks": [
             1000,
             1000
         ],
-        "compression": "blosc",
-        "compression_opts": {
-            "clevel": 5,
+        "compressor": {
+            "id": "blosc",
             "cname": "lz4",
+            "clevel": 5,
             "shuffle": 1
         },
         "dtype": "<f8",
-        "fill_value": null,
+        "fill_value": "NaN",
+        "filters": [
+            {"id": "delta", "dtype": "<f8", "astype": "<f4"}
+        ],
         "order": "C",
         "shape": [
             10000,
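The rule that every codec configuration carries an ``"id"`` key can be checked mechanically. The following is a minimal sketch using only the Python standard library; the function names ``check_codec_config`` and ``check_codecs`` are hypothetical and not part of Zarr, and treating a missing key the same as ``null`` is an assumption made for brevity.

```python
import json


def check_codec_config(config, where):
    """Raise ValueError unless config is a JSON object with an "id" key."""
    if not isinstance(config, dict) or "id" not in config:
        raise ValueError(f'{where} must be an object with an "id" key')


def check_codecs(meta):
    """Check the "compressor" and "filters" entries of a metadata dict.

    Assumption: a missing key is treated like null (no codec configured).
    """
    compressor = meta.get("compressor")
    if compressor is not None:
        check_codec_config(compressor, "compressor")
    filters = meta.get("filters")
    if filters is not None:
        if not isinstance(filters, list):
            raise ValueError("filters must be a list or null")
        for i, config in enumerate(filters):
            check_codec_config(config, f"filters[{i}]")


# Codec entries copied from the example metadata in the hunk above.
meta = json.loads("""
{
    "compressor": {"id": "blosc", "cname": "lz4", "clevel": 5, "shuffle": 1},
    "filters": [{"id": "delta", "dtype": "<f8", "astype": "<f4"}]
}
""")
check_codecs(meta)  # returns None when both entries are well formed
```

The check only looks at structure; whether a given ``"id"`` names a codec that is actually available is a separate question that this sketch does not address.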
@@ -142,7 +147,6 @@ Positive Infinity ``"Infinity"``
 Negative Infinity  ``"-Infinity"``
 =================  ===============
 
-
 Chunks
 ~~~~~~
 
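A reader of the metadata has to map these string encodings back to floating point values. Below is a minimal sketch for floating point arrays, assuming the value has already been parsed from JSON; ``decode_fill_value`` is a hypothetical helper name, and the ``"NaN"`` spelling is taken from the example metadata elsewhere in this diff rather than from the table rows visible here.

```python
import math

# Strings that stand for non-finite floats. The infinity spellings come from
# the table above; "NaN" is assumed from the example metadata in this diff.
_SPECIAL = {"NaN": math.nan, "Infinity": math.inf, "-Infinity": -math.inf}


def decode_fill_value(value):
    """Turn a JSON-decoded fill_value into a Python value.

    None (JSON null) means no fill value is set; the special strings become
    floats; any other value is returned unchanged.
    """
    if isinstance(value, str) and value in _SPECIAL:
        return _SPECIAL[value]
    return value
```

The sketch is only meaningful for floating point data types; other data types may need their own decoding rules.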
@@ -176,6 +180,16 @@ array dimension is not exactly divisible by the length of the corresponding
 chunk dimension then some chunks will overhang the edge of the array. The
 contents of any chunk region falling outside the array are undefined.
 
+Filters
+~~~~~~~
+
+Optionally a sequence of one or more filters can be used to transform chunk
+data prior to compression. When storing data, filters are applied in the order
+specified in array metadata to encode data, then the encoded data are passed to
+the primary compressor. When retrieving data, stored chunk data are
+decompressed by the primary compressor then decoded using filters in the
+reverse order.
+
 Hierarchies
 -----------
 
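The ordering described in the new Filters subsection (filters in metadata order, then the compressor; on read, the compressor first, then filters in reverse) can be sketched as follows. This is an illustration under stated assumptions, not Zarr's implementation: the ``Delta`` and ``Zlib`` classes and the ``encode_chunk`` and ``decode_chunk`` functions are hypothetical stand-ins written with the Python standard library, and the delta filter here handles only little-endian 32-bit integers.

```python
import struct
import zlib


class Delta:
    """Hypothetical filter: store differences between consecutive int32s.

    Differences that do not fit in an int32 are not handled in this sketch.
    """

    def encode(self, buf):
        values = struct.unpack(f"<{len(buf) // 4}i", buf)
        diffs = [v - p for v, p in zip(values, (0,) + values[:-1])]
        return struct.pack(f"<{len(diffs)}i", *diffs)

    def decode(self, buf):
        diffs = struct.unpack(f"<{len(buf) // 4}i", buf)
        values, total = [], 0
        for d in diffs:
            total += d  # running sum undoes the differencing
            values.append(total)
        return struct.pack(f"<{len(values)}i", *values)


class Zlib:
    """Stand-in primary compressor built on the standard library."""

    def __init__(self, level=1):
        self.level = level

    def encode(self, buf):
        return zlib.compress(buf, self.level)

    def decode(self, buf):
        return zlib.decompress(buf)


def encode_chunk(chunk, filters, compressor):
    for f in filters:  # filters are applied in metadata order
        chunk = f.encode(chunk)
    return compressor.encode(chunk)  # then the primary compressor


def decode_chunk(stored, filters, compressor):
    chunk = compressor.decode(stored)  # undo the compressor first
    for f in reversed(filters):  # then the filters in reverse order
        chunk = f.decode(chunk)
    return chunk


raw = struct.pack("<6i", 10, 11, 12, 13, 14, 15)
stored = encode_chunk(raw, [Delta()], Zlib(level=1))
restored = decode_chunk(stored, [Delta()], Zlib(level=1))
```

With more than one filter the reversal matters: decoding in the original order would hand each filter data that a later filter has not yet undone.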
@@ -279,7 +293,7 @@ Create an array::
     >>> import zarr
     >>> store = zarr.DirectoryStore('example')
     >>> a = zarr.create(shape=(20, 20), chunks=(10, 10), dtype='i4',
-    ...                 fill_value=42, compression='zlib', compression_opts=1,
+    ...                 fill_value=42, compressor=zarr.Zlib(level=1),
     ...                 store=store, overwrite=True)
 
 No chunks are initialized yet, so only the ".zarray" and ".zattrs" keys
@@ -297,10 +311,13 @@ Inspect the array metadata::
             10,
             10
         ],
-        "compression": "zlib",
-        "compression_opts": 1,
+        "compressor": {
+            "id": "zlib",
+            "level": 1
+        },
         "dtype": "<i4",
         "fill_value": 42,
+        "filters": null,
         "order": "C",
         "shape": [
             20,
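The "shape" and "chunks" entries together determine how many chunks the array has along each dimension. A minimal sketch follows, assuming that a dimension not exactly divisible by its chunk length gets one extra overhanging chunk, as the Chunks text quoted in this diff describes; ``chunk_grid`` is a hypothetical helper name, and the shape ``(20, 20)`` is taken from the ``zarr.create`` call shown earlier.

```python
def chunk_grid(shape, chunks):
    """Number of chunks along each dimension, counting overhanging chunks."""
    # -(-s // c) is ceiling division using only integer arithmetic.
    return tuple(-(-s // c) for s, c in zip(shape, chunks))


grid = chunk_grid((20, 20), (10, 10))
```

For the 20 by 20 array in 10 by 10 chunks this gives a 2 by 2 grid, and for the 10000 by 10000 example in 1000 by 1000 chunks it gives the 10 by 10 grid mentioned in the specification text.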
@@ -452,6 +469,10 @@ Changes in version 2
 * Added support for storing multiple arrays in the same store and organising
   arrays into hierarchies using groups.
 * Array metadata is now stored under the ".zarray" key instead of the "meta"
-  key
+  key.
 * Custom attributes are now stored under the ".zattrs" key instead of the
-  "attrs" key
+  "attrs" key.
+* Added support for filters.
+* Changed encoding of "fill_value" field within array metadata.
+* Changed encoding of compressor information within array metadata to be
+  consistent with representation of filter information.