diff --git a/chunk-key-encodings/suffix/README.md b/chunk-key-encodings/suffix/README.md
new file mode 100644
index 0000000..64fd74d
--- /dev/null
+++ b/chunk-key-encodings/suffix/README.md
@@ -0,0 +1,87 @@
+# ZEP: `suffix` Chunk Key Encoding
+
+## Summary
+
+This document proposes a new Zarr v3 chunk-key-encoding extension named `suffix`. This encoding appends a user-defined string (the "suffix") to the key generated by a base chunk key encoding. The primary motivation is to allow chunk keys to have file extensions (e.g., `.tiff`, `.zip`), making them directly usable by operating systems and other software that identify file types by their extension.
+
+---
+
+## Motivation
+
+Modern scientific workflows often involve a variety of tools. While Zarr provides excellent chunked, N-dimensional data access, individual chunks can sometimes be valid, standalone files in other formats. A prime example is a Zarr array sharded into TIFF files. Each shard is both a chunk in the Zarr hierarchy and a complete TIFF file.
+
+Currently, Zarr chunk keys (like `c/0/0`) lack file extensions. This prevents a user or application from easily identifying and opening these chunks with standard tools (e.g., an image viewer). To work around this, data must be duplicated or accessed exclusively through a Zarr library.
+
+The `suffix` encoding solves this problem by adding a file extension to the chunk key. This creates a dual-access system:
+1. **Zarr Access**: The data remains a fully compliant Zarr array, accessible via the Zarr protocol.
+2. **Direct File Access**: The individual chunk files can be directly opened, viewed, or processed by any tool that recognizes their file extension.
+
+This enhances interoperability and simplifies workflows that bridge Zarr and traditional file-based tools without requiring data duplication.
+
+---
+
+## Specification
+
+* **Name**: `suffix`
+* **Version**: `0.1`
+* **Identifier**: (A unique URI to be assigned upon formal adoption)
+
+### Configuration
+
+The configuration for this encoding is a JSON object with two required members.
+
+* `"suffix"`: **(Required)** A string that will be appended to the encoded chunk key.
+* `"base_encoding"`: **(Required)** A chunk key encoding configuration object. This specifies the "base" encoding to be used *before* the suffix is appended.
+
+#### Example 1: Simple Suffix
+
+This configuration appends `.tiff` to the key generated by the `default` chunk key encoding.
+
+```json
+{
+  "name": "suffix",
+  "configuration": {
+    "suffix": ".tiff",
+    "base_encoding": {
+      "name": "default"
+    }
+  }
+}
+```
+
+#### Example 2: Suffix with a Custom Base Encoding
+
+This configuration first encodes the chunk key using the `v2` naming scheme and then appends `.shard.zip`.
+
+```json
+{
+  "name": "suffix",
+  "configuration": {
+    "suffix": ".shard.zip",
+    "base_encoding": {
+      "name": "v2"
+    }
+  }
+}
+```
+
+---
+
+## Encoding and Decoding Logic
+
+The implementation logic is a simple wrapper around an existing chunk key encoding.
+
+### Encoding
+
+1. Take the chunk coordinate tuple as input (e.g., `(1, 2)`).
+2. Encode the coordinates using the specified **`base_encoding`**. This might transform `(1, 2)` into `"c/1/2"` if the `base_encoding` is set to `default`.
+3. Append the `suffix` from the configuration to the result of the base encoding.
+
+The final key is `base_encoded_key + suffix` (e.g., `"c/1/2.tiff"`).
+
+### Decoding
+
+1. Take the full chunk key string as input (e.g., `"c/1/2.tiff"`).
+2. Verify that the key ends with the configured `suffix`. If not, it is an invalid key for this encoding.
+3. Remove the `suffix` from the end of the key string to get the base key (e.g., `"c/1/2"`).
+4. Decode the remaining base key using the specified **`base_encoding`** to retrieve the original chunk coordinate tuple `(1, 2)`.
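
The encoding and decoding steps described in this README can be sketched in a few lines of Python. This is only an illustrative sketch, not the API of any Zarr library: the function names (`encode_default`, `decode_default`, `encode_suffix`, `decode_suffix`) are hypothetical, and only a `default`-style base encoding with a `/` separator is modeled. Any other base encoding would be passed in as a pair of callables.

```python
from typing import Callable, Tuple

Coords = Tuple[int, ...]


def encode_default(coords: Coords, separator: str = "/") -> str:
    """Hypothetical stand-in for a `default`-style base encoding: (1, 2) -> "c/1/2"."""
    return separator.join(["c", *(str(c) for c in coords)])


def decode_default(key: str, separator: str = "/") -> Coords:
    """Inverse of encode_default: "c/1/2" -> (1, 2)."""
    parts = key.split(separator)
    if parts[0] != "c":
        raise ValueError(f"not a default-encoded chunk key: {key!r}")
    return tuple(int(p) for p in parts[1:])


def encode_suffix(
    coords: Coords,
    suffix: str,
    base_encode: Callable[[Coords], str] = encode_default,
) -> str:
    """Encode with the base encoding, then append the configured suffix."""
    return base_encode(coords) + suffix


def decode_suffix(
    key: str,
    suffix: str,
    base_decode: Callable[[str], Coords] = decode_default,
) -> Coords:
    """Check for the suffix, strip it, then decode the remaining base key."""
    if not key.endswith(suffix):
        raise ValueError(f"key {key!r} does not end with suffix {suffix!r}")
    # Slice by length so that an empty suffix leaves the key unchanged.
    base_key = key[: len(key) - len(suffix)]
    return base_decode(base_key)


key = encode_suffix((1, 2), ".tiff")
coords = decode_suffix(key, ".tiff")
```

Passing the base encoding as a callable mirrors the wrapper design in the proposal: the suffix logic never needs to know how the base key is built, only that the suffix sits at the very end of it.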