# ZEP: `suffix` Chunk Key Encoding

## Summary

This document proposes a new Zarr v3 chunk-key-encoding extension named `suffix`. This encoding appends a user-defined string (the "suffix") to the key generated by a base chunk key encoding. The primary motivation is to allow chunk keys to have file extensions (e.g., `.tiff`, `.zip`), making them directly usable by operating systems and other software that identify file types by their extension.

---

## Motivation

Modern scientific workflows often involve a variety of tools. While Zarr provides excellent chunked, N-dimensional data access, individual chunks can sometimes be valid, standalone files in other formats. A prime example is a Zarr array sharded into TIFF files. Each shard is both a chunk in the Zarr hierarchy and a complete TIFF file.

Currently, Zarr chunk keys (like `c/0/0`) lack file extensions. This prevents a user or application from easily identifying and opening these chunks with standard tools (e.g., an image viewer). To work around this, data must be duplicated or accessed exclusively through a Zarr library.

The `suffix` encoding solves this problem by adding a file extension to the chunk key. This creates a dual-access system:
1. **Zarr Access**: The data remains a fully compliant Zarr array, readable by any Zarr implementation that supports this encoding.
2. **Direct File Access**: The individual chunk files can be directly opened, viewed, or processed by any tool that recognizes their file extension.

This enhances interoperability and simplifies workflows that bridge Zarr and traditional file-based tools without requiring data duplication.

---

## Specification

* **Name**: `suffix`
* **Version**: `0.1`
* **Identifier**: (A unique URI to be assigned upon formal adoption)

### Configuration

The configuration for this encoding is a JSON object with two required members.

* `"suffix"`: **(Required)** A string that will be appended to the encoded chunk key.
* `"base_encoding"`: **(Required)** A chunk key encoding configuration object. This specifies the "base" encoding to be used *before* the suffix is appended.
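The two required members can be checked with a short validation step before any keys are generated. The following is a minimal Python sketch that assumes `base_encoding` takes the usual Zarr v3 shape: an object with a string `name` and an optional `configuration`. `SuffixConfig` and `parse_suffix_config` are hypothetical names used only for illustration. They are not part of any Zarr library.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class SuffixConfig:
    """Hypothetical parsed form of the `suffix` encoding's configuration."""
    suffix: str
    base_encoding: Dict[str, Any]


def parse_suffix_config(configuration: Dict[str, Any]) -> SuffixConfig:
    """Check the two required members (hypothetical helper).

    Assumes `base_encoding` is a Zarr v3 style chunk key encoding object,
    i.e. a mapping with a string "name" and an optional "configuration".
    """
    missing = [k for k in ("suffix", "base_encoding") if k not in configuration]
    if missing:
        raise ValueError(f"missing required member(s): {missing}")

    suffix = configuration["suffix"]
    if not isinstance(suffix, str):
        raise TypeError("'suffix' must be a string")

    base = configuration["base_encoding"]
    if not isinstance(base, dict) or not isinstance(base.get("name"), str):
        raise TypeError("'base_encoding' must be an object with a string 'name'")

    return SuffixConfig(suffix=suffix, base_encoding=base)
```

For example, `parse_suffix_config({"suffix": ".tiff", "base_encoding": {"name": "default"}})` returns a `SuffixConfig` with `suffix == ".tiff"`, while omitting either member raises `ValueError`.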

#### Example 1: Simple Suffix

This configuration appends `.tiff` to the key generated by the `default` chunk key encoding.

```json
{
  "name": "suffix",
  "configuration": {
    "suffix": ".tiff",
    "base_encoding": {
      "name": "default"
    }
  }
}
```

#### Example 2: Suffix with a Custom Base Encoding

This configuration first encodes the chunk coordinates using the `v2` chunk key encoding and then appends `.shard.zip` to the result.

```json
{
  "name": "suffix",
  "configuration": {
    "suffix": ".shard.zip",
    "base_encoding": {
      "name": "v2"
    }
  }
}
```
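To make Example 2 concrete, here is a small Python sketch of the keys it would produce. It assumes the `v2` base encoding behaves as described in the Zarr v3 core specification as we read it: coordinates joined by the separator (`.` by default), with `"0"` used for a zero-dimensional array. `v2_key` and `suffixed_v2_key` are hypothetical helper names, not a library API.

```python
from typing import Tuple


def v2_key(coords: Tuple[int, ...], separator: str = ".") -> str:
    # Hypothetical helper modelling the `v2` chunk key encoding:
    # join coordinates with the separator; a zero-dimensional array maps to "0".
    return separator.join(str(i) for i in coords) if coords else "0"


def suffixed_v2_key(coords: Tuple[int, ...], suffix: str = ".shard.zip") -> str:
    # Hypothetical helper: apply the base encoding first, then the suffix.
    return v2_key(coords) + suffix


print(suffixed_v2_key((1, 2)))  # 1.2.shard.zip
```

Note that the resulting key `1.2.shard.zip` uses `.` both as the `v2` separator and inside the suffix. This is why the suffix must be removed before the base key is decoded.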

---

## Encoding and Decoding Logic

The implementation logic is a simple wrapper around an existing chunk key encoding.

### Encoding

1. Take the chunk coordinate tuple as input (e.g., `(1, 2)`).
2. Encode the coordinates using the specified **`base_encoding`**. For example, with `default` as the `base_encoding` (using its default `/` separator), `(1, 2)` becomes `"c/1/2"`.
3. Append the `suffix` from the configuration to the result of the base encoding.

The final key is `base_encoded_key + suffix` (e.g., `"c/1/2.tiff"`).
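The encoding steps above can be sketched in Python as follows. This is a minimal illustration that assumes the `default` base encoding as described in the Zarr v3 core specification (prefix `c`, then each coordinate preceded by the separator; a zero-dimensional array yields `c`). `default_encode`, `suffix_encode`, and `BaseEncoder` are hypothetical names, and any other base encoding can be passed in as a plain function.

```python
from typing import Callable, Sequence

# Hypothetical type: a base encoding maps chunk coordinates to a key string.
BaseEncoder = Callable[[Sequence[int]], str]


def default_encode(coords: Sequence[int], separator: str = "/") -> str:
    # Hypothetical model of the `default` encoding: "c" followed by
    # separator + coordinate for each dimension; () gives just "c".
    return "c" + "".join(separator + str(i) for i in coords)


def suffix_encode(coords: Sequence[int], suffix: str,
                  base_encode: BaseEncoder = default_encode) -> str:
    # Steps 1-2: encode the coordinates with the base encoding.
    base_key = base_encode(coords)
    # Step 3: append the configured suffix.
    return base_key + suffix


print(suffix_encode((1, 2), ".tiff"))  # c/1/2.tiff
```

Passing the base encoding as a function keeps the suffix logic independent of how the base key is built, which mirrors the wrapper design described above.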

### Decoding

1. Take the full chunk key string as input (e.g., `"c/1/2.tiff"`).
2. Verify that the key ends with the configured `suffix`. If not, it is an invalid key for this encoding.
3. Remove the `suffix` from the end of the key string to get the base key (e.g., `"c/1/2"`).
4. Decode the remaining base key using the specified **`base_encoding`** to retrieve the original chunk coordinate tuple `(1, 2)`.
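A matching Python sketch of the decoding steps is shown below, again assuming the `default` base encoding with a `/` separator. `default_decode`, `suffix_decode`, and `BaseDecoder` are hypothetical names for illustration only.

```python
from typing import Callable, Tuple

# Hypothetical type: a base decoder maps a key string back to coordinates.
BaseDecoder = Callable[[str], Tuple[int, ...]]


def default_decode(key: str, separator: str = "/") -> Tuple[int, ...]:
    # Hypothetical inverse of the `default` encoding: "c" -> (), "c/1/2" -> (1, 2).
    if key == "c":
        return ()
    prefix = "c" + separator
    if not key.startswith(prefix):
        raise ValueError(f"not a default-encoded key: {key!r}")
    return tuple(int(part) for part in key[len(prefix):].split(separator))


def suffix_decode(key: str, suffix: str,
                  base_decode: BaseDecoder = default_decode) -> Tuple[int, ...]:
    # Step 2: a key without the configured suffix is invalid for this encoding.
    if not key.endswith(suffix):
        raise ValueError(f"key {key!r} does not end with suffix {suffix!r}")
    # Step 3: strip the suffix. Slicing by length (rather than key[:-len(suffix)])
    # also behaves correctly for an empty suffix.
    base_key = key[:len(key) - len(suffix)]
    # Step 4: decode the remaining base key with the base encoding.
    return base_decode(base_key)


print(suffix_decode("c/1/2.tiff", ".tiff"))  # (1, 2)
```

Because the suffix is removed before the base decoder runs, a suffix that contains the base separator (such as `.shard.zip` with the `v2` encoding's `.`) does not confuse the coordinate parsing.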