Add library and cli flags for file format with embedded dictionary #4036

Open
@pmeenan

This is still in flight but I wanted to get some feedback from the tooling side before we go too far on the IETF spec for dictionary-compressed responses.

We are considering creating a new file/stream format that adds a 35-byte header before the compressed stream, consisting of a 3-byte magic signature (DCZ) followed by the 32-byte SHA-256 hash of the dictionary that was used to compress the resource.

Currently, the dictionary hash is sent in a separate header, but there may be value in putting the hash in the file itself and removing the need for that extra header.

Ideally, if we go down this path, the Zstandard CLI and APIs would support generating and decompressing these streams directly, rather than requiring extra tooling to wrap their output.

On compression:

  • Add a flag for generating "Dictionary-Compressed Zstandard"
  • Flag limits the compression window to the larger of 8 MB or 1.25 * the size of the dictionary, up to 128 MB
  • Flag requires a dictionary to be specified
  • Output stream is prefixed with DCZ + sha-256 hash of dictionary
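
To make the compression-side rules concrete, here is a minimal Python sketch of the window limit and the 35-byte prefix. It is only an illustration of the proposal, not an existing Zstandard API: the function names are hypothetical, the magic is assumed to be the ASCII bytes "DCZ", and "MB" is assumed to mean 2^20 bytes (the proposal does not say which). The compressed payload is taken as given, since producing it is the compressor's job.

```python
import hashlib

MAGIC = b"DCZ"    # assumption: the signature is the 3 ASCII bytes "DCZ"
MB = 1024 * 1024  # assumption: "MB" means 2**20 bytes


def dcz_window_limit(dict_size: int) -> int:
    """Larger of 8 MB or 1.25 * dictionary size, capped at 128 MB."""
    scaled = (dict_size * 5) // 4  # 1.25 * size in integer arithmetic
    return min(max(8 * MB, scaled), 128 * MB)


def dcz_header(dictionary: bytes) -> bytes:
    """Build the prefix: magic signature + SHA-256 of the dictionary."""
    return MAGIC + hashlib.sha256(dictionary).digest()


def wrap_dcz(dictionary: bytes, compressed: bytes) -> bytes:
    """Prefix an already-compressed Zstandard stream with the header."""
    return dcz_header(dictionary) + compressed
```

For example, a 16 MB dictionary would give a 20 MB window limit under these assumptions, while any dictionary of 6.4 MB or smaller stays at the 8 MB floor.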

On decompression:

  • Add a flag for decompressing "Dictionary-Compressed Zstandard" (or autodetect from the magic signature)
  • Flag requires a dictionary be specified
  • Flag sets the maximum allowed window size to the larger of 8 MB or 1.25 * the size of the dictionary, up to 128 MB
  • Decompression fails if the hash in the stream doesn't match the provided dictionary
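
The decompression-side check could look roughly like the following Python sketch. Again, this is an assumption-laden illustration rather than anything the Zstandard library provides: `strip_dcz_header` is a hypothetical helper, and the magic is assumed to be the ASCII bytes "DCZ". It validates the prefix and returns the remaining bytes, which would then be handed to a regular Zstandard decompressor along with the dictionary.

```python
import hashlib

MAGIC = b"DCZ"  # assumption: the signature is the 3 ASCII bytes "DCZ"
HEADER_LEN = len(MAGIC) + hashlib.sha256().digest_size  # 3 + 32 = 35


def strip_dcz_header(stream: bytes, dictionary: bytes) -> bytes:
    """Validate the 35-byte prefix and return the Zstandard payload.

    Raises ValueError if the stream is too short, the magic is missing,
    or the embedded hash does not match the supplied dictionary.
    """
    if len(stream) < HEADER_LEN:
        raise ValueError("stream too short for DCZ header")
    if stream[:len(MAGIC)] != MAGIC:
        raise ValueError("missing DCZ magic signature")
    expected = hashlib.sha256(dictionary).digest()
    if stream[len(MAGIC):HEADER_LEN] != expected:
        raise ValueError("dictionary hash mismatch")
    return stream[HEADER_LEN:]
```

Checking the magic first also covers the autodetect case: a caller can catch the "missing DCZ magic signature" error and fall back to treating the input as a plain Zstandard stream.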

Does this sound reasonable and make sense to add if we do go down the route of specifying a stream prefix for the dictionary-compressed streams? Are there any concerns/suggestions on the plan itself?
