The example in delta-encoding seems incorrect #393

Open
asfimport opened this issue Apr 17, 2021 · 0 comments

In the delta-encoding example, which encodes [1, 2, 3, 4, 5], we state that

The final encoded data is:

header: 8 (block size), 1 (miniblock count), 5 (value count), 1 (first value)

block: 1 (minimum delta), 0 (bitwidth), (no data needed for bitwidth 0)

I believe that the correct result should be

header: [8, 1, 5, 2]
block: [2, 0]

I.e., first_value and min_delta should be 2, not 1.

This is because both fields are stored as zig-zag ULEB128 integers, and the zig-zag ULEB128 encoding of 1 is 2: the plain ULEB128 encoding of 1 is 1, but AFAIK zig-zag encoding maps 1 to 2 (see e.g. here).
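To make the difference concrete, here is a minimal Python sketch of the two encodings. The function names `zigzag` and `uleb128` are my own and not taken from any Parquet implementation; the zig-zag mapping follows the usual definition (0, -1, 1, -2, 2 map to 0, 1, 2, 3, 4).

```python
def zigzag(n: int) -> int:
    """Map a signed integer to a non-negative one: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1


def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (7 payload bits per byte)."""
    if n < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)
            return bytes(out)


print(list(uleb128(1)))          # plain ULEB128 of 1
print(list(uleb128(zigzag(1))))  # zig-zag ULEB128 of 1
```

Under these definitions the plain ULEB128 encoding of 1 is the single byte 1, while the zig-zag ULEB128 encoding of 1 is the single byte 2, which is the discrepancy described above.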

Alternatively, we could rephrase "The final encoded data is:" as "The final data prior to zig-zag encoding is:"
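The two readings can be compared side by side with a short Python sketch. It assumes, per my reading of the spec, that block size, miniblock count and value count are plain ULEB128, that first value and minimum delta are zig-zag ULEB128, and that each bit width is a single byte. The helper names are hypothetical, and the sketch only covers this one example (a single miniblock with bit width 0).

```python
def zigzag(n: int) -> int:
    """Map a signed integer to a non-negative one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return 2 * n if n >= 0 else -2 * n - 1


def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit
        else:
            out.append(byte)
            return bytes(out)


values = [1, 2, 3, 4, 5]
block_size, miniblock_count = 8, 1  # values used in the spec's example

deltas = [b - a for a, b in zip(values, values[1:])]  # consecutive differences
min_delta = min(deltas)
adjusted = [d - min_delta for d in deltas]            # deltas relative to the minimum
bit_width = max(adjusted).bit_length()                # bits needed per adjusted delta

# "Prior to zig-zag encoding": the logical field values.
logical_header = [block_size, miniblock_count, len(values), values[0]]
logical_block = [min_delta, bit_width]

# "Final encoded data": the bytes actually written (assumed field encodings, see above).
encoded_header = (
    uleb128(block_size)
    + uleb128(miniblock_count)
    + uleb128(len(values))
    + uleb128(zigzag(values[0]))
)
encoded_block = uleb128(zigzag(min_delta)) + bytes([bit_width])

print(logical_header, logical_block)
print(list(encoded_header), list(encoded_block))
```

With these assumptions, the logical values come out as [8, 1, 5, 1] and [1, 0], matching the current text, while the written bytes come out as [8, 1, 5, 2] and [2, 0], matching the proposed correction. Either fix to the document would resolve the mismatch.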

Reporter: Jorge Leitão / @jorgecarleitao

Note: This issue was originally created as PARQUET-2028. Please see the migration documentation for further details.
