Document on-disk representation of bitshuffled data #148

Open
graeme-winter opened this issue Dec 10, 2023 · 1 comment

@graeme-winter

I got some way towards reverse-engineering the format so that I can do the bitshuffle independently of LZ4 in my application, but kept stubbing my toes. Some clear documentation of how the format is laid out would be very useful for non-canonical implementations.

For example, it would appear that the on-disk representation takes the form of

BE uint32_t compressed_block_size <compressed block> BE uint32_t compressed_block_size <compressed block> BE uint32_t compressed_block_size <compressed block> ...

where <compressed block> is the result of compressing 8192 bytes of input. The full blocks are followed by a smaller partial block, and finally by what looks like a short verbatim (uncompressed) residual at the end. I could try compressing and then unpacking arbitrary bit patterns to resolve this, but a canonical definition of the on-disk format (beyond, of course, reading the source code) would be a useful addition to this library.
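
A minimal sketch of walking the layout as observed above, not a definition of the format. It assumes the stream is a run of big-endian `uint32` length prefixes, each followed by that many bytes of compressed payload, with any remaining bytes treated as the verbatim residual. The function name `split_blocks` and the `n_blocks` parameter are hypothetical: the caller is assumed to know how many length-prefixed blocks to expect. No decompression or bit-unshuffling is attempted.

```python
import struct


def split_blocks(buf: bytes, n_blocks: int):
    """Split buf into n_blocks length-prefixed payloads plus trailing bytes.

    Assumed layout (from observation, not from a specification):
        [BE uint32 size][size bytes] ... repeated n_blocks times,
        then an uncompressed residual of whatever is left.
    """
    blocks = []
    offset = 0
    for _ in range(n_blocks):
        if offset + 4 > len(buf):
            raise ValueError("truncated block header")
        # ">I" is a big-endian unsigned 32-bit integer.
        (size,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        if offset + size > len(buf):
            raise ValueError("truncated block payload")
        blocks.append(buf[offset:offset + size])
        offset += size
    # Anything after the last counted block is returned untouched.
    return blocks, buf[offset:]
```

Each returned payload would then still need to be decompressed and bit-unshuffled separately, which is the part the issue asks to have documented.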

@graeme-winter
Author

I found a non-canonical implementation here:

https://github.com/dectris/compression
