Debugging corrupted bitshuffle data #127

Open
telegraphic opened this issue Oct 12, 2022 · 2 comments

Comments

@telegraphic

Hi @kiyo-masui, we have some SETI data stored with bitshuffle compression, and a small number of files appear to have become corrupted. (Here is one, FYI: https://bldata.berkeley.edu/blpd30_datax2/blc03_guppi_59132_36704_HIP111595_0078.rawspec.0002.h5)

h5py is happy to open the file, but barfs if you try and access the bitshuffled dataset:

In [3]: a = h5py.File('blc03_guppi_59132_36704_HIP111595_0078.rawspec.0002.h5', 'r')
In [4]: a['data']
Out[4]: <HDF5 dataset "data": shape (279, 1, 65536), type "<f4">

In [5]: d = a['data'][:]
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-5-fee15ce54759> in <module>
----> 1 d = a['data'][:]

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

~/opt/anaconda3/lib/python3.8/site-packages/h5py/_hl/dataset.py in __getitem__(self, args)
    571         mspace = h5s.create_simple(mshape)
    572         fspace = selection.id
--> 573         self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
    574
    575         # Patch up the output for NumPy

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5d.pyx in h5py.h5d.DatasetID.read()

h5py/_proxy.pyx in h5py._proxy.dset_rw()

h5py/_proxy.pyx in h5py._proxy.H5PY_H5Dread()

OSError: Can't read data (filter returned failure during read)

Do you think this file is recoverable (or partly recoverable)? Is there any way to turn on extra debug info in bitshuffle to help diagnose why it fails, and/or can bitshuffle skip over 'bad' chunks?
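Bitshuffle itself does not need to change to skip bad chunks. HDF5 runs the filter per chunk, so one workaround is to read the dataset one chunk at a time; a failed decode then affects only the selection being read. The sketch below assumes the dataset is chunked. `salvage_chunks` is a hypothetical helper (not part of h5py or bitshuffle). It works with anything that supports slicing, so it can be tried without the real file.

```python
from typing import Any, Callable, Iterable, List, Tuple

Selection = Tuple[slice, ...]


def salvage_chunks(
    dset: Any,
    chunk_selections: Iterable[Selection],
    store: Callable[[Selection, Any], None],
) -> List[Selection]:
    """Read `dset` one chunk-sized selection at a time (hypothetical helper).

    Each block that decodes is passed to `store`. Selections whose read raises
    OSError (what h5py raises for "filter returned failure during read") are
    collected and returned so they can be inspected later.
    """
    bad: List[Selection] = []
    for sel in chunk_selections:
        try:
            # For an h5py dataset, HDF5 only decodes the chunks intersecting `sel`.
            block = dset[sel]
        except OSError:
            bad.append(sel)
            continue
        store(sel, block)
    return bad
```

With h5py it could be called as `salvage_chunks(dset, dset.iter_chunks(), store)`. Here `Dataset.iter_chunks()` yields a tuple of slices per chunk. `store` copies each block into a NaN-filled NumPy array of shape `dset.shape`, so unreadable chunks stay NaN. The returned list identifies the chunks worth inspecting more closely.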

@kiyo-masui
Owner

With a bit of hacking, I think you should be able to recover most of the data. First, I would just add print statements in bshuf_h5filter.c to figure out exactly which function is returning an error code, and the value of that code (the core functions of bitshuffle return some specific error codes with defined meanings).

@telegraphic
Author

telegraphic commented Oct 17, 2022

Thanks @kiyo-masui, I'll take a look following that strategy.

Since it's a decompression issue, this looks like a good place to start:

err = bshuf_decompress_lz4(in_buf, out_buf, size, elem_size, block_size);

Which calls:

int64_t bshuf_decompress_lz4(const void* in, void* out, const size_t size,

And then each block is decompressed with:

int64_t bshuf_decompress_lz4_block(ioc_chain *C_ptr,
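Once a failing chunk is located, its raw, still-compressed bytes can be checked before adding prints to the C code. Recent h5py versions return these bytes from `DatasetID.read_direct_chunk()`, together with the filter mask. The sketch below walks the block framing without decompressing anything and reports the first inconsistency. It assumes the LZ4 chunk layout as read from `bshuf_h5filter.c` and `bshuf_compress_lz4_block`:

- A 12-byte header: a big-endian uint64 uncompressed byte count, then a big-endian uint32 block size in bytes.
- One record per compressed block: a big-endian uint32 length followed by that many LZ4 bytes.
- Any trailing `size % 8` elements, copied uncompressed at the end.

That layout should be verified against the bitshuffle version that wrote the file. `walk_lz4_blocks` and `lz4_compress_bound` are hypothetical helpers.

```python
import struct
from typing import List, Optional, Tuple

# Assumption: bitshuffle processes elements in groups of 8 (BSHUF_BLOCKED_MULT).
BLOCKED_MULT = 8


def lz4_compress_bound(n: int) -> int:
    """Worst-case LZ4 output size for n input bytes (LZ4_COMPRESSBOUND)."""
    return n + n // 255 + 16


def walk_lz4_blocks(chunk: bytes, elem_size: int) -> Tuple[List[Tuple[int, int]], Optional[str]]:
    """Walk the assumed bitshuffle+LZ4 chunk framing without decompressing.

    Returns (blocks, problem): `blocks` holds (offset, compressed_length) for
    each block that looked sane, and `problem` describes the first
    inconsistency, or is None if the whole chunk parsed cleanly.
    """
    if len(chunk) < 12:
        return [], "chunk is shorter than the 12-byte header"
    # Header: uint64 BE uncompressed bytes, uint32 BE block size in bytes.
    nbytes_uncomp, block_bytes = struct.unpack(">QI", chunk[:12])
    if block_bytes == 0 or nbytes_uncomp % elem_size or block_bytes % elem_size:
        return [], f"implausible header: {nbytes_uncomp} bytes, blocks of {block_bytes} bytes"
    size = nbytes_uncomp // elem_size      # elements in the chunk
    block_size = block_bytes // elem_size  # elements per full block

    # Full blocks, then one shorter block rounded down to a multiple of 8.
    n_full = size // block_size
    last = size % block_size
    last -= last % BLOCKED_MULT
    n_blocks = n_full + (1 if last else 0)

    blocks: List[Tuple[int, int]] = []
    pos = 12
    for i in range(n_blocks):
        n_elems = block_size if i < n_full else last
        if pos + 4 > len(chunk):
            return blocks, f"block {i}: length prefix at offset {pos} is truncated"
        (clen,) = struct.unpack(">I", chunk[pos:pos + 4])
        bound = lz4_compress_bound(n_elems * elem_size)
        if clen > bound:
            return blocks, f"block {i}: length {clen} at offset {pos} exceeds LZ4 bound {bound}"
        if pos + 4 + clen > len(chunk):
            return blocks, f"block {i}: {clen} bytes at offset {pos} run past end of chunk"
        blocks.append((pos, clen))
        pos += 4 + clen

    # The trailing size % 8 elements are stored uncompressed after the blocks.
    expected_end = pos + (size % BLOCKED_MULT) * elem_size
    if expected_end != len(chunk):
        return blocks, f"chunk should end at byte {expected_end} but has {len(chunk)} bytes"
    return blocks, None
```

If the framing is broken, the reported offset narrows the search to a single block. If every block parses cleanly, the damage is more likely inside an LZ4 payload, which is where `bshuf_decompress_lz4_block` would be expected to fail.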
