Fix hdf5 read with broken links #904
Conversation
Resolves issue with loading Dectris-compressed data (lzf compression if I'm not mistaken).
Added import of hdf5plugin
ignore flake8 here.
Fighting flake8
Fixed the years of previous changelog entries, and added mine.
Fixing a trailing whitespace for flake8's pleasure.
merging changes from web (hdf5plugin) with local changes (HDF5 passing).
message on the graceful handling of broken external links in HDF5 files.
…BAMresearch/tiled into fix_hdf5_read_with_broken_links syncing web and local commits.
Accidentally has the fixes from #903 as well, I think.
I worry that this error will pass too silently and may show up as a confusing bug far downstream, such as in some plotting code trying to operate on a busted data set. Let me take another look at how to make
I understand your concerns; it would be good to list issues in an easily accessible location. But the problems run deeper than just the tree functionality (I'll explain in a moment), so the suggested fix solves a real practical issue with data access, at the cost of a potentially increased risk of difficult troubleshooting later.

Without the fix in place, I can access some keys and data in the files, but cannot get keys at all once I approach the broken link:

```python
sorted(client.get("")["MOUSE_20250124_2_104_processed_250128_092417/"].keys())
# ['entry1', 'processed', 'saxs']
sorted(client.get("")["MOUSE_20250124_2_104_processed_250128_092417/entry1"].keys())
# error
# accessing data directly, if you happen to know where it is, seems to work:
client.get("MOUSE_20250124_2_104_processed_250128_092417/entry1/instrument/detector00/data")[:, :, :].compute()
# array...
```

With the fix in place, I can access the element keys of files with broken links right up until the broken link, and otherwise explore the rest of the content perfectly normally:

```python
sorted(client.get("")["MOUSE_20250124_2_104_processed_250128_092417/entry1/processing/direct_beam_profile"].keys())
# ['beam_analysis', 'data', 'instrument']
```

(The offending item is in 'data' and 'instrument', which link to the beam measurement file.)

So the question, I guess, boils down to what to do with non-critical warnings about data content. In an ideal world we would never have broken external links, but in reality this happens easily when data gets reorganised or copied to other locations. When this does not affect the main data content but "just" some supplementary data put there for completeness, my opinion is that it should not invalidate access to the entire file. Is there a good way to warn users of broken data in a more obvious way?
(And yes, I'll fix my file structuring to add the data to the internal tree rather than rely on external links; disk space is cheap.)
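The behavior described above can be reproduced directly with h5py, outside of Tiled. A minimal sketch (file names are hypothetical): listing a group's keys works even when the group contains a broken external link, but dereferencing the link raises a `KeyError`.

```python
# Sketch: h5py lets you list link names without resolving them,
# but dereferencing a broken external link raises KeyError.
import os
import tempfile

import h5py
import numpy as np

tmpdir = tempfile.mkdtemp()
parent = os.path.join(tmpdir, "parent.h5")

with h5py.File(parent, "w") as f:
    grp = f.create_group("entry1")
    grp.create_dataset("data", data=np.arange(3))
    # Link to a file that does not exist -- a "broken" external link.
    grp["linked"] = h5py.ExternalLink("missing.h5", "/entry/data")

broken = False
with h5py.File(parent, "r") as f:
    keys = sorted(f["entry1"].keys())  # listing keys still works
    print(keys)
    try:
        f["entry1/linked"]             # dereferencing the link fails
    except KeyError as err:
        broken = True
        print("broken external link:", err)
```

This is why key listing succeeds right up until the broken link is actually resolved.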
One more comment: a similar resolution is essentially being applied here:
The mysterious issue where things break when you approach the broken link is due to a performance optimization: the server proactively traverses the file and sends a batch of nested contents in a single response. Before we added this, traversing highly nested HDF5 files (e.g. common NeXus files) was extremely slow due to the stacked-up latency of many requests with small responses. As an aside, that behavior is enabled/controlled by this method on the Adapter:

Lines 251 to 252 in ef4a7fa
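A toy sketch of the idea (not the actual tiled server code; `inline_contents` and its `depth` parameter are illustrative names): walk a nested mapping once and return several levels of keys in a single response, rather than one round trip per node. A broken link encountered during this proactive traversal would raise partway through, which is why the error surfaces before the broken node itself is requested.

```python
# Illustrative sketch of batched traversal: collect child keys up to
# `depth` levels in one pass, instead of one request per node.
def inline_contents(node, depth=2):
    """Recursively collect child keys; None marks a leaf or a level
    beyond `depth`, where the client would issue a follow-up request."""
    if depth == 0 or not isinstance(node, dict):
        return None
    return {key: inline_contents(child, depth - 1)
            for key, child in node.items()}

tree = {"entry1": {"instrument": {"detector00": {"data": "..."}},
                   "processing": {"direct_beam_profile": {}}}}
print(inline_contents(tree, depth=3))
```

The trade-off is exactly the one discussed above: batching amortizes latency, but it also means one bad node can poison the whole batched response.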
You make a fair point that there is precedent for a similar approach:

Lines 149 to 167 in ef4a7fa
There is an HTTP status code, 410 Gone, that could be used to represent this condition. That would require some wide-ranging changes (adapter, server, and client working together) and might best be approached as part of a cohesive effort to support this kind of thing in Tiled. So, I'd be OK with merging this PR as a stop-gap at least until we get around to anything better. What do you think?
That's a perfectly workable solution for me; I'll add a TODO note to the segment to indicate that it needs to be replaced with a better solution in due time. I like the 410 Gone status code, as it allows the end user to deal with the error as they see fit. In the meantime, I'll work on fixing my data ;)
Fix suggestion for #888
Fixed by adding a try/except block to catch the `KeyError`, converting it to a warning, and returning the warning as a string encapsulated in a numpy array to expose it to the user.
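A minimal sketch of that approach (the function name `read_node` and the dict standing in for an h5py group are hypothetical, not the actual adapter code): catch the `KeyError` raised when a broken link is dereferenced, emit a warning, and hand back the message as array-shaped content so clients do not hard-fail.

```python
# Sketch: convert a KeyError from a broken HDF5 link into a warning,
# exposing the message to the user as a one-element string array.
import warnings

import numpy as np

def read_node(group, key):
    """Return the dataset at `key`, or a string array holding a
    warning message if dereferencing the key raises KeyError."""
    try:
        return group[key][...]
    except KeyError as err:
        msg = f"Broken link at {key!r}: {err}"
        warnings.warn(msg)
        # Encapsulate the warning in a numpy array so clients still
        # receive array-shaped content instead of a hard failure.
        return np.array([msg])

# A plain dict stands in for an h5py group here:
fake_group = {"good": np.arange(3)}
print(read_node(fake_group, "good"))     # the dataset itself
print(read_node(fake_group, "missing"))  # a one-element string array
```

The design choice is deliberate: returning an array keeps the client-side data model uniform, at the cost of the silent-failure risk discussed above.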
Checklist