Improve image reader selection function #157
Open
Fixes #129, Fixes CellProfiler/CellProfiler#3411
This PR addresses a long-standing issue with python-bioformats reading file metadata incorrectly, particularly when inspecting OME-TIF files. Within CellProfiler this manifested as the Metadata module "seeing" all frames within a multidimensional image as Timepoints instead of C, Z and T series. Example files can be found here for testing.
The core issue was that python-bioformats used a custom strategy in get_image_reader to attempt to find the correct reader for a supplied image file. This involved testing filenames against the list of available reader classes over a series of passes aimed at finding the best match. The key objective was to avoid having bioformats open the files and inspect the header to determine whether a given reader was the correct choice, basing the decision on the file extension where possible.
However, the reader selection implementation in bioformats has evolved substantially over the years. Today the OME-TIF reader (for example) will never be selected if selection is performed in extension-only mode. Extension-only matching is now also available as an option within the reader itself, so the javascript implementation from python-bioformats is somewhat redundant. Furthermore, allowing bioformats to open files for inspection no longer carries the performance cost that it once did. In my testing, allowing file inspection resulted in CellProfiler getting the correct reader and metadata without any significant slowdown.
With this in mind, I've revised the reader selection function to use the native bioformats selector, with the option to work in extension-only mode parameterised as the new allow_open_image argument in get_image_reader. I've set this to default to True to ensure that the correct reader is selected by default.
In a separate PR we should add a CellProfiler setting to revert to the old functionality, which would basically pass allow_open_image=False into reader requests. This would deliver the same results as the current release, so that anyone who wrote their pipeline to handle the incorrect metadata can still use those workflows.
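To illustrate why the two modes can pick different readers, here is a minimal, self-contained toy model in Python. It is not the python-bioformats or Bio-Formats implementation: the ToyReader class, the READERS list, the select_reader function and the header check are all hypothetical stand-ins, and the file header is passed in as bytes so the sketch runs without Java. Only the allow_open_image name is taken from this PR.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class ToyReader:
    """Hypothetical stand-in for a reader class; not a real Bio-Formats type."""
    name: str
    suffixes: Tuple[str, ...]
    # Optional check on the file header; None means the suffix alone is enough.
    header_check: Optional[Callable[[bytes], bool]] = None

    def is_this_type(self, filename: str, header: bytes, allow_open: bool) -> bool:
        if not filename.lower().endswith(self.suffixes):
            return False
        if self.header_check is None:
            return True
        if not allow_open:
            # Assumption: a reader that needs the header cannot confirm a
            # match when file inspection is disabled, so it declines.
            return False
        return self.header_check(header)


# Order matters: more specific readers are tried first.
READERS = [
    ToyReader("OMETiffReader", (".tif", ".tiff"), lambda h: b"<OME" in h),
    ToyReader("TiffReader", (".tif", ".tiff")),
]


def select_reader(filename: str, header: bytes, allow_open_image: bool = True) -> str:
    """Return the name of the first toy reader that accepts the file."""
    for reader in READERS:
        if reader.is_this_type(filename, header, allow_open_image):
            return reader.name
    raise ValueError(f"No reader found for {filename}")


# Fake header bytes containing an OME-XML marker (illustrative only).
ome_header = b"II*\x00 ... <OME xmlns='...'>"

with_inspection = select_reader("stack.ome.tif", ome_header, allow_open_image=True)
extension_only = select_reader("stack.ome.tif", ome_header, allow_open_image=False)
print(with_inspection, extension_only)
```

In this toy model the same file resolves to the OME-aware reader when inspection is allowed and falls back to the generic TIFF reader in extension-only mode, which mirrors the behaviour described above. The real selection logic lives in Bio-Formats and may differ in detail.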