Improve image reader selection function #157

DavidStirling · 2022-02-23T12:32:29Z

Fixes #129, Fixes CellProfiler/CellProfiler#3411

This PR addresses a long-standing issue with python-bioformats reading file metadata incorrectly, particularly when inspecting OME-TIF files. Within CellProfiler this manifested as the Metadata module "seeing" all frames within a multidimensional image as Timepoints instead of C, Z and T series. Example files can be found here for testing.
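For reviewers less familiar with the symptom, here is a minimal sketch of the difference between reading a plane index as a position in C, Z and T versus treating every frame as a timepoint. It assumes an `XYCZT` dimension order (C varies fastest, then Z, then T). The function names and image sizes are hypothetical and are not part of python-bioformats.

```python
def plane_to_zct(index, size_z, size_c, size_t):
    """Map a linear plane index to (z, c, t), assuming XYCZT order.

    Assumption: C varies fastest, then Z, then T. Other dimension
    orders need a different decomposition.
    """
    total = size_z * size_c * size_t
    if not 0 <= index < total:
        raise ValueError(f"plane index {index} outside 0..{total - 1}")
    c = index % size_c
    z = (index // size_c) % size_z
    t = index // (size_c * size_z)
    return z, c, t


def plane_as_timepoint(index):
    """The incorrect reading: every frame is a new timepoint."""
    return 0, 0, index


# Hypothetical image with 3 Z slices, 2 channels, 2 timepoints (12 planes).
correct = plane_to_zct(7, size_z=3, size_c=2, size_t=2)
flattened = plane_as_timepoint(7)
```

With these sizes, plane 7 is channel 1 of the first Z slice at the second timepoint, whereas the flattened reading reports it as timepoint 7 of a single-channel, single-slice series.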

The core issue was that python-bioformats used a custom strategy in get_image_reader to attempt to find the correct reader for a supplied image file. This involved testing filenames against the list of available reader classes over a series of passes aimed at finding the best match. The key objective there was to avoid needing to have bioformats open the files and inspect the header to determine whether said reader was the correct choice, instead basing things on the file extension if possible.

However, the reader selection implementation in bioformats has evolved substantially over the years. Today the OME-TIF reader (for example) will never be selected if selection is performed in extension-only mode. Extension-only matching is now also available as an option within the reader itself, so the JavaScript implementation in python-bioformats is largely redundant. Furthermore, allowing bioformats to open files for inspection no longer carries the performance cost that it once did. In my testing, allowing file inspection resulted in CellProfiler getting the correct reader and metadata without any significant slowdown.

With this in mind, I've revised the reader selection function to use the native bioformats selector, with the option to work in extension-only mode parameterised as the new allow_open_image argument in get_image_reader. I've had this default to True to ensure that the correct reader is selected by default.
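To illustrate the two modes, here is a toy model of the selection behaviour described above. It is a sketch only and is not the code in this PR or in bioformats: `ToyReader`, `READERS` and `select_reader` are hypothetical names, and the header checks are deliberately simplistic. The one thing it borrows from the PR is the `allow_open_image` flag.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class ToyReader:
    """Hypothetical stand-in for a reader class (not a bioformats type)."""
    name: str
    suffixes: Tuple[str, ...]
    header_check: Callable[[bytes], bool]
    # Assumption for this toy: some readers cannot claim a file by suffix alone.
    suffix_sufficient: bool = True

    def is_this_type(self, filename: str, header: Optional[bytes]) -> bool:
        suffix_ok = filename.lower().endswith(self.suffixes)
        if header is None:
            # Extension-only mode: the file is never opened.
            return suffix_ok and self.suffix_sufficient
        # Inspection mode: decide from the file contents.
        return self.header_check(header)


def _looks_like_tiff(data: bytes) -> bool:
    # Classic TIFF magic numbers (little- and big-endian).
    return data[:4] in (b"II*\x00", b"MM\x00*")


# Order matters: the more specific reader is tried first.
READERS = [
    ToyReader("toy-ome-tiff", (".ome.tif", ".ome.tiff"),
              lambda d: _looks_like_tiff(d) and b"<OME" in d,
              suffix_sufficient=False),
    ToyReader("toy-tiff", (".tif", ".tiff"), _looks_like_tiff),
]


def select_reader(filename: str, file_bytes: bytes,
                  allow_open_image: bool = True) -> Optional[str]:
    """Return the name of the first matching toy reader, or None."""
    header = file_bytes if allow_open_image else None
    for reader in READERS:
        if reader.is_this_type(filename, header):
            return reader.name
    return None


# Fake file contents: TIFF magic followed by an OME-XML fragment.
fake_ome_tiff = b"II*\x00" + b"...<OME xmlns='...'>...</OME>"
with_inspection = select_reader("stack.ome.tif", fake_ome_tiff)
extension_only = select_reader("stack.ome.tif", fake_ome_tiff,
                               allow_open_image=False)
```

In this toy, inspection picks the OME-aware reader, while extension-only mode falls through to the generic TIFF reader for the same file, which mirrors the mismatch this PR is meant to fix.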

In a separate PR we should add a CellProfiler setting to revert to the old functionality, which would basically pass allow_open_image=False into reader requests. This would deliver the same results as the current release, so that anyone who wrote their pipeline to handle the incorrect metadata can still use those workflows.

DavidStirling added 2 commits February 23, 2022 12:00

Add allow_open_files parameter

5b950ed

Implement per-plane indexing with czt

4c606bf

DavidStirling mentioned this pull request Nov 13, 2024

OME TIFF dimensions rearranged on image plane extraction CellProfiler/CellProfiler#4821

Open
