
Update the OMERO model and API to represent and handle archived data #6390

sbesson opened this issue Jun 6, 2024 · 0 comments

sbesson commented Jun 6, 2024

Motivation

OMERO is being deployed at institutional and national levels, with many repositories holding on the order of tens to hundreds of TBs of data spanning more than a decade. This growth in managed data challenges OMERO's standard expectation that all data is available on fast local or network file storage. Institutions have explored various strategies for managing the data lifecycle. A standard approach is to move data through tiers of storage; eventually, rarely accessed data is stored in an archive tier, which is effectively offline.

The current data access patterns and APIs in OMERO.server rely heavily on the underlying data being available with very low latency. When data is archived and unavailable to the server, the result is server and micro-service errors and/or blocked threads, which has several implications for the usability of the application.

The OME community has worked on extensions that use the OMERO model to annotate and manage archived data [1]. This issue discusses a proposal for representing the archival status of OMERO 5 data in the core OMERO model and API.

Current status

OMERO model and database

In the current OMERO model, the Image table includes an archived boolean column.
In the OMERO 4.x series, import involved converting all imaging data, read client-side via Bio-Formats, into an internal representation stored under the Pixels directory of the binary repository. Early versions of OMERO 4 introduced the ability to also upload the original acquisition files as part of the import. These files, sometimes referred to as “OMERO 4 archived files”, were stored as a collection of OriginalFile objects linked to the Pixels object via a PixelsOriginalFileMap. In addition to this map, the import client could set the archived flag at the Image level to indicate the presence of original files, e.g. for download purposes. There is a direct relationship between the value of the archived flag and the existence of a PixelsOriginalFileMap, which was enforced via the OMERO 4 -> OMERO 5 SQL upgrade script [2].
In OMERO 5, import was fully redesigned to upload the acquisition files as OriginalFile objects and to read them server-side via Bio-Formats. Unlike in OMERO 4, each of these OriginalFile objects is encapsulated within a FilesetEntry, and a collection of FilesetEntry objects defines the Fileset associated with each Image. For multi-image filesets, since OMERO 5.1, a series attribute stores the index of the series that should be initialized with Bio-Formats. This work effectively made the existing archived flag obsolete in OMERO 5, since every imported image has associated original files on the server.
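The relationships described above can be summarized with a small Python sketch. The classes below are hypothetical, simplified stand-ins, not the omero.model API; the rule that an image is OMERO 5 data exactly when it has a Fileset, and the helper names, are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, simplified stand-ins for the model objects described above.
# They are NOT the omero.model classes; only the relationships are mirrored.


@dataclass
class OriginalFile:
    name: str


@dataclass
class FilesetEntry:
    original_file: OriginalFile


@dataclass
class Fileset:
    entries: List[FilesetEntry] = field(default_factory=list)


@dataclass
class Pixels:
    # OMERO 4 only: original files linked through PixelsOriginalFileMap rows.
    file_map: List[OriginalFile] = field(default_factory=list)


@dataclass
class Image:
    pixels: Pixels
    fileset: Optional[Fileset] = None  # present for every OMERO 5 import
    series: int = 0  # series index in a multi-image fileset (OMERO 5.1+)
    archived: bool = False  # OMERO 4 meaning: original files were uploaded


def is_omero5(image: Image) -> bool:
    """Assumption: an image is OMERO 5 data iff it has a Fileset."""
    return image.fileset is not None


def omero4_flag_consistent(image: Image) -> bool:
    """For OMERO 4 images, archived should be True exactly when a
    PixelsOriginalFileMap link exists (the relationship enforced by the
    upgrade script)."""
    if is_omero5(image):
        return True  # the relationship only applies to OMERO 4 images
    return image.archived == bool(image.pixels.file_map)
```

For example, `Image(pixels=Pixels(file_map=[OriginalFile("experiment.dv")]), archived=True)` models an OMERO 4 image with archived files and passes the consistency check, while the same record with `archived=False` does not.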

OMERO API

Several APIs expose the original files associated with an Image. The implementations of these APIs typically include logic to handle both OMERO 4 and OMERO 5 images, retrieving the original files linked either through the Fileset or through the PixelsOriginalFileMap, depending on the use case.
For OMERO.server, the three most relevant commands are omero.cmd.fs.UsedFilesRequest, omero.cmd.fs.ManageImageBinaries and omero.cmd.DiskUsage. In practice, the implementations of these commands ignore the Image.archived flag and use IQuery directly to retrieve the mapping between Pixels and OriginalFile [3][4][5].

In the Python gateway, the most relevant APIs are ImageWrapper.getArchivedFilesInfo, ImageWrapper.countArchivedFiles and Image.getArchivedFiles. Here again, the Image.archived attribute is never used and the implementations query the PixelsOriginalFileMap directly [6][7].

The Java gateway also handles OMERO 4 and OMERO 5 images, in omero.gateway.facility.TransferFacilityHelper. Here the archived flag is checked first, before querying for the PixelsOriginalFileMap [8].
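To make the difference between these lookup strategies concrete, here is a minimal Python sketch over a hypothetical, simplified image record. It is not the code of any of the cited implementations; it assumes, for illustration, that the Fileset path is tried before the OMERO 4 path in both strategies.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, simplified image record (not omero.model.Image): file names
# stand in for OriginalFile objects.


@dataclass
class Image:
    archived: bool = False
    # OMERO 5: files reached through Fileset -> FilesetEntry (None for OMERO 4)
    fileset_files: Optional[List[str]] = None
    # OMERO 4: files reached through PixelsOriginalFileMap
    pixels_file_map: List[str] = field(default_factory=list)


def files_ignoring_flag(image: Image) -> List[str]:
    """Strategy described for the server commands and the Python gateway:
    Image.archived is never read and the map is queried directly."""
    if image.fileset_files is not None:
        return list(image.fileset_files)
    return list(image.pixels_file_map)


def files_flag_first(image: Image) -> List[str]:
    """Strategy described for the Java gateway: for OMERO 4 images the
    archived flag is checked before the map is queried."""
    if image.fileset_files is not None:
        return list(image.fileset_files)
    if not image.archived:
        return []
    return list(image.pixels_file_map)
```

In this toy model the two strategies agree whenever the flag and the map are consistent, which the upgrade script enforced; they only diverge for an inconsistent record such as `Image(archived=False, pixels_file_map=["c.dv"])`.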

Proposal

A reasonable change would be to use the Image.archived boolean to identify archived OMERO 5 data, i.e. Images associated with a Fileset whose underlying OriginalFile objects are not available to the server. As shown above, this flag is barely used in the existing public APIs and implementations, and only in the context of OMERO 4 images. The only affected use cases would be third-party extensions that rely on this property.

The OMERO.server APIs accessing pixel data, notably the PixelsService API, should be reviewed and updated to query the archived status of OMERO 5 images. If an image is archived, the API should return an appropriate response to clients, possibly an existing or a new server exception. The gateways and micro-services should be updated to expect and handle the new PixelsService behavior and to provide an appropriate response when dealing with archived data.
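A minimal sketch of this behavior, assuming a hypothetical ArchivedDataError exception and hypothetical open_pixel_buffer and handle_request functions; none of these exist in OMERO.server or the gateways. Only OMERO 5 images (those with a Fileset) are gated, since the proposal reserves the new meaning of the flag for OMERO 5 data.

```python
from dataclasses import dataclass
from typing import Optional


class ArchivedDataError(Exception):
    """Hypothetical exception: the proposal leaves open whether an existing
    or a new server exception should be raised."""


@dataclass
class Image:
    id: int
    fileset_id: Optional[int] = None  # None for OMERO 4 images
    archived: bool = False


def open_pixel_buffer(image: Image) -> str:
    """Hypothetical stand-in for a PixelsService entry point: check the
    archived status of OMERO 5 images before touching storage."""
    if image.fileset_id is not None and image.archived:
        raise ArchivedDataError(f"Image:{image.id} is archived")
    return f"buffer for Image:{image.id}"  # placeholder for real pixel access


def handle_request(image: Image) -> str:
    """Hypothetical gateway or micro-service handler that turns the error
    into an informative response instead of a generic failure."""
    try:
        return open_pixel_buffer(image)
    except ArchivedDataError as err:
        return f"Data unavailable: {err}"
```

Failing fast on the flag, rather than attempting a read against offline storage, is what avoids the blocked threads described in the motivation; the message returned by the handler is the kind of response a graphical client could surface to users.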

The graphical clients should be updated to expose the archived status of OMERO 5 images and to display informative messages when users try to access archived data. Some discussion has already started in the context of the left-hand and right-hand panels of OMERO.web [9].

Finally, there are two options to clarify the concept of archival in OMERO 4:

Option # 1: keep the existing semantics for the archived flag for OMERO 4 data

In that case, it would be useful to clarify whether reading the archived flag is mandatory or recommended for OMERO 4 data and to update the APIs as necessary. Additionally, a database upgrade script should set the archived flag to True for OMERO 4 data where a PixelsOriginalFileMap exists.
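As a rough illustration of such an upgrade step, the SQL below runs against a toy SQLite schema through Python's sqlite3 module. The table and column names, and the assumption that OMERO 4 images are the ones without a fileset, are illustrative; a real script would target PostgreSQL (where the flag would be set with `archived = true`) and the actual OMERO schema.

```python
import sqlite3

# Toy schema loosely modelled on the tables named in the text; the column
# names (image.fileset, pixels.image, parent/child on the map) are
# assumptions, and the real OMERO schema has many more columns.
SCHEMA = """
CREATE TABLE image (id INTEGER PRIMARY KEY, fileset INTEGER,
                    archived BOOLEAN NOT NULL DEFAULT 0);
CREATE TABLE pixels (id INTEGER PRIMARY KEY,
                     image INTEGER NOT NULL REFERENCES image (id));
CREATE TABLE pixelsoriginalfilemap (parent INTEGER NOT NULL,  -- OriginalFile id
                                    child INTEGER NOT NULL REFERENCES pixels (id));
"""

# Option 1: OMERO 4 images (assumed: no fileset) get archived = TRUE when a
# PixelsOriginalFileMap row links one of their Pixels to an OriginalFile.
OPTION_1 = """
UPDATE image SET archived = 1
WHERE fileset IS NULL
  AND EXISTS (SELECT 1 FROM pixels p
              JOIN pixelsoriginalfilemap m ON m.child = p.id
              WHERE p.image = image.id)
"""


def run_option_1() -> dict:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Image 1: OMERO 4 with original files; 2: OMERO 4 without; 3: OMERO 5.
    conn.executemany("INSERT INTO image (id, fileset) VALUES (?, ?)",
                     [(1, None), (2, None), (3, 10)])
    conn.executemany("INSERT INTO pixels (id, image) VALUES (?, ?)",
                     [(11, 1), (12, 2), (13, 3)])
    conn.execute("INSERT INTO pixelsoriginalfilemap (parent, child) VALUES (100, 11)")
    conn.execute(OPTION_1)
    flags = dict(conn.execute("SELECT id, archived FROM image ORDER BY id"))
    conn.close()
    return flags
```

Only image 1 ends up flagged: image 2 has no map row, and image 3 is excluded by the fileset condition, so OMERO 5 rows stay free to carry the new meaning of the flag.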

Option # 2: remove the usage of the archived flag for OMERO 4 data

In that scenario, the PixelsOriginalFileMap should be used as the sole discriminator for original files associated with OMERO 4 images. A database upgrade script should update the archived flag to False for all OMERO 4 data. All APIs referring to OMERO 4 archived files should be reviewed and deprecated or updated accordingly.
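A toy sketch of Option 2 using Python's sqlite3 module with an assumed, simplified schema (hypothetical table and column names): the flag is cleared only for OMERO 4 images, assumed to be those without a fileset, and OMERO 4 original files are then looked up through the map alone.

```python
import sqlite3

# Toy schema; column names are assumptions, not the real OMERO schema.
SCHEMA = """
CREATE TABLE image (id INTEGER PRIMARY KEY, fileset INTEGER,
                    archived BOOLEAN NOT NULL DEFAULT 0);
CREATE TABLE pixels (id INTEGER PRIMARY KEY,
                     image INTEGER NOT NULL REFERENCES image (id));
CREATE TABLE pixelsoriginalfilemap (parent INTEGER NOT NULL,  -- OriginalFile id
                                    child INTEGER NOT NULL REFERENCES pixels (id));
"""

# Option 2: clear the flag on all OMERO 4 images (assumed: no fileset),
# leaving OMERO 5 rows untouched.
OPTION_2 = "UPDATE image SET archived = 0 WHERE fileset IS NULL"

# Afterwards, OMERO 4 original files are found through the map alone.
OMERO4_FILES = """
SELECT m.parent FROM image i
JOIN pixels p ON p.image = i.id
JOIN pixelsoriginalfilemap m ON m.child = p.id
WHERE i.fileset IS NULL AND i.id = ?
ORDER BY m.parent
"""


def run_option_2():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Image 1: OMERO 4 with original files; image 2: archived OMERO 5 data.
    conn.executemany("INSERT INTO image (id, fileset, archived) VALUES (?, ?, 1)",
                     [(1, None), (2, 10)])
    conn.executemany("INSERT INTO pixels (id, image) VALUES (?, ?)",
                     [(11, 1), (12, 2)])
    conn.execute("INSERT INTO pixelsoriginalfilemap (parent, child) VALUES (100, 11)")
    conn.execute(OPTION_2)
    flags = dict(conn.execute("SELECT id, archived FROM image ORDER BY id"))
    files = [row[0] for row in conn.execute(OMERO4_FILES, (1,))]
    conn.close()
    return flags, files
```

Scoping the UPDATE to OMERO 4 rows is the important design point here: an OMERO 5 image already flagged under the new semantics keeps its value, while the OMERO 4 image loses the flag but still resolves its original file (id 100) through the map.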

Footnotes

  1. https://downloads.openmicroscopy.org/presentations/2017/Users-Meeting/Lightning-Talks/Alex%20Herbert%20-%20Archiving%20images%20from%20OMERO%20to%20Arkivum%202017.pdf

  2. https://github.com/ome/openmicroscopy/blob/v5.6.11/sql/psql/OMERO5.0__0/OMERO4.4__0.sql#L604-L608

  3. https://github.com/ome/omero-blitz/blob/4c46e156bebff40c8618bacfc16bd2b605d33cfd/src/main/java/omero/cmd/fs/ManageImageBinariesI.java#L172-L187

  4. https://github.com/ome/omero-blitz/blob/v5.7.2/src/main/java/omero/cmd/fs/UsedFilesRequestI.java#L275-L279

  5. https://github.com/ome/omero-blitz/blob/v5.7.3/src/main/java/omero/cmd/graphs/DiskUsageI.java#L138-L139

  6. https://github.com/ome/omero-py/blob/v5.19.2/src/omero/gateway/__init__.py#L3184-L3207

  7. https://github.com/ome/omero-py/blob/v5.19.2/src/omero/gateway/__init__.py#L10269-L10300

  8. https://github.com/ome/omero-gateway-java/blob/v5.7.3/src/main/java/omero/gateway/facility/TransferFacilityHelper.java#L110-L122

  9. https://github.com/ome/omero-web/pull/555

@sbesson sbesson changed the title Update OMERO model and API to represent and handle archived data Update the OMERO model and API to represent and handle archived data Jun 6, 2024