conn.c.download behavior differs from java implementation #425

Open
Rdornier opened this issue Aug 19, 2024 · 5 comments · May be fixed by #427

Comments

@Rdornier

Rdornier commented Aug 19, 2024

Hello,

I noticed that the behavior of the conn.c.download method is not the same as the one implemented in omero.insight (and omero-gateway-java).
In the latest version of omero.insight, when an image is downloaded, a parent folder Fileset_xxxx is created and the image(s) linked to the corresponding fileset are downloaded into it.

This is not the case in omero-py. The image is simply downloaded without creating any folder.

I was wondering if it is possible to mirror the behavior of the latest Java implementation in omero-py. It is quite annoying to deal with these two behaviors in automated tasks, especially when I have to read the images from a different script.

Thanks,
Rémy.

Edit: there is currently a bug in the Java implementation for vsi files: ome/omero-gateway-java#89

@sbesson
Member

sbesson commented Aug 20, 2024

@Rdornier you are right that the two implementations behave differently. Going further, I believe the contracts and signatures of these APIs were never meant to be compared directly.

The Python omero.client.download API effectively mirrors the Java one at omero.client.download. In both cases, the behavior is to take an OriginalFile as input and download it to a local file (or file handle). Neither API supports sets of OriginalFile objects such as a Fileset. When using them, it is the caller's responsibility to do the looping and structure the download appropriately.

In contrast, the omero.gateway.facility.TransferFacilityHelper.downloadImage API operates on Image objects. For OMERO 5 data, it resolves the associated Fileset and handles the download of the set of OriginalFile objects, preserving their internal relationships so that Bio-Formats can read them.
In OMERO.py, the closest existing implementation would be the omero download CLI plugin which takes an Image or a Fileset as an input and then calls omero.client.download to download individual files - see https://github.com/ome/omero-py/blob/master/src/omero/plugins/download.py.
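
To illustrate the "caller does the looping" contract described above, here is a minimal sketch, not the actual plugin code. It assumes each file object exposes `path` and `name` attributes, and `download_one` is a hypothetical callable standing in for a single-file download such as `conn.c.download(ofile, local_path)`.

```python
import os


def download_files(files, target_dir, download_one):
    """Download each file into target_dir, keeping its relative sub-path.

    files: iterable of objects with `path` and `name` attributes (assumption).
    download_one: hypothetical callable(file, local_path) that performs the
        single-file download, e.g. a thin wrapper around conn.c.download.
    Returns the list of local paths that were passed to download_one.
    """
    written = []
    for f in files:
        # Strip leading/trailing slashes so the join stays under target_dir.
        sub_dir = os.path.join(target_dir, f.path.strip("/"))
        os.makedirs(sub_dir, exist_ok=True)
        local_path = os.path.join(sub_dir, f.name)
        download_one(f, local_path)
        written.append(local_path)
    return written
```

Keeping each file's relative sub-path is what preserves the layout that a multi-file format relies on; the single-file API itself knows nothing about that layout.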

@Rdornier
Author

Hi @sbesson

Thanks for the clarification. I wasn't comparing the right APIs together.
OK, so the Python and Java "core" APIs mirror each other, but the "high-level" ones do not.

> In OMERO.py, the closest existing implementation would be the omero download CLI plugin which takes an Image or a Fileset as an input and then calls omero.client.download to download individual files - see https://github.com/ome/omero-py/blob/master/src/omero/plugins/download.py.

Thanks for pointing this out!
Actually, this implementation works fine but is not really usable outside the CLI, although the download_fileset method is the one that mimics the omero.gateway.facility.TransferFacilityHelper.downloadImage behavior, minus the Fileset_xxx folder.

Do you think it would be possible to implement a method Download_image at the BlitzGateway level (or any other level) that mirrors the omero.gateway.facility.TransferFacilityHelper.downloadImage behavior, including the Fileset_xxx folder, and is usable without the CLI?

I currently duplicate the code of the download_fileset method in my project, but it's not very elegant and I would prefer to use a built-in API method.

Rémy.

@will-moore
Member

Hi @Rdornier - have a look at https://gist.github.com/will-moore/a9f90c97b5b6f1a0da277a5179d62c5a
That code iterates through Projects and Datasets and downloads Images to a new Folder per Dataset (or per Image - see comments).
The only thing you might need to address is if you've got multiple Images in the Dataset that come from the same Fileset then you'd download the same files multiple times.
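
One way to avoid the repeated downloads mentioned above is to track which Fileset ids have already been handled. This is a minimal sketch: `get_fileset_id` is a hypothetical accessor supplied by the caller, and how the id is read from an Image is left open here.

```python
def unique_filesets(images, get_fileset_id):
    """Yield (fileset_id, image) once per Fileset, in first-seen order.

    images: any iterable of image-like objects.
    get_fileset_id: hypothetical callable returning the Fileset id for an
        image, or None when the image has no Fileset.
    """
    seen = set()
    for image in images:
        fs_id = get_fileset_id(image)
        # Skip images without a Fileset and Filesets already yielded.
        if fs_id is None or fs_id in seen:
            continue
        seen.add(fs_id)
        yield fs_id, image
```

The caller would then download once per yielded pair instead of once per Image.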

@Rdornier
Author

Hi @will-moore,

Thanks for pointing out this code; it works pretty well!

> The only thing you might need to address is if you've got multiple Images in the Dataset that come from the same Fileset then you'd download the same files multiple times.

OK, I now skip filesets that have already been downloaded, to avoid downloading the same files multiple times.

In the end, the only thing to add to the download_fileset() method to match the Java implementation is dir_path = os.path.join(dir_path, "Fileset_%s" % fileset.id). As it is only one line of code, the user could also easily add it before calling download_fileset(), but I think it is still better if both implementations give the same results.
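
The one-line change described above can also be applied on the caller side. The sketch below assumes a plain integer or string id, and `download_fileset` is a hypothetical callable that receives the target directory; it is not the CLI plugin's method signature.

```python
import os


def fileset_dir(dir_path, fileset_id):
    """Return the Fileset_<id> folder path under dir_path."""
    return os.path.join(dir_path, "Fileset_%s" % fileset_id)


def download_into_fileset_dir(fileset_id, dir_path, download_fileset):
    """Create the Fileset_<id> folder, then download into it.

    download_fileset: hypothetical callable(target_dir) doing the real work.
    """
    target = fileset_dir(dir_path, fileset_id)
    os.makedirs(target, exist_ok=True)
    download_fileset(target)
    return target
```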

@will-moore
Member

@Rdornier please feel free to open a PR proposing the changes you'd like to see.

@Rdornier Rdornier linked a pull request Aug 30, 2024 that will close this issue
3 participants