Using on-disk array library for @.Data slot? #58

Open
yeyuan98 opened this issue Feb 2, 2022 · 0 comments

yeyuan98 commented Feb 2, 2022

Hi,

Thanks a lot for this great package. I noticed that most of the memory footprint comes from the data slot, which is a base R array.

Every image operation performed directly on an Image or AnnotatedImage object makes a full copy of the data and doubles memory usage. For example, say we have an Image object img. An operation like img * 50 then doubles memory usage (even without assigning the result to anything!).
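A minimal sketch of the copy behaviour, using a plain base R array as a stand-in for the data slot so that it runs without any extra packages. The array dimensions and variable names are assumptions chosen only for illustration.

```r
# Stand-in for the data slot of an Image: a plain base R numeric array.
# (Assumption: the dimensions are arbitrary, roughly 10 MB of doubles.)
x <- array(runif(256 * 256 * 20), dim = c(256, 256, 20))
size_x <- as.numeric(object.size(x))

# Arithmetic on a base R array allocates a new array of the same size.
# The allocation happens even if the result is never assigned; the
# discarded result is only released at the next garbage collection.
y <- x * 50
size_y <- as.numeric(object.size(y))

cat(sprintf("original: %.1f MB, result: %.1f MB\n",
            size_x / 2^20, size_y / 2^20))
```

While both x and y are alive, the session holds two full-size arrays, which matches the doubling described above.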

Interestingly, the display function uses even more than double the memory. For example, display(img) typically gives around a 200% memory bump for me.

I assume this behavior is caused by the use of a base R array, which may already be touched on in Issue #40.

Could we use on-disk array libraries to replace the current base R array?

For example, HDF5Array seems to be a reasonable replacement. I did a bit of research and it seems that 1) it is well supported by the Bioconductor core team, 2) it allows easy conversion of a base R array to an on-disk temporary HDF5 array (as simple as hdf5array <- as(base.R.array, "HDF5Array")), and 3) it chunks the array into small pieces for fast access and only loads the relevant chunks into memory when needed.
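A rough sketch of the three points above, assuming the Bioconductor package HDF5Array (and its dependency DelayedArray) is installed. The array dimensions and the one-frame-per-chunk layout are assumptions for illustration. This only shows the on-disk array on its own, not how an Image object could hold one in its data slot.

```r
# Assumes the Bioconductor package HDF5Array is installed.
library(HDF5Array)

# Stand-in for image data (assumption: 10 frames of 64 x 64 pixels).
x <- array(runif(64 * 64 * 10), dim = c(64, 64, 10))

# Write the array to a temporary HDF5 file and get back an on-disk
# HDF5Array. chunkdim is an illustrative choice: one frame per chunk.
h5 <- writeHDF5Array(x, chunkdim = c(64, 64, 1))

# Arithmetic on an HDF5Array is delayed: no full-size result is
# allocated in memory at this point.
scaled <- h5 * 50

# Realize a single frame; only the chunks it touches are read from disk.
frame1 <- as.array(scaled[, , 1, drop = FALSE])
```

The in-memory cost here is one frame rather than the whole stack, at the price of disk reads each time data is realized.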
