Docker wrapper #1
Building locally now. Once it works, I'll get this onto Docker Hub. |
Only one thread looks to be active:
|
cc: @jkh1 |
I think that we should add the following command at line 93 in compute_image_similarities.R in order to free up some memory:
rm(fields, group_name, h5data, measures, imageID, wellID, listOfFeatureMatrices)
Thanks, @beatrizserrano. Running again now. |
@beatrizserrano unfortunately
still failed with:
|
Hi,
There are various ways to deal with this, assuming this is caused by
having too many images:
- Apply PCA to a random subset of the images. As long as this subset is
representative of the data, the covariance will be reasonably well
approximated.
- Compute the covariance matrix incrementally, see
http://rebcabin.github.io/blog/2013/01/22/covariance-matrices/
- Use random SVD/PCA, see this paper:
https://arxiv.org/pdf/0909.4061.pdf and R package rsvd:
https://CRAN.R-project.org/package=rsvd
- Use random projections, see e.g.
http://users.ics.aalto.fi/ella/publications/randproj_kdd.pdf
Cheers
J-K
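As an illustration of the incremental-covariance option above, here is a minimal base-R sketch. It is not taken from compute_image_similarities.R: the helper name `update_cov`, the chunk size, and the simulated matrix are all assumptions made for the example. It merges per-chunk statistics with a pairwise update so that only one chunk needs to be held in memory at a time.

```r
# Hypothetical helper: merge running statistics with a new chunk of rows.
# state holds n (rows seen), mean (column means) and m2 (sum of centered
# cross-products); m2 / (n - 1) is the sample covariance.
update_cov <- function(state, chunk) {
  chunk <- as.matrix(chunk)
  m <- nrow(chunk)
  chunk_mean <- colMeans(chunk)
  chunk_m2 <- crossprod(sweep(chunk, 2, chunk_mean))
  if (state$n == 0) {
    return(list(n = m, mean = chunk_mean, m2 = chunk_m2))
  }
  n <- state$n + m
  delta <- chunk_mean - state$mean
  list(
    n = n,
    mean = state$mean + delta * m / n,
    m2 = state$m2 + chunk_m2 + tcrossprod(delta) * state$n * m / n
  )
}

# Simulated stand-in for the real feature matrix (assumption).
set.seed(1)
x <- matrix(rnorm(1000 * 5), ncol = 5)

# Feed the rows in chunks of 100, as if each chunk were read from disk.
state <- list(n = 0, mean = NULL, m2 = NULL)
chunks <- split(seq_len(nrow(x)), ceiling(seq_len(nrow(x)) / 100))
for (idx in chunks) {
  state <- update_cov(state, x[idx, , drop = FALSE])
}
incremental_cov <- state$m2 / (state$n - 1)

# Principal components are the eigenvectors of the covariance matrix;
# use cov2cor(incremental_cov) instead to mimic prcomp(scale. = TRUE).
pcs <- eigen(incremental_cov, symmetric = TRUE)
```

On this toy input the result should agree with `cov(x)` up to floating-point error; whether the approach fits the real pipeline depends on how the features are read from the HDF5 files.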
…On 21/06/17 15:10, Josh Moore wrote:
@beatrizserrano <https://github.com/beatrizserrano> unfortunately
***@***.*** serrano-remining]$ git diff
diff --git a/compute_image_similarities.R b/compute_image_similarities.R
index 89535b8..73859f1 100644
--- a/compute_image_similarities.R
+++ b/compute_image_similarities.R
@@ -88,8 +88,9 @@ featureMatrix <- aggregate(featureMatrix, by = list(row.names(featureMatrix)), m
 rownames(featureMatrix) <- featureMatrix[,1]
 featureMatrix <- featureMatrix[,-1]
 # Remove constant features
 featureMatrix <- featureMatrix[, which(!apply(featureMatrix, 2, FUN=function(x) {sd(x)==0}))]
+rm(fields, group_name, h5data, measures, imageID, wellID, listOfFeatureMatrices)
 # PCA
 pca <- prcomp(featureMatrix, scale.= TRUE, center = TRUE)
still failed with:
...
49998 chebyshev_coefficients_49998 0.000000e+00
49999 chebyshev_coefficients_49999 0.000000e+00
[ reached getOption("max.print") -- omitted 388322951 rows ]
Warning message:
system call failed: Cannot allocate memory

real 158m6.828s
user 0m0.970s
sys 0m0.323s
--
Dr Jean-Karim Hériché
Cell Biology and Biophysics Unit
European Molecular Biology Laboratory
Meyerhofstrasse 1
69117 Heidelberg
Germany
tel: +49 (0) 6221 387 8188
|
Thanks, @jkh1. Let's begin with the easiest one :) To select a random subset of images, we need to expand line 90 to:
I've selected 1% of the images for testing purposes, but we can increase that once we see it's working. |
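The snippet for line 90 is not preserved in this thread. As a rough sketch of the idea only, and not the code that was actually proposed, sampling 1% of the rows before the PCA could look like the following. The simulated `featureMatrix`, the `fraction` variable, and `subsetRows` are assumptions introduced for the example.

```r
# Simulated stand-in for the real feature matrix: one row per image (assumption).
set.seed(42)
featureMatrix <- matrix(
  rnorm(2000 * 4),
  ncol = 4,
  dimnames = list(paste0("img", 1:2000), paste0("f", 1:4))
)

# Draw a random 1% of the images without replacement.
fraction <- 0.01
subsetRows <- sample(nrow(featureMatrix), size = ceiling(fraction * nrow(featureMatrix)))

# Fit the PCA on the subset only, with the same options as the existing call.
pca <- prcomp(featureMatrix[subsetRows, , drop = FALSE], scale. = TRUE, center = TRUE)
```

Raising `fraction` trades memory for a better approximation of the covariance, which matches the plan of increasing the percentage once the run succeeds.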
👍 |
@beatrizserrano / @jkh1 : any thoughts on what format to write the results out to? |
Flat files in a hierarchical directory structure matching that of the
images would work best for most purposes. If you want to keep everything
together, you could use HDF5, but there won't be any other benefit given
the expected access pattern, plus the HDF5 1.8 library doesn't support
concurrent reads (although the upcoming HDF5 1.10 should enable it).
Alternatively, if it's just for us, serialized R data structures (i.e.
RDS files from the saveRDS() function) would be best.
J-K
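To make the two output options concrete, here is a small sketch under stated assumptions: the helper `write_by_well`, the plate/well layout, and the toy feature matrix are hypothetical and do not reflect the actual IDR directory structure. It writes one CSV per well under a per-plate directory, then round-trips the same matrix through `saveRDS()` and `readRDS()`.

```r
# Hypothetical helper: write one CSV per well as <root>/<plate>/<well>.csv.
write_by_well <- function(features, plate, well, root) {
  keys <- unique(data.frame(plate = plate, well = well, stringsAsFactors = FALSE))
  paths <- character(nrow(keys))
  for (i in seq_len(nrow(keys))) {
    rows <- plate == keys$plate[i] & well == keys$well[i]
    plate_dir <- file.path(root, keys$plate[i])
    dir.create(plate_dir, recursive = TRUE, showWarnings = FALSE)
    paths[i] <- file.path(plate_dir, paste0(keys$well[i], ".csv"))
    write.csv(features[rows, , drop = FALSE], paths[i])
  }
  paths
}

# Toy data: four images across two plates (assumption).
root <- tempfile("similarity_results")
features <- matrix(
  rnorm(12),
  nrow = 4,
  dimnames = list(paste0("img", 1:4), paste0("PC", 1:3))
)
plate <- c("plate1", "plate1", "plate2", "plate2")
well <- c("A1", "A2", "A1", "A1")
paths <- write_by_well(features, plate, well, root)

# Alternative for internal use: a single serialized R object.
rds_path <- file.path(root, "features.rds")
saveRDS(features, rds_path)
restored <- readRDS(rds_path)
```

The RDS route preserves R types and dimnames exactly, while the flat files are readable from any tool, which is the trade-off described above.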
|
Does this exist as a function?
Of which variables? |
Any suggestions here, @jkh1 & @beatrizserrano ? |
Sorry for the delay in getting back to you.
I don't think there's currently a function that creates the hierarchical
file structure.
I would save the PCA-derived features (variable features) and the
similarity matrix (variable simMatrix).
Note that the code has to be extended to project the data not used in
the PCA onto the PCs with predict(pca, newdata = ...)
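A minimal sketch of that projection step, assuming the PCA was fitted on a random subset of rows. The simulated `featureMatrix` and the `inSubset` and `projected` names are assumptions for illustration; only `features` and `pca` are names used in this thread.

```r
# Simulated stand-in for the real feature matrix (assumption).
set.seed(7)
featureMatrix <- matrix(
  rnorm(500 * 6),
  ncol = 6,
  dimnames = list(paste0("img", 1:500), paste0("f", 1:6))
)

# Fit the PCA on a random subset of 50 images.
inSubset <- seq_len(nrow(featureMatrix)) %in% sample(nrow(featureMatrix), 50)
pca <- prcomp(featureMatrix[inSubset, ], scale. = TRUE, center = TRUE)

# Project the remaining images onto the same principal components.
# predict() applies the centering and scaling stored in the pca object.
projected <- predict(pca, newdata = featureMatrix[!inSubset, ])

# Combine both parts and restore the original row order.
features <- rbind(pca$x, projected)[rownames(featureMatrix), ]
```

Because `predict()` reuses the subset's center and scale, all images end up in the same coordinate system, so a similarity matrix computed from `features` covers the full dataset.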
|
Coding began at the January meeting in Dundee. Now with features generated for idr0013 and idr0012, it's time to try generating the similarity matrix.
cc: @jkh1 @dominikl @manics