
Maxing out memory, crashing RStudio and Computer at Step 18 #682

Open
alexwskh opened this issue Dec 31, 2024 · 3 comments

Comments

@alexwskh

Hello to anyone still minding this hub! I appreciate any help. I hope the developers of this tool find some more financial support in the future as well, as this is an awesome project.

My lab does tumor modeling in zebrafish. I have several different tumors from different fish and I am trying to infer CNVs across them. I performed Seurat integration separately (though I am using the raw counts here). I designated all cells with at least one GFP transcript as my observation group and everything else as the control (reference) group. I've tested running one tumor on its own and that does work, but I was hoping to pool them. The combined set isn't that large, at about 25k cells.
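
The annotation table is essentially this logic (a minimal sketch, not the exact code I ran; the Seurat object name seu, the feature name "GFP", and the orig.ident sample labels are placeholders):

# Sketch: build the two-column cell annotation table from a Seurat object.
# Cells with >= 1 GFP transcript go into a per-sample observation group,
# everything else into the matching per-sample reference (GFP_NEG) group.
library(Seurat)

gfp_counts <- GetAssayData(seu, assay = "RNA", layer = "counts")["GFP", ]  # use slot = "counts" on Seurat v4

cell.idents <- data.frame(
  group = ifelse(gfp_counts >= 1,
                 paste0(seu$orig.ident, "_GFP_POS"),
                 paste0(seu$orig.ident, "_GFP_NEG")),
  row.names = colnames(seu)
)

# Optionally write it out as a tab-delimited annotations file
# (matches delim="\t" in CreateInfercnvObject below):
write.table(cell.idents, "cell_idents.tsv", sep = "\t",
            col.names = FALSE, quote = FALSE)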

The run keeps crashing at step 18. I can see my system resources max out (64 GB of RAM), then the swap memory fills, and then it crashes. Is there any way around this? Will it work if I run it on an HPC node with more memory? Despite dropping the leiden_resolution parameter quite a bit, it looks like it's still trying to make a ton of subclusters... I'm just looking for broad, obvious changes.

[Attached image: infercnv_subclusters plot]

# GFP reference groups, pooled by sample
infercnv_obj = CreateInfercnvObject(
  raw_counts_matrix=all.counts,
  annotations_file=cell.idents,
  delim="\t",
  gene_order_file=geneorderfile,
  ref_group_names=c("bard1_count_GFP_NEG", "brca2july2024_count_GFP_NEG",
                    "brca2older_count_GFP_NEG", "ddr_wt_count_GFP_NEG",
                    "palb2_count_GFP_NEG"))

infercnv_obj_default = infercnv::run(
  infercnv_obj,
  cutoff=0.1, # cutoff=1 works well for Smart-seq2, and cutoff=0.1 works well for 10x Genomics
  out_dir=outdir,
  cluster_by_groups=TRUE, 
  plot_steps=FALSE,
  denoise=TRUE,
  HMM=TRUE,
  no_prelim_plot=TRUE,
  leiden_resolution = 0.01,
  num_threads = 12,
  png_res=180,
  debug = TRUE,
  BayesMaxPNormal = 0.2,
  per_chr_hmm_subclusters = FALSE
)
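
To see how many subclusters it is actually trying to make before the crash, one option is to reload the intermediate objects inferCNV writes into out_dir (a rough sketch; the @tumor_subclusters$subclusters slot layout is assumed from recent inferCNV versions, and it is only populated once the subclustering step has run):

# Sketch: count subclusters per annotation group from the latest
# intermediate object saved so far (files ending in .infercnv_obj).
saved <- sort(list.files(outdir, pattern = "\\.infercnv_obj$", full.names = TRUE))
obj <- readRDS(tail(saved, 1))                     # latest step object written
sapply(obj@tumor_subclusters$subclusters, length)  # subclusters per group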

alexwskh commented Jan 1, 2025

For anyone running into similar issues: I naively didn't realize that memory usage scales with the number of threads dedicated to parallel processing. Reducing num_threads (in my case leaving it at the default instead of trying 12) seems to let it run, at least for individual samples.
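
Concretely, that just means the call from the original post with num_threads left unset so the inferCNV default is used (everything else unchanged):

infercnv_obj_default = infercnv::run(
  infercnv_obj,
  cutoff=0.1,
  out_dir=outdir,
  cluster_by_groups=TRUE,
  plot_steps=FALSE,
  denoise=TRUE,
  HMM=TRUE,
  no_prelim_plot=TRUE,
  leiden_resolution = 0.01,
  # num_threads left at the package default to keep peak memory down
  png_res=180,
  debug = TRUE,
  BayesMaxPNormal = 0.2,
  per_chr_hmm_subclusters = FALSE)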

@withermatt

Hi @alexwskh,

I can run this on an HPC and just keep bumping up the memory allocation if it crashes (maybe try 250 GB for 25k cells?). I downloaded the Singularity image and ran the R script exactly as provided on the installation page, and it gave me no issues. Hope that helps!

@alexwskh

@withermatt Thanks for the reply.

OK, good to know that it can work given enough memory. I may revisit this in the future, then...
