Parallelisation issues with snp_ldsc and SLURM system #534
Comments
I would try reinstalling bigsnpr; it is odd that it doesn’t find the function. You can also skip parallelism for this function, as it is only used to get confidence intervals; use blocks = NULL and ncores = 1. |
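To illustrate the suggestion above, here is a minimal sketch of calling snp_ldsc without the block jackknife, so no parallel workers are needed. The LD scores and chi-squared statistics are simulated purely for illustration (M, N and h2_true are made-up values, not from this thread); in a real analysis they would come from the LD reference and the GWAS summary statistics.

```r
library(bigsnpr)

set.seed(1)
M <- 5000        # number of variants (made-up value)
N <- 20000       # GWAS sample size (made-up value)
h2_true <- 0.3   # heritability used for the simulation (made-up value)

# Simulated LD scores and chi-squared statistics whose expectation follows
# the LD score regression model: E[chi2] = 1 + N * h2 * ld / M
ld   <- runif(M, min = 1, max = 100)
chi2 <- rchisq(M, df = 1) * (1 + N * h2_true * ld / M)

# blocks = NULL skips the jackknife (no standard errors), so ncores = 1 suffices
ldsc <- snp_ldsc(ld_score = ld, ld_size = M, chi2 = chi2,
                 sample_size = N, blocks = NULL, ncores = 1)

h2_est <- ldsc[["h2"]]  # point estimate that is then passed on to LDpred2
print(ldsc)
```

Only the point estimates are returned in this mode, which is all that is needed as a starting value for LDpred2.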
Sorry, to be clear, at the very start of this script I do:
The reason I tried to add the
Oh, so instead just do:
Would this be much quicker as well? And wouldn't impact calculating either of these?:
Also, thanks so much for responding while on vacation Florian, there's no huge rush on this so don't push yourself here! |
Do you need the
Yes, it would be faster to avoid computing the CIs, which you don’t really need for running LDpred2. |
Yeah, that's just a local issue for our HPC where some common R packages like
Ah! Okay, that's perfect news then! I'm giving this a try as we speak. That might solve the parallelisation issue and the speed issue simultaneously! |
bigsnpr is a CRAN package so it might be updated frequently on your cluster. Make sure to update it regularly. Yeah, but it is avoiding the issue, not really solving it. This might be an issue that you will have with other functions as well. From what I remember, it should take less than 5 min to run snp_ldsc with 200 blocks over 15 cores. |
I'll give it a try uninstalling and updating if this doesn't work! Wait, what?! As little as 5 minutes?? So, in theory even with only 1 core, surely it should take much less than 72 hours, right? I submitted jobs with this yesterday after we spoke:
This has now been running for >24 hours so far (on only 1 core). This is using the pre-computed HapMap+ SNPs and input files. Is this concerning? |
Yes, this is concerning. On my old laptop with 4 cores only, it takes
|
Can I ask how many SNPs this was? Was it the full set of 1.4M? |
Yes, 1.4M. |
Okay, I've done more tests of this now and think I made an error in previous messages. The parallelisation issue I mentioned at the start was a problem with snp_ldsc, but actually it seems like using only one core fixed that and in fact the function was working correctly. The problem now is in fact with |
Again, with 15 cores, LDpred2-auto should run in less than 12 hours. |
Hello, R support of the HPC cluster mentioned above here. We have been trying to get the bigsnpr package working, for example by trying different installation methods, different R versions and different Linux computers but no luck so far. We have been using this script for testing: https://privefl.github.io/bigsnpr/articles/LDpred2.html#computing-ldpred2-scores-genome-wide.
So far, the only computer where we have been able to run the script without errors using multiple cores is my Mac laptop. Linux machines (our HPC cluster, a virtual machine, a laptop) give errors:
Case 1: running on our cluster Puhti in its container-based R environment (R 4.4.0, https://docs.csc.fi/apps/r-env/) with multiple cores gives the errors of the type mentioned above (
SessionInfo:
Case 2: I set up a new Rocker-based R container (https://hub.docker.com/r/rocker/tidyverse) with the latest R and package versions on our cluster. This gives the following error for anything using
RhpcBLASctl is installed. I have tried several installation methods for bigsnpr ( SessionInfo:
We would be grateful for any tips on how to get the bigsnpr package working on our cluster! I'm happy to provide any additional information if needed. |
It sounds like the parallel R processes that are spawned do not use the same installation as the master R process.
What do you get for
And for
library(doParallel)
registerDoParallel(cl <- makeCluster(3))
foreach(i = 1:3) %dopar% { .libPaths() } |
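A similar check can be sketched with only the base parallel package, comparing each worker's library search path to the master's and testing whether a package can be found on the workers. The package name "stats" is just a placeholder so the example runs anywhere; on an affected system one would substitute the package that fails to load.

```r
library(parallel)

cl <- makeCluster(2)  # two local socket workers

master_paths <- .libPaths()
worker_paths <- clusterEvalQ(cl, .libPaths())

# TRUE for each worker whose library search path is identical to the master's
same_paths <- vapply(worker_paths,
                     function(p) identical(p, master_paths),
                     logical(1))

# Can each worker locate the package? ("stats" is a placeholder)
pkg <- "stats"
found <- unlist(clusterCall(cl,
                            function(p) requireNamespace(p, quietly = TRUE),
                            pkg))

stopCluster(cl)

print(same_paths)
print(found)
```

If same_paths contains FALSE, the workers are searching a different set of libraries than the master session, which would be consistent with a function not being found only in parallel runs.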
Ahhh, I see the problem now - haven't come across a case like this before. I tried one more thing (that I obviously should have tried earlier): installing the package to our central package installation folder that is not available to users. And now bigsnpr works. For R package installations by users,
But if I run
So everything should be good now and the package should work for our users. Thank you for the help! |
But it means your users cannot properly access in parallel the packages they have installed themselves. I would encourage you to find a fix for that ;) |
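One common workaround for this kind of problem, shown here only as a sketch and not necessarily what was done on this cluster, is to export the master's library paths through the R_LIBS environment variable before any workers are spawned. Child R processes inherit the environment and read R_LIBS at startup. The temporary directory below is a hypothetical stand-in for a user's personal library.

```r
library(parallel)

# Hypothetical user library: a temporary directory stands in for it here
user_lib <- file.path(tempdir(), "user_lib")
dir.create(user_lib, showWarnings = FALSE)
.libPaths(c(user_lib, .libPaths()))
user_lib <- .libPaths()[1]  # normalized form, as stored by R

# Start one worker and report whether it has user_lib on its search path
worker_sees_lib <- function() {
  cl <- makeCluster(1)
  on.exit(stopCluster(cl))
  user_lib %in% clusterEvalQ(cl, .libPaths())[[1]]
}

before <- worker_sees_lib()  # path was added in-session only

# Export the master's paths so that newly started R processes pick them up
Sys.setenv(R_LIBS = paste(.libPaths(), collapse = .Platform$path.sep))
after <- worker_sees_lib()

print(c(before = before, after = after))
```

Because the variable is inherited by every child R process, this also covers workers that a package starts internally, where the caller has no handle on the cluster object.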
Yes, we will definitely find a fix for this issue! I guess it hadn't come up for a while because many packages use future for parallelization and because of the large number of pre-installed packages we have. |
@Sabor117 If there is no more issue around this, please close this. |
Hi Florian,
This is the follow-up from my previous issue I mentioned. I will preface this saying that I'm not sure if this is something you will be able to help with or whether it is an issue with my specific HPC, but you may have seen similar issues and be able to help out.
Essentially, I am submitting LDpred jobs on a SLURM scheduler as follows:
(And just a quick note here, I have tried both --cpus-per-task=6 and --ntasks=6 for this).
When I am running LDpred (with the pre-computed LD reference panel) everything goes smoothly up until the snp_ldsc function:
When I submit the jobs as described above, the script runs smoothly until it reaches this function and then it crashes with the following error:
Then when I change it to ntasks=1 and remove cpus-per-task argument, it actually runs and doesn't throw the error. Essentially, it seems almost like most of the nodes aren't loading the function, which is causing it to crash. Despite it working correctly if there is just one node...
HOWEVER, when I didn't use parallelisation it also ran for 3 days straight (max length of time for this sort of job) and then ran out of time.
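On the R side, the number of cores to pass as ncores can be taken from the SLURM allocation rather than hard-coded. The sketch below assumes the job was submitted with --cpus-per-task, in which case SLURM sets the SLURM_CPUS_PER_TASK environment variable; the helper name slurm_cores is hypothetical.

```r
# Hypothetical helper: number of CPUs SLURM allocated to this task,
# falling back to `default` when the variable is unset or not a valid count
slurm_cores <- function(default = 1L) {
  val <- suppressWarnings(
    as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = ""))
  )
  if (is.na(val) || val < 1L) default else val
}

NCORES <- slurm_cores()
print(NCORES)
```

Note that --ntasks requests separate tasks that the scheduler may place on different nodes, whereas --cpus-per-task reserves cores for a single process on one node, which is typically what R's local socket workers need.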
Have you ever seen anything like this before? Is it something you think I might be able to solve? I am also in touch with IT though as this seems like an issue with the HPC rather than LDpred, but I thought I might ask here just in case.