
cannot find an open port. For manually specifying the port, see ?SnowParam / Using previously downloaded VCF. #113

Open · bschilder opened this issue on Aug 6, 2022 · 1 comment
Labels: bug (Something isn't working)

bschilder (Collaborator) commented on Aug 6, 2022

1. Bug description

import_sumstats: works fine for hundreds of GWAS, then encounters this error and quickly iterates through all remaining GWAS IDs without actually processing them (and, strangely, appends their log messages to the log file of the GWAS that first encountered the error!).
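One way the "appended logs" symptom can arise, assuming the per-ID logs are written by diverting output with base R's `sink()`: if an error interrupts a run before its `sink()` calls are undone, every later ID keeps writing into the first ID's file. The sketch below is a hypothetical wrapper (`run_with_log()` is not a MungeSumstats function) that always restores the sinks via `on.exit()`, so an error in one ID cannot redirect the logs of the next.

```r
# Hypothetical per-ID wrapper (not MungeSumstats code): send output and
# messages for one GWAS ID to its own log file, and undo the diversion on
# exit even if the wrapped step throws an error.
run_with_log <- function(id, log_dir, fn) {
  log_file <- file.path(log_dir, paste0(id, "_log_msg.txt"))
  con <- file(log_file, open = "wt")
  n_out <- sink.number()                   # output sinks active on entry
  sink(con)                                # divert stdout to the log
  sink(con, type = "message")              # divert messages to the log
  on.exit({
    sink(type = "message")                 # restore messages to stderr
    while (sink.number() > n_out) sink()   # pop any output sinks added since entry
    close(con)
  }, add = TRUE)
  # Record the failure in this ID's log and return NULL instead of aborting
  tryCatch(fn(id), error = function(e) {
    message("Failed for ", id, ": ", conditionMessage(e))
    NULL
  })
}
```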

This takes a very long time to reproduce (multiple days of continuous running). And the GWAS being processed when the error occurred was not particularly large ("only" ~11.7M SNPs).

Possible explanations

  1. Multiple users on our private cloud are accidentally trying to use the same threads at the same time, and BiocParallel can't handle this gracefully?
  2. The virtual machine becomes temporarily disconnected from its dedicated resources. Perhaps a question for @eduff?
  3. data.table is trying to run in parallel within each loop of read_vcf_parallel (which is itself run in parallel), so the same cores are requested for different tasks at once. Though I don't know why this wouldn't happen far earlier, when processing the first hundreds of GWAS.
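If explanation 3 is right, the usual remedy is to budget threads across the two levels of parallelism. The error message points to `?SnowParam`, which suggests socket workers; each of those starts as a fresh R session, and by default data.table uses a share of all logical CPUs in each one (50% in recent versions), so 30 workers can oversubscribe the machine many times over. A minimal sketch with hypothetical helper names (neither is a MungeSumstats function):

```r
# Sketch (assumption): each outer worker is a separate R process, such as a
# BiocParallel SnowParam or parallel::makePSOCKcluster() worker, and may call
# multi-threaded data.table code. Split one thread budget between the two
# levels so that outer workers x inner threads never exceeds it.
budget_threads <- function(total_threads, outer_workers) {
  total <- max(1L, as.integer(total_threads))
  outer <- max(1L, min(as.integer(outer_workers), total))
  inner <- max(1L, total %/% outer)
  list(outer = outer, inner = inner)
}

# Call at the top of each worker task (e.g. inside the function passed to
# bplapply() or parLapply()) to cap data.table's thread pool in that worker.
limit_inner_threads <- function(inner) {
  if (requireNamespace("data.table", quietly = TRUE)) {
    data.table::setDTthreads(inner)
  }
  invisible(as.integer(inner))
}
```

For example, with the issue's `nThread = 30` and 30 chunks, `budget_threads(30, 30)` leaves one data.table thread per worker.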

read_vcf_parallel:

It seems to occur in read_vcf_parallel. This function seems rather finicky: it also fails when I specify >30 threads, though I suspect that's for a different reason (splitting a VCF across too many threads means some genome tiles may be empty, which may break the whole loop, perhaps at the final re-merging step).
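The suspected failure with >30 threads could be guarded against in two places: never create more chunks than there are tiles, and drop empty (or failed) chunk results before the final merge. A base-R sketch under those assumptions; `split_tiles()` and `merge_chunk_results()` are hypothetical names, not MungeSumstats functions, and whether `read_vcf_parallel` actually fails this way is still only the hypothesis above.

```r
# Hypothetical helpers (not MungeSumstats code).
# Assign tile indices round-robin to at most `n_threads` chunks, never
# creating more chunks than there are tiles.
split_tiles <- function(n_tiles, n_threads) {
  n_chunks <- max(1L, min(as.integer(n_threads), as.integer(n_tiles)))
  unname(split(seq_len(n_tiles), rep_len(seq_len(n_chunks), n_tiles)))
}

# Keep only non-empty data.frame results (empty tiles and failed chunks,
# e.g. NULL, are dropped) before row-binding them back together.
merge_chunk_results <- function(results) {
  ok <- Filter(function(x) is.data.frame(x) && nrow(x) > 0L, results)
  if (length(ok) == 0L) {
    return(data.frame())
  }
  do.call(rbind, ok)
}
```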

Related Issues

BiocParallel:

Also, I'm not sure if I'm the only one, but BiocParallel can be tricky to use reliably.

Console output

Using local VCF.
File already tabix-indexed.
Finding empty VCF columns based on first 10,000 rows.
Dropping 1 duplicate columns.
1 sample detected: ubm-a-129
Constructing ScanVcfParam object.
VCF contains: 11,734,353 variant(s) x 1 sample(s)
Reading VCF file: multi-threaded (30 threads)
failed to open the port 11221, trying a new port...
failed to open the port 11596, trying a new port...
failed to open the port 11982, trying a new port...
failed to open the port 11329, trying a new port...
failed to open the port 11700, trying a new port...
  cannot find an open port. For manually specifying the port, see ?SnowParamUsing previously downloaded VCF.
Formatted summary statistics will be saved to ==>  /shared/bms20/projects/MAGMA_Files_Public/data/GWAS_sumstats/ubm-a-81/ubm-a-81.tsv.gz
Log data to be saved to ==>  /shared/bms20/projects/MAGMA_Files_Public/data/GWAS_sumstats/ubm-a-81/logs
Saving output messages to:
/shared/bms20/projects/MAGMA_Files_Public/data/GWAS_sumstats/ubm-a-81/logs/MungeSumstats_log_msg.txt
Any runtime errors will be saved to:
/shared/bms20/projects/MAGMA_Files_Public/data/GWAS_sumstats/ubm-a-81/logs/MungeSumstats_log_output.txt
Messages will not be printed to terminal.
all connections are in useUsing previously downloaded VCF.
Formatted summary statistics will be saved to ==>  /shared/bms20/projects/MAGMA_Files_Public/data/GWAS_sumstats/ubm-a-93/ubm-a-93.tsv.gz
Log data to be saved to ==>  /shared/bms20/projects/MAGMA_Files_Public/data/GWAS_sumstats/ubm-a-93/logs
Saving output messages to:
/shared/bms20/projects/MAGMA_Files_Public/data/GWAS_sumstats/ubm-a-93/logs/MungeSumstats_log_msg.txt
Any runtime errors will be saved to:
/shared/bms20/projects/MAGMA_Files_Public/data/GWAS_sumstats/ubm-a-93/logs/MungeSumstats_log_output.txt
Messages will not be printed to terminal.
...
...
...
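The error message points to `?SnowParam` for specifying the port manually. A sketch, assuming the installed BiocParallel's `SnowParam()` accepts a `manager.port` argument (check `?SnowParam` for your version): probe for a free port with base R's `serverSocket()` (R >= 4.0) and pass it on. The 11000-11999 candidate range only mirrors the ports seen in the log above, and `find_free_port()` is a hypothetical helper. Note that `serverSocket()` itself occupies an R connection slot, so if the session has run out of connections (compare "all connections are in use" in the log), this probe will fail as well.

```r
# Hypothetical helper: return the first candidate port that a listening
# socket can be opened on (closing it again immediately), or NA if none.
find_free_port <- function(candidates = sample(11000:11999)) {
  for (port in candidates) {
    sock <- tryCatch(serverSocket(port),
                     error = function(e) NULL,
                     warning = function(w) NULL)
    if (!is.null(sock)) {
      close(sock)
      return(as.integer(port))
    }
  }
  NA_integer_
}

port <- find_free_port()
if (!is.na(port) && requireNamespace("BiocParallel", quietly = TRUE)) {
  # Assumption: SnowParam() has a `manager.port` argument in your version.
  bpparam <- BiocParallel::SnowParam(workers = 2, manager.port = port)
}
```

Probing and binding are separate steps, so another process can still take the port in between; treat this as a diagnostic aid rather than a guarantee.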

Full logs file:
ubm-a-129_log_msg.txt

Expected behaviour

Process all sumstats.

2. Reproducible example

Code

meta <- MungeSumstats::find_sumstats(subcategories = c("neurological","Immune","cardio"))

gwas_paths <- MungeSumstats::import_sumstats(
  ids = meta$id[1:400], 
  save_dir = here::here("data/GWAS_sumstats"), 
  nThread = 30, # >30 causes issues with read_vcf_parallel
  parallel_across_ids = FALSE, 
  force_new_vcf = FALSE,
  force_new = FALSE,
  vcf_download = TRUE,
  vcf_dir = here::here("data/VCFs"),
  ### axel will keep trying forever if the URL doesn't exist (or is private)
  # download_method = "axel",
  #### Record logs
  log_folder_ind = TRUE,
  log_mungesumstats_msgs = TRUE
  )
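The log line "all connections are in use" is R reporting that its fixed table of connections (128 by default, three of which are stdin, stdout and stderr) is full. A hedged diagnostic for the loop above: a hypothetical wrapper (`run_ids_with_leak_check()` is not part of MungeSumstats) that records how many connection slots are allocated before and after each ID, so a leak shows up after the first few GWAS instead of after several days.

```r
# Number of connection slots currently allocated (includes stdin/stdout/stderr)
n_connection_slots <- function() length(getAllConnections())

# Hypothetical wrapper: run `fn` on each ID, report failures instead of
# silently skipping them, and warn when connection slots are not released.
run_ids_with_leak_check <- function(ids, fn) {
  baseline <- n_connection_slots()
  for (id in ids) {
    tryCatch(fn(id), error = function(e) {
      message("Failed for ", id, ": ", conditionMessage(e))
    })
    leaked <- n_connection_slots() - baseline
    if (leaked > 0L) {
      warning(sprintf("%d connection slot(s) still in use after %s", leaked, id),
              call. = FALSE)
    }
  }
  invisible(n_connection_slots() - baseline)
}
```

In the reproducible example, `fn` would wrap the same `import_sumstats()` call with `ids = id`.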

3. Session info

R Under development (unstable) (2022-02-25 r81808)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomeInfoDb_1.33.3    IRanges_2.31.0         S4Vectors_0.35.1       BiocGenerics_0.43.1   
[5] dplyr_1.0.9            ggplot2_3.3.6          data.table_1.14.2      MungeSumstats_1.5.5   
[9] MAGMA.Celltyping_2.0.6

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                                  R.utils_2.12.0                             
  [3] tidyselect_1.1.2                            lme4_1.1-30                                
  [5] RSQLite_2.2.15                              AnnotationDbi_1.59.1                       
  [7] htmlwidgets_1.5.4                           grid_4.2.0                                 
  [9] BiocParallel_1.31.10                        munsell_0.5.0                              
 [11] codetools_0.2-18                            withr_2.5.0                                
 [13] colorspace_2.0-3                            Biobase_2.57.1                             
 [15] filelock_1.0.2                              knitr_1.39                                 
 [17] rstudioapi_0.13                             orthogene_1.3.1                            
 [19] SingleCellExperiment_1.19.0                 ggsignif_0.6.3                             
 [21] MatrixGenerics_1.9.1                        GenomeInfoDbData_1.2.8                     
 [23] bit64_4.0.5                                 rprojroot_2.0.3                            
 [25] vctrs_0.4.1                                 treeio_1.21.0                              
 [27] generics_0.1.3                              xfun_0.31                                  
 [29] BiocFileCache_2.5.0                         R6_2.5.1                                   
 [31] bitops_1.0-7                                cachem_1.0.6                               
 [33] gridGraphics_0.5-1                          DelayedArray_0.23.1                        
 [35] assertthat_0.2.1                            BSgenome.Hsapiens.1000genomes.hs37d5_0.99.1
 [37] promises_1.2.0.1                            BiocIO_1.7.1                               
 [39] scales_1.2.0                                gtable_0.3.0                               
 [41] SNPlocs.Hsapiens.dbSNP155.GRCh37_0.99.22    SNPlocs.Hsapiens.dbSNP155.GRCh38_0.99.22   
 [43] rlang_1.0.4                                 splines_4.2.0                              
 [45] rtracklayer_1.57.0                          rstatix_0.7.0                              
 [47] lazyeval_0.2.2                              gargle_1.2.0                               
 [49] broom_1.0.0                                 BiocManager_1.30.18                        
 [51] yaml_2.3.5                                  reshape2_1.4.4                             
 [53] abind_1.4-5                                 GenomicFeatures_1.49.5                     
 [55] backports_1.4.1                             httpuv_1.6.5                               
 [57] tools_4.2.0                                 ggplotify_0.1.0                            
 [59] ellipsis_0.3.2                              ggdendro_0.1.23                            
 [61] Rcpp_1.0.9                                  plyr_1.8.7                                 
 [63] progress_1.2.2                              zlibbioc_1.43.0                            
 [65] purrr_0.3.4                                 RCurl_1.98-1.8                             
 [67] prettyunits_1.1.1                           ggpubr_0.4.0                               
 [69] GenomicFiles_1.33.1                         BSgenome.Hsapiens.NCBI.GRCh38_1.3.1000     
 [71] SummarizedExperiment_1.27.1                 fs_1.5.2                                   
 [73] here_1.0.1                                  magrittr_2.0.3                             
 [75] matrixStats_0.62.0                          hms_1.1.1                                  
 [77] patchwork_1.1.1                             mime_0.12                                  
 [79] evaluate_0.15                               xtable_1.8-4                               
 [81] XML_3.99-0.10                               EWCE_1.5.5                                 
 [83] gridExtra_2.3                               compiler_4.2.0                             
 [85] biomaRt_2.53.2                              tibble_3.1.8                               
 [87] crayon_1.5.1                                minqa_1.2.4                                
 [89] R.oo_1.25.0                                 htmltools_0.5.3                            
 [91] ggfun_0.0.6                                 later_1.3.0                                
 [93] tidyr_1.2.0                                 aplot_0.1.6                                
 [95] DBI_1.1.3                                   ExperimentHub_2.5.0                        
 [97] gprofiler2_0.2.1                            dbplyr_2.2.1                               
 [99] MASS_7.3-58                                 rappdirs_0.3.3                             
[101] boot_1.3-28                                 babelgene_22.3                             
[103] Matrix_1.4-1                                car_3.1-0                                  
[105] cli_3.3.0                                   R.methodsS3_1.8.2                          
[107] parallel_4.2.0                              SNPlocs.Hsapiens.dbSNP144.GRCh37_0.99.20   
[109] GenomicRanges_1.49.0                        pkgconfig_2.0.3                            
[111] SNPlocs.Hsapiens.dbSNP144.GRCh38_0.99.20    GenomicAlignments_1.33.1                   
[113] plotly_4.10.0                               xml2_1.3.3                                 
[115] ggtree_3.5.1                                XVector_0.37.0                             
[117] yulab.utils_0.0.5                           stringr_1.4.0                              
[119] VariantAnnotation_1.43.2                    digest_0.6.29                              
[121] Biostrings_2.65.1                           rmarkdown_2.14                             
[123] HGNChelper_0.8.1                            tidytree_0.3.9                             
[125] restfulr_0.0.15                             curl_4.3.2                                 
[127] shiny_1.7.2                                 Rsamtools_2.13.3                           
[129] rjson_0.2.21                                nloptr_2.0.3                               
[131] lifecycle_1.0.1                             nlme_3.1-158                               
[133] jsonlite_1.8.0                              carData_3.0-5                              
[135] viridisLite_0.4.0                           limma_3.53.5                               
[137] BSgenome_1.65.2                             fansi_1.0.3                                
[139] pillar_1.8.0                                lattice_0.20-45                            
[141] homologene_1.4.68.19.3.27                   KEGGREST_1.37.3                            
[143] fastmap_1.1.0                               httr_1.4.3                                 
[145] googleAuthR_2.0.0                           interactiveDisplayBase_1.35.0              
[147] glue_1.6.2                                  RNOmni_1.0.0                               
[149] png_0.1-7                                   ewceData_1.5.0                             
[151] BiocVersion_3.16.0                          bit_4.0.4                                  
[153] stringi_1.7.8                               blob_1.2.3                                 
[155] AnnotationHub_3.5.0                         memoise_2.0.1                              
[157] ape_5.6-2  

@bschilder bschilder added the bug Something isn't working label Aug 6, 2022
@bschilder bschilder self-assigned this Aug 6, 2022
bschilder (Collaborator, Author) commented:

I've documented some of my observations here as well:
neurogenomics/MAGMA_Celltyping#110
