
sv-callers Installation in local machine and input files #47

Closed
nitha26 opened this issue Oct 22, 2020 · 14 comments


nitha26 commented Oct 22, 2020

Hi,

I would like to use sv-callers for calling germline SVs from WGS data. I have the following questions:

  1. Can this tool be installed on a local CentOS 7 machine? And do the SV callers (Manta, Delly, LUMPY, GRIDSS) and the other tools (bcftools, SURVIVOR) need to be installed separately, or are they part of this repository (sub-modules already built into sv-callers)?

  2. The version of Manta is 1.1.0; does the current version of sv-callers support newer versions of Manta or GRIDSS?

  3. Can "cram" files be used as input?

  4. My samples are aligned to the GRCh38 reference; does sv-callers provide excluded regions as a .bed file for this reference genome?

Sorry for all the questions, let me know if there is a better place to ask them, person to email, etc.

Thanks in advance!
Nitha


arnikz commented Oct 22, 2020

Hi,

I would like to use sv-callers for calling germline SVs from WGS data. I have the following questions:

First, have you tried to run it locally?

1. Can this tool be installed on a local CentOS 7 machine? And do the SV callers (Manta, Delly, LUMPY, GRIDSS) and the other tools (bcftools, SURVIVOR) need to be installed separately, or are they part of this repository (sub-modules already built into sv-callers)?

The workflow takes care of the dependencies including SV callers etc. via (bio)conda.

2. The version of Manta is 1.1.0; does the current version of sv-callers support newer versions of Manta or GRIDSS?

In principle, yes (see here), but the unit/CI tests run with the aforementioned (older) software versions (see #35).

3. Can "cram" files be used as input?

Currently, there is no support for CRAM (sorry, we've been working with BAMs only).

4. My samples are aligned to the GRCh38 reference; does sv-callers provide excluded regions as a .bed file for this reference genome?

You can configure, among other things, the exclusion list here:

exclusion_list: data/ENCFF001TDO.bed
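For instance, pointing the workflow at an exclusion list matching your own reference build would be a one-line config change. Only the `exclusion_list` key is confirmed above; the file path below is a hypothetical placeholder, not a file shipped with sv-callers:

```yaml
# Sketch: swap the shipped BED file for one matching your reference build.
# The path below is a hypothetical example for a GRCh38 exclusion list.
exclusion_list: data/my_grch38_exclusion.bed
```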

Cheers,
Arnold


nitha26 commented Oct 23, 2020

Thank you for the reply. I will try to install sv-callers on a local machine.


nitha26 commented Oct 23, 2020

First, have you tried to run it locally?
Yes, I installed it on a CentOS 7 machine and was able to run the "execution of SV callers by writing (dummy) VCF files" command using the example data (log file "Trial_log_Exampledata.txt" attached).

However, I noticed the following lines in the output VCF files. Also, how do I execute the SV tools (Manta, Delly, LUMPY and GRIDSS) via sv-callers to get structural-variant genotype results (germline VCFs) on our local machine? Could you please point me to the command documentation?

all.vcf
data/bam/3/T3--N3/manta_out/survivor/manta.vcf data/bam/3/T3--N3/delly_out/survivor/delly.vcf data/bam/3/T3--N3/lumpy_out/survivor/lumpy.vcf data/bam/3/T3--N3/gridss_out/survivor/gridss.vcf

delly.vcf
data/fasta/chr22.fasta data/fasta/chr22.fasta.fai data/bam/3/T3.bam data/bam/3/T3.bam.bai data/bam/3/N3.bam data/bam/3/N3.bam.bai data/fasta/chr22.fasta data/fasta/chr22.fasta.fai data/bam/3/T3.bam data/bam/3/T3.bam.bai data/bam/3/N3.bam data/bam/3/N3.bam.bai data/fasta/chr22.fasta data/fasta/chr22.fasta.fai data/bam/3/T3.bam data/bam/3/T3.bam.bai data/bam/3/N3.bam data/bam/3/N3.bam.bai data/fasta/chr22.fasta data/fasta/chr22.fasta.fai data/bam/3/T3.bam data/bam/3/T3.bam.bai data/bam/3/N3.bam data/bam/3/N3.bam.bai data/fasta/chr22.fasta data/fasta/chr22.fasta.fai data/bam/3/T3.bam data/bam/3/T3.bam.bai data/bam/3/N3.bam [data/bam/3/N3.bam.bai

Trial_log_Exampledata.txt

Thanks.


arnikz commented Oct 23, 2020

Yep, that's correct

# 'vanilla' run (default) mimics the execution of SV callers by writing (dummy) VCF files
snakemake -C echo_run=1

Now for the real run, remove the data/bam/3/T3--N3 dir and run the workflow again with echo_run=0 etc. Please read the README or see this command 😉
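A minimal sketch of that real run, based on the commands quoted in this thread (the dir path is the example-data sample pair mentioned above; further flags may be needed for your setup):

```shell
# Remove the dummy output for the example sample pair...
rm -rf data/bam/3/T3--N3
# ...and re-run with dummy mode off; --use-conda lets the workflow
# install the SV callers and processing tools itself.
snakemake -C echo_run=0 --use-conda
```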


nitha26 commented Oct 23, 2020

As per your suggestion we tried the command snakemake -C echo_run=0, but we are getting an error (log file attached).
2020-10-23T143034.916521.snakemake.log

I still have some doubts:

1. As asked in the earlier query: do the SV callers (Manta, Delly, LUMPY, GRIDSS) and other tools (bcftools, SURVIVOR) need to be installed separately, or are they part of this repository (sub-modules already built into sv-callers)?

On this system we had already installed delly, so I think sv-callers is able to invoke only Delly; the other tools are not executing because they are not installed.


arnikz commented Oct 23, 2020

Please read the README carefully. You do not need to install the callers and processing tools yourself; the workflow takes care of that if you add the missing --use-conda arg. In addition, take a closer look at the aforementioned Travis CI log (green badge), which shows that everything runs fine (no errors) in the automated deployment of the workflow with test data.


nitha26 commented Oct 23, 2020

Right now we are working on a local CentOS 7 machine, so we tried the command below; it has been running for more than 45 minutes but still shows the same message. Could you please confirm whether this command is correct and, if so, how long it should take to run.

(wf) [root@localhost snakemake]# snakemake -C echo_run=0 mode=p enable_callers="['manta','delly','lumpy','gridss']" --use-conda

Building DAG of jobs... Removing incomplete Conda environment environment.yaml... Creating conda environment environment.yaml... Downloading and installing remote packages.

Thanks for your support.


arnikz commented Oct 23, 2020

The command looks fine. Yeah, conda install used to take a few minutes, but these days it's very slow indeed - something to consider for the next release (#49) - though it needs to be done just once before the actual workflow run(s). What's your conda --version? Btw, why are you executing the wf as root?


nitha26 commented Oct 23, 2020

I am running as the root user. The conda version is:

[root@localhost]# conda --version
conda 4.8.3

And the conda package install is STILL at the same stage:
Building DAG of jobs... Removing incomplete Conda environment environment.yaml... Creating conda environment environment.yaml... Downloading and installing remote packages.


arnikz commented Oct 23, 2020

I am running from root user.

Yes, that's clear but it's not necessary (and could be dangerous).

The conda version is

[root@localhost]# conda --version
conda 4.8.3

Update to the latest version via conda update -y conda once this one below has finished.

And the conda package install is STILL at the same stage.
Building DAG of jobs... Removing incomplete Conda environment environment.yaml... Creating conda environment environment.yaml... Downloading and installing remote packages.

Sorry, I can't help you with that (e.g. waiting, Internet bandwidth etc.)


nitha26 commented Oct 23, 2020

Yes, that's clear but it's not necessary (and could be dangerous).
Got it. Thank you.

Sorry, I can't help you with that (e.g. waiting, Internet bandwidth etc.)
I understand.

The command ran, but I wonder why I cannot find any SV call information in the results. Pasting the `all.vcf` content below (sorry, I could not attach the all.vcf file).

##fileformat=VCFv4.1
##source=SURVIVOR
##fileDate=20201023
##contig=<ID=chr22,length=51304566>
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=BND,Description="Translocation">
##ALT=<ID=INS,Description="Insertion">
##INFO=<ID=CIEND,Number=2,Type=String,Description="PE confidence interval around END">
##INFO=<ID=CIPOS,Number=2,Type=String,Description="PE confidence interval around POS">
##INFO=<ID=CHR2,Number=1,Type=String,Description="Chromosome for END coordinate in case of a translocation">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant">
##INFO=<ID=MAPQ,Number=1,Type=Integer,Description="Median mapping quality of paired-ends">
##INFO=<ID=RE,Number=1,Type=Integer,Description="read support">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Precise structural variation">
##INFO=<ID=SVLEN,Number=1,Type=Float,Description="Length of the SV">
##INFO=<ID=SVMETHOD,Number=1,Type=String,Description="Method for generating this merged VCF file.">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of the SV.">
##INFO=<ID=SUPP_VEC,Number=1,Type=String,Description="Vector of supporting samples.">
##INFO=<ID=SUPP,Number=1,Type=String,Description="Number of samples supporting the variant">
##INFO=<ID=STRANDS,Number=1,Type=String,Description="Indicating the direction of the reads with respect to the type and breakpoint.">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PSV,Number=1,Type=String,Description="Previous support vector">
##FORMAT=<ID=LN,Number=1,Type=Integer,Description="predicted length">
##FORMAT=<ID=DR,Number=2,Type=Integer,Description="# supporting reference,variant reads in that order">
##FORMAT=<ID=ST,Number=1,Type=String,Description="Strand of SVs">
##FORMAT=<ID=QV,Number=1,Type=String,Description="Quality values: if not defined a . otherwise the reported value.">
##FORMAT=<ID=TY,Number=1,Type=String,Description="Types">
##FORMAT=<ID=ID,Number=1,Type=String,Description="Variant ID from input.">
##FORMAT=<ID=RAL,Number=1,Type=String,Description="Reference allele sequence reported from input.">
##FORMAT=<ID=AAL,Number=1,Type=String,Description="Alternative allele sequence reported from input.">
##FORMAT=<ID=CO,Number=1,Type=String,Description="Coordinates">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA12878_2 NA12878 NA12878_1 N3.bam

  1. Could you please tell me which is the final result file that can be used for downstream analysis?

  2. Right now I have more than 500 WGS samples; what is the maximum number of samples that can be run through sv-callers? How long would a complete analysis of 10 samples take on a normal machine?

  3. To speed up the process, if I split the 500 samples into batches (on an HPC cluster with Slurm), at which stage can I combine all the VCF files for a population study? Please give an outline of how multiple samples can be run in batches and how the merging takes place in sv-callers. Do any changes have to be made in "samples.csv"?

  4. How can I validate or confirm that all my jobs completed successfully, and where are the time logs?

Thank you so much.


arnikz commented Oct 26, 2020

The command ran, but I wonder why I cannot find any SV call information in the results.

That's correct. The sample data are meant for CI testing only (the T3/N3 .bam files are identical and cover only a small part of the genome). The all.vcf file is the result of the SURVIVOR merge (final wf step) of all the SV callers' VCF files. For more details, refer to our paper.
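A quick way to see that such a file is header-only (as in the paste above) is to count its non-header records; 0 means no SV calls:

```shell
# Count VCF records, i.e. lines not starting with '#'.
# A header-only all.vcf yields 0.
grep -vc '^#' all.vcf
```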

Could you please tell me which is the final result file that can be used for downstream analysis?

You could use the VCF files of each caller in the corresponding dir, or the aforementioned (merged) VCF.

Right now I have more than 500 WGS samples; what is the maximum number of samples that can be run through sv-callers?

In principle, there is no limit on the number of samples in samples.csv you could analyze. It depends on the compute/storage resources available to you on an HPC system.

How long would a complete analysis of 10 samples take on a normal machine?

It depends on your samples and machine. See our paper for example runs (germline and somatic).

To speed up the process, if I split the 500 samples into batches (on an HPC cluster with Slurm), at which stage can I combine all the VCF files for a population study? Please give an outline of how multiple samples can be run in batches and how the merging takes place in sv-callers. Do any changes have to be made in "samples.csv"?

The workflow takes care of the parallelization so there is no need to split/merge jobs yourself.
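Scaling to more samples is then just a matter of adding rows to samples.csv. The header below is an illustrative assumption modeled on the example-data layout seen earlier in this thread, not the verified format; check the samples.csv shipped with the repo for the exact columns:

```csv
PATH,SAMPLE1,BAM1,SAMPLE2,BAM2
data/bam/3,T3,data/bam/3/T3.bam,N3,data/bam/3/N3.bam
```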

How can I validate or confirm that all my jobs completed successfully, and where are the time logs?

See the workflow log (snakemake ... &>smk.log) and/or the per-job stderr-[jobid].log files. In addition, you can retrieve detailed job accounting info from the HPC system used (see README.md).
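For example, capturing the workflow log and scanning it afterwards might look like this (smk.log is just a naming convention, and the grep pattern is an illustrative heuristic, not an official check):

```shell
# Run the workflow and capture all output in one log file.
snakemake -C echo_run=0 --use-conda &> smk.log
# Heuristic scan for failures in the workflow log...
grep -iE 'error|exception' smk.log
# ...and list the per-job stderr logs, as named above.
ls stderr-*.log
```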


arnikz commented Oct 26, 2020

And the conda package install is STILL at the same stage.

This was fixed in v1.1.2 (#49).


nitha26 commented Oct 26, 2020

Yesterday when I fired off the job, the conda packages were installed within a minute. Thanks!

arnikz closed this as completed Oct 26, 2020