Home
Canvas SPW (SmallPedigree-WGS) accepts the following command-line options:
Canvas SmallPedigree-WGS Options:
  -b, --bam=VALUE            sample .bam file. Option can be specified
                               multiple times. (required)
      --ploidy-vcf=VALUE     multisample .vcf file containing regions of
                               known ploidy. Copy number calls matching the
                               known ploidy in these regions will be considered
                               non-variant
      --sample-b-allele-vcf=VALUE
                             multisample .vcf file containing SNV b-allele
                               sites (only sites with PASS in the filter column
                               will be used) (required) (either this option or
                               option population-b-allele-vcf is required)
      --population-b-allele-vcf=VALUE
                             vcf containing SNV b-allele sites in the
                               population (only sites with PASS in the filter
                               column will be used) (required) (either this
                               option or option sample-b-allele-vcf is required)
      --common-cnvs-bed=VALUE
                             .bed file containing regions of known common CNVs
      --proband=VALUE        Proband sample name. Option can be specified
                               multiple times.
      --mother=VALUE         Mother sample name
      --father=VALUE         Father sample name
  -o, --output=VALUE         output directory (required)
  -r, --reference=VALUE      Canvas-ready reference fasta file (required)
  -g, --genome-folder=VALUE  folder that contains both genome.fa and
                               GenomeSize.xml (required)
  -f, --filter-bed=VALUE     .bed file of regions to skip (required)
      --custom-parameters=VALUE
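As a rough illustration of how these options fit together, the following Python sketch assembles an option list and checks the required ones. The function name `build_spw_options` is hypothetical, the position of `SmallPedigree-WGS` as the first argument is inferred from the help header, and the check that exactly one of the two b-allele options is given is an assumption about how "either this option or ..." should be read. How Canvas itself is launched depends on the installation and is deliberately left out.

```python
from typing import List, Optional, Sequence


def build_spw_options(
    bams: Sequence[str],
    output_dir: str,
    reference: str,
    genome_folder: str,
    filter_bed: str,
    sample_b_allele_vcf: Optional[str] = None,
    population_b_allele_vcf: Optional[str] = None,
    ploidy_vcf: Optional[str] = None,
    common_cnvs_bed: Optional[str] = None,
    probands: Sequence[str] = (),
    mother: Optional[str] = None,
    father: Optional[str] = None,
) -> List[str]:
    """Return the SmallPedigree-WGS option list (the Canvas launcher is not included)."""
    if not bams:
        raise ValueError("--bam must be given at least once")
    # Assumption: exactly one of the two b-allele options is supplied.
    if (sample_b_allele_vcf is None) == (population_b_allele_vcf is None):
        raise ValueError(
            "give exactly one of --sample-b-allele-vcf or --population-b-allele-vcf"
        )

    args = ["SmallPedigree-WGS"]
    args += [f"--bam={bam}" for bam in bams]  # one --bam per sample
    if sample_b_allele_vcf is not None:
        args.append(f"--sample-b-allele-vcf={sample_b_allele_vcf}")
    else:
        args.append(f"--population-b-allele-vcf={population_b_allele_vcf}")

    # Optional settings are only emitted when a value was provided.
    optional = [
        ("--ploidy-vcf", ploidy_vcf),
        ("--common-cnvs-bed", common_cnvs_bed),
        ("--mother", mother),
        ("--father", father),
    ]
    args += [f"{flag}={value}" for flag, value in optional if value is not None]
    args += [f"--proband={name}" for name in probands]

    args += [
        f"--output={output_dir}",
        f"--reference={reference}",
        f"--genome-folder={genome_folder}",
        f"--filter-bed={filter_bed}",
    ]
    return args
```

The returned list can be appended to whatever command starts Canvas in a given setup, for example via `subprocess.run`.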
The multi-sample ploidy.vcf lists the ploidy of each region for each sample; an example is provided below. Sample names must match the ones in the bam headers (SM tag).
##fileformat=VCFv4.1
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number genotype for imprecise events">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA12882 NA12878 NA12877
chrX 0 . N <CNV> . PASS END=10001 CN 1 2 1
chrX 2781479 . N <CNV> . PASS END=155701383 CN 1 2 1
chrX 156030895 . N <CNV> . PASS END=156040895 CN 1 2 1
chrY 0 . N <CNV> . PASS END=57227415 CN 1 0 1
Canvas sample names are derived from the bam header SM tags. Therefore, the sample names in the multisample SNV vcf header, the ploidy vcf header and the pedigree options (proband, father, mother) must be identical to the ones present in the bam headers. If the proband option is not used, no de novo variant calling or reporting is done.
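A quick consistency check before a run can catch name mismatches early. The Python sketch below is an illustration with hypothetical function names; it assumes the bam header is available as text (for instance the output of `samtools view -H`) and that the VCF `#CHROM` line is tab-separated.

```python
from typing import Iterable, List, Set


def sm_tags(bam_header_text: str) -> Set[str]:
    """Collect the SM values from all @RG lines of a bam header."""
    names: Set[str] = set()
    for line in bam_header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                names.add(field[3:])
    return names


def vcf_samples(chrom_header_line: str) -> List[str]:
    """Return the sample columns of a #CHROM header line (everything after FORMAT)."""
    return chrom_header_line.rstrip("\n").split("\t")[9:]


def unknown_names(names: Iterable[str], bam_header_text: str) -> List[str]:
    """Return the names that do not appear as an SM tag in the bam header."""
    return sorted(set(names) - sm_tags(bam_header_text))
```

An empty list from `unknown_names` for the VCF sample columns and for the pedigree names means all of them match an SM tag.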
The primary outputs of Canvas SPW are multi-sample and single-sample .vcf files compliant with VCF version 4.1.
The copy number (CN) and major chromosome count (MCC) are reported in the per-sample data along with the variant q-score (QS). De novo calls also carry a DQ format field, which is a Phred-scaled de novo quality score. An example line representing a de novo variant call with the PASS filter and dq20 flag:
12 29851548 Canvas:LOSS:12:29851549-29853545 N <CNV> 19.35 PASS SVTYPE=CNV;dq20;END=29853545;CNVLEN=1997;CIPOS=-466,458;CIEND=-540,458 RC:BC:CN:MCC:QS:DQ 86:2:1:0:23.33:12.12
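To extract variants with specific DQ scores, bcftools can be used. The filter expression must be quoted so that the shell does not treat `>` as an output redirection:
bcftools view -s child CanvasOutdir/CNV.vcf.gz | bcftools filter -i 'FORMAT/DQ>20'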
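For cases where bcftools is not at hand, the same idea can be sketched in Python: split the FORMAT column into keys, pair them with a sample's values, and keep records whose DQ exceeds a threshold. The function names are hypothetical, the input is assumed to be uncompressed, tab-separated VCF text, and records without a DQ value (absent or `.`) are skipped.

```python
from typing import Dict, Iterable, Iterator

# The example record from above, rebuilt with tab separators.
EXAMPLE = "\t".join([
    "12", "29851548", "Canvas:LOSS:12:29851549-29853545", "N", "<CNV>", "19.35",
    "PASS", "SVTYPE=CNV;dq20;END=29853545;CNVLEN=1997;CIPOS=-466,458;CIEND=-540,458",
    "RC:BC:CN:MCC:QS:DQ", "86:2:1:0:23.33:12.12",
])


def sample_values(record_line: str, sample_index: int = 0) -> Dict[str, str]:
    """Map FORMAT keys to the values of one sample column (0 = first sample)."""
    columns = record_line.rstrip("\n").split("\t")
    keys = columns[8].split(":")
    values = columns[9 + sample_index].split(":")
    return dict(zip(keys, values))


def records_with_dq_above(
    lines: Iterable[str], threshold: float, sample_index: int = 0
) -> Iterator[str]:
    """Yield records whose DQ for the chosen sample is strictly above threshold."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        dq = sample_values(line, sample_index).get("DQ", ".")
        if dq != "." and float(dq) > threshold:
            yield line


if __name__ == "__main__":
    fields = sample_values(EXAMPLE)
    print(fields["CN"], fields["MCC"], fields["QS"], fields["DQ"])
```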
For the multi-sample .vcf file, the following rules determine whether a variant is assigned the PASS filter flag:
- With pedigree information: the proband is PASS
- With no pedigree information: at least one sample is PASS
- For de novo variants: both parents and the proband are PASS
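The rules above can be written down as a small decision function. This Python sketch is only one reading of them, not the Canvas implementation: the function name is hypothetical, "pedigree information" is taken to mean that a proband is named, and a single proband is assumed.

```python
from typing import Mapping, Optional


def multisample_pass(
    sample_pass: Mapping[str, bool],
    proband: Optional[str] = None,
    mother: Optional[str] = None,
    father: Optional[str] = None,
    de_novo: bool = False,
) -> bool:
    """Decide the multi-sample PASS flag from per-sample PASS states."""
    if proband is None:
        # No pedigree information: one passing sample is enough.
        return any(sample_pass.values())
    if de_novo:
        # De novo variants: the whole trio has to pass.
        if mother is None or father is None:
            raise ValueError("the de novo rule needs both parents")
        return all(sample_pass[name] for name in (proband, mother, father))
    # Pedigree information, not de novo: the proband decides.
    return bool(sample_pass[proband])
```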
The following variant tags are assigned to each call:
- GAIN - if at least one sample has a GAIN call
- LOSS - if at least one sample has a LOSS call
- LOH - if at least one sample has a LOH call
- ComplexCNV - if more than one non-REF variant tag is present (e.g. LOH and GAIN)
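The tag assignment can be sketched as follows. This is an illustrative Python function with a hypothetical name; it assumes each sample's call is given as one of the strings `REF`, `GAIN`, `LOSS` or `LOH`, and that a record with no non-REF call is tagged `REF`.

```python
from typing import Iterable


def variant_tag(sample_calls: Iterable[str]) -> str:
    """Combine per-sample call types into the tag of the multi-sample record."""
    non_ref = {call for call in sample_calls if call != "REF"}
    if len(non_ref) > 1:
        return "ComplexCNV"  # e.g. LOH in one sample and GAIN in another
    if non_ref:
        return next(iter(non_ref))  # the single shared non-REF type
    return "REF"
```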
The single-sample VCF files (under the TempCNV_* subfolders) have a slightly different format: DQ scores are written to the INFO column, and an additional dq20 flag (also in the INFO column) marks variants with a de novo quality score above 20. A general q-score is written to the QUAL column. Here is an example line from the output:
chrX 140205371 Canvas:REF:chrX:140205371-140208082 N . 7.53 PASS DQ=31.0549859513643;dq20;END=140208082;CIPOS=-221,221;CIEND=-291,221 RC:BC:CN 56:5:1
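Since DQ lives in the INFO column in these files, reading it means splitting the INFO string rather than the sample column. A minimal Python sketch, with a hypothetical function name, that treats `key=value` entries as strings and bare entries such as `dq20` as flags:

```python
from typing import Dict, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split a VCF INFO string into key/value pairs; flags map to True."""
    parsed: Dict[str, Union[str, bool]] = {}
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        parsed[key] = value if sep else True
    return parsed


if __name__ == "__main__":
    info = parse_info(
        "DQ=31.0549859513643;dq20;END=140208082;CIPOS=-221,221;CIEND=-291,221"
    )
    # DQ is kept as a string; convert it before comparing against a threshold.
    print(float(info["DQ"]) > 20, "dq20" in info)
```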