This repository has been archived by the owner on May 15, 2020. It is now read-only.
Niema Moshiri edited this page May 12, 2017 · 5 revisions

Canvas SPW command line

The Canvas SPW (SmallPedigree-WGS) command-line options are:

Canvas SmallPedigree-WGS Options:

  -b, --bam=VALUE            sample .bam file. Option can be specified
                             multiple times. (required)
  --ploidy-vcf=VALUE         multisample .vcf file containing regions of
                             known ploidy. Copy number calls matching the
                             known ploidy in these regions will be considered
                             non-variant
  --sample-b-allele-vcf=VALUE
                             multisample .vcf file containing SNV b-allele
                             sites (only sites with PASS in the filter column
                             will be used) (required) (either this option or
                             option population-b-allele-vcf is required)
  --population-b-allele-vcf=VALUE
                             vcf containing SNV b-allele sites in the
                             population (only sites with PASS in the filter
                             column will be used) (required) (either this
                             option or option sample-b-allele-vcf is required)
  --common-cnvs-bed=VALUE
                             .bed file containing regions of known common CNVs
  --proband=VALUE            Proband sample name. Option can be specified
                             multiple times.
  --mother=VALUE             Mother sample name
  --father=VALUE             Father sample name
  -o, --output=VALUE         output directory (required)
  -r, --reference=VALUE      Canvas-ready reference fasta file (required)
  -g, --genome-folder=VALUE  folder that contains both genome.fa and
                             GenomeSize.xml (required)
  -f, --filter-bed=VALUE     .bed file of regions to skip (required)
  --custom-parameters=VALUE
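Because the options above are easy to mistype, here is a minimal Python sketch that assembles them into an argument list. The function name build_spw_args is hypothetical. The program prefix (how Canvas itself is launched) is deliberately left out because this page does not give it. The sketch only uses the option spellings from the help text and checks the requirements stated there.

```python
from typing import List, Optional, Sequence


def build_spw_args(
    bams: Sequence[str],
    output: str,
    reference: str,
    genome_folder: str,
    filter_bed: str,
    sample_b_allele_vcf: Optional[str] = None,
    population_b_allele_vcf: Optional[str] = None,
    ploidy_vcf: Optional[str] = None,
    common_cnvs_bed: Optional[str] = None,
    probands: Sequence[str] = (),
    mother: Optional[str] = None,
    father: Optional[str] = None,
) -> List[str]:
    """Turn the SmallPedigree-WGS options into '--name=value' arguments (hypothetical helper)."""
    if not bams:
        raise ValueError("at least one --bam is required")
    # The help text requires at least one of the two b-allele VCF options.
    if sample_b_allele_vcf is None and population_b_allele_vcf is None:
        raise ValueError("give --sample-b-allele-vcf or --population-b-allele-vcf")

    args = [f"--bam={bam}" for bam in bams]  # repeatable option
    args += [
        f"--output={output}",
        f"--reference={reference}",
        f"--genome-folder={genome_folder}",
        f"--filter-bed={filter_bed}",
    ]
    optional = {
        "--sample-b-allele-vcf": sample_b_allele_vcf,
        "--population-b-allele-vcf": population_b_allele_vcf,
        "--ploidy-vcf": ploidy_vcf,
        "--common-cnvs-bed": common_cnvs_bed,
        "--mother": mother,
        "--father": father,
    }
    args += [f"{flag}={value}" for flag, value in optional.items() if value is not None]
    # --proband is repeatable; omitting it disables de novo calling.
    args += [f"--proband={name}" for name in probands]
    return args
```

The resulting list would be appended to whatever command launches Canvas in SmallPedigree-WGS mode on your installation; shlex.join(args) renders it as a shell-safe string.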

The multi-sample ploidy vcf (--ploidy-vcf) lists the ploidy of each region for each sample; an example is provided below. Sample names must be the same as those in the bam headers (SM tag).

##fileformat=VCFv4.1
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number genotype for imprecise events">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NA12882 NA12878 NA12877
chrX    0       .       N       <CNV>   .       PASS    END=10001       CN      1       2       1
chrX    2781479 .       N       <CNV>   .       PASS    END=155701383   CN      1       2       1
chrX    156030895       .       N       <CNV>   .       PASS    END=156040895   CN      1       2       1
chrY    0       .       N       <CNV>   .       PASS    END=57227415    CN      1       0       1
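To make the layout concrete, the hypothetical helper below reads a ploidy VCF like the one above into (CHROM, POS, END, {sample: CN}) tuples. It is a sketch, not part of Canvas: it assumes whitespace-separated columns and a CN key in FORMAT, and it does not interpret the coordinates beyond reading POS and END as integers.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int, Dict[str, int]]


def parse_ploidy_vcf(text: str) -> List[Region]:
    """Read a multi-sample ploidy VCF into (CHROM, POS, END, {sample: CN}) tuples."""
    samples: List[str] = []
    regions: List[Region] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information
        fields = line.split()
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names must match the BAM SM tags
            continue
        info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
        cn_index = fields[8].split(":").index("CN")
        cns = {name: int(value.split(":")[cn_index]) for name, value in zip(samples, fields[9:])}
        regions.append((fields[0], int(fields[1]), int(info["END"]), cns))
    return regions
```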

Sample names and de novo variant calling

Canvas derives sample names from the SM tags in the bam headers. Therefore, the sample names in the multisample SNV vcf header, in the ploidy vcf header, and in the pedigree options (--proband, --mother, --father) must be identical to those in the bam headers. If --proband is not used, no de novo variant calling or reporting is done.
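A quick consistency check before a run can catch mismatched names. In this sketch the SM tags are passed in as a list (they could be collected from samtools view -H output, for example); the function name and arguments are hypothetical. It returns every name used in the pedigree options or in a VCF #CHROM header line that has no matching SM tag.

```python
from typing import Iterable, Optional, Sequence, Set


def missing_sample_names(
    bam_sm_tags: Iterable[str],
    vcf_chrom_lines: Iterable[str] = (),
    probands: Sequence[str] = (),
    mother: Optional[str] = None,
    father: Optional[str] = None,
) -> Set[str]:
    """Return names used in pedigree options or VCF headers that match no BAM SM tag."""
    known = set(bam_sm_tags)
    used = set(probands)
    used.update(name for name in (mother, father) if name is not None)
    for line in vcf_chrom_lines:
        # In a '#CHROM' header line, sample names follow the FORMAT column.
        used.update(line.split()[9:])
    return used - known
```

An empty result means the names line up. Leaving probands empty is still valid, but it turns off de novo calling.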

Output

The primary outputs from Canvas SPW are multi-sample and single-sample .vcf files compliant with VCF version 4.1.

Multi-sample VCF

The per-sample data include the copy number (CN), major chromosome count (MCC), and variant q-score (QS). De novo calls also carry a DQ format field holding a Phred-scaled de novo quality score. An example line representing a de novo variant call with the PASS filter and the dq20 flag:

12 29851548 Canvas:LOSS:12:29851549-29853545 N <CNV> 19.35 PASS SVTYPE=CNV;dq20;END=29853545;CNVLEN=1997;CIPOS=-466,458;CIEND=-540,458 RC:BC:CN:MCC:QS:DQ 86:2:1:0:23.33:12.12
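Because the per-sample values are colon-separated and positional, they are easiest to read by zipping them with the FORMAT keys. This is a minimal sketch with a hypothetical function name; the sample names must be supplied in the same order as the #CHROM header columns, and values are left as strings.

```python
from typing import Dict, Sequence


def per_sample_fields(record: str, samples: Sequence[str]) -> Dict[str, Dict[str, str]]:
    """Map each sample name to its FORMAT key/value pairs for one VCF record."""
    cols = record.split()
    keys = cols[8].split(":")  # e.g. RC:BC:CN:MCC:QS:DQ
    columns = cols[9:]
    if len(columns) != len(samples):
        raise ValueError("sample names do not match the number of sample columns")
    return {name: dict(zip(keys, col.split(":"))) for name, col in zip(samples, columns)}
```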

To extract variants above a given DQ score for one sample (here named child), bcftools can be used; quote the expression so the shell does not treat > as a redirect:

bcftools view -s child CanvasOutdir/CNV.vcf.gz | bcftools filter -i 'FORMAT/DQ>20' -
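Where bcftools is not available, roughly the same selection can be sketched in plain Python. The function filter_by_dq is hypothetical. It assumes an uncompressed, tab-separated text stream and skips records whose DQ is missing ('.'). Unlike the bcftools pipeline, it yields whole data lines with all sample columns and drops the header.

```python
from typing import Iterable, Iterator, Optional


def filter_by_dq(lines: Iterable[str], sample: str, min_dq: float = 20.0) -> Iterator[str]:
    """Yield data lines whose FORMAT/DQ for `sample` is strictly greater than min_dq."""
    sample_col: Optional[int] = None
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            sample_col = cols.index(sample)  # raises ValueError if the sample is absent
            continue
        if sample_col is None:
            raise ValueError("data line found before the #CHROM header")
        values = dict(zip(cols[8].split(":"), cols[sample_col].split(":")))
        dq = values.get("DQ", ".")
        if dq not in (".", "") and float(dq) > min_dq:
            yield line
```

For the compressed output, gzip.open("CanvasOutdir/CNV.vcf.gz", "rt") yields text lines that can be passed straight in.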

For the multi-sample .vcf file, the following rules are applied to assign the PASS filter flag:

  • With pedigree information: the variant is PASS if the proband is PASS
  • Without pedigree information: the variant is PASS if at least one sample is PASS
  • For de novo variants: both parents and the proband must be PASS
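The three PASS rules can be written as one small function. This is a sketch of the rules as stated, under two assumptions: per-sample FILTER results are already available as a name-to-boolean mapping, and there is a single proband (the --proband option can be repeated, and how multiple probands are combined is not described here). The function name is hypothetical.

```python
from typing import Dict, Optional, Sequence


def multisample_pass(
    sample_pass: Dict[str, bool],
    proband: Optional[str] = None,
    parents: Sequence[str] = (),
    de_novo: bool = False,
) -> bool:
    """Apply the multi-sample PASS rules to per-sample PASS/non-PASS results."""
    if de_novo:
        # De novo call: the proband and both parents must all be PASS.
        if proband is None or len(parents) != 2:
            raise ValueError("de novo evaluation needs a proband and two parents")
        return all(sample_pass[name] for name in (proband, *parents))
    if proband is not None:
        # Pedigree information available: the proband decides.
        return sample_pass[proband]
    # No pedigree information: any PASS sample is enough.
    return any(sample_pass.values())
```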

The following variant tags are assigned to each call:

  • GAIN - if at least one sample has a GAIN call
  • LOSS - if at least one sample has a LOSS call
  • LOH - if at least one sample has an LOH call
  • ComplexCNV - if more than one non-REF variant tag is present (e.g. LOH and GAIN)
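A sketch of the tagging rules, assuming each sample's call type is already known as one of REF, GAIN, LOSS, or LOH. Returning REF when no sample has a non-REF call is an assumption added for completeness, and the function name is hypothetical.

```python
from typing import Iterable

NON_REF = ("GAIN", "LOSS", "LOH")


def variant_tag(sample_calls: Iterable[str]) -> str:
    """Combine per-sample call types into the record-level variant tag."""
    present = {call for call in sample_calls if call in NON_REF}
    if len(present) > 1:
        return "ComplexCNV"  # e.g. one sample LOH, another GAIN
    if present:
        return present.pop()  # exactly one non-REF type across samples
    return "REF"  # assumption: no sample has a non-REF call
```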

Single-sample VCF

These VCF files (in the TempCNV_* subfolders) use a slightly different format: DQ scores are written to the INFO column, and variants with a de novo quality score above 20 also carry a dq20 flag in the INFO column. A general q-score is written to the QUAL column. An example line from the output:

chrX 140205371 Canvas:REF:chrX:140205371-140208082 N . 7.53 PASS DQ=31.0549859513643;dq20;END=140208082;CIPOS=-221,221;CIEND=-291,221 RC:BC:CN 56:5:1
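The single-sample INFO column mixes key=value pairs (DQ, END) with bare flags (dq20). The hypothetical parser below splits them accordingly and pulls out the quality fields; it assumes whitespace-separated columns and is meant only to illustrate the layout.

```python
from typing import Dict, Union


def parse_single_sample(record: str) -> Dict[str, Union[str, float, int, bool, None]]:
    """Extract ID, QUAL, FILTER, DQ, the dq20 flag and END from one single-sample record."""
    cols = record.split()
    info: Dict[str, Union[str, bool]] = {}
    for item in cols[7].split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # bare keys such as dq20 are flags
    dq = info.get("DQ")
    return {
        "id": cols[2],
        "qual": float(cols[5]),  # general q-score
        "filter": cols[6],
        "dq": float(dq) if isinstance(dq, str) else None,  # de novo quality, if present
        "dq20": info.get("dq20") is True,
        "end": int(info["END"]),
    }
```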
