Results

ncov-tools generates a number of results for assessing the validity of a sequencing run and its samples. These results are divided into QC reports and plots that help visualize the tabular data.

Plots

The all_qc_sequencing workflow step produces the following plots:

_depth_by_position.pdf
_depth_by_position_negative_control.pdf
_amplicon_depth_by_ct.pdf
_amplicon_covered_fraction.pdf
_amplicon_coverage_heatmap.pdf

Each plot can provide QC details per sample.

_depth_by_position.pdf

This plot shows, for each sample, the genomic position on the horizontal axis and the log10-scaled depth of coverage on the vertical axis. The plots for all samples are merged into a single document for easier handling. If the Ct value is available, it is listed in the plot title next to the sample name.
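The sketch below shows one way to turn per-position depths into the log10-scaled values this plot displays. It assumes the depths come from `samtools depth -a` output, where each line holds the reference name, the 1-based position and the depth, separated by tabs. `parse_samtools_depth` and `log10_depth` are hypothetical helpers, not ncov-tools' own code. The pseudocount of 1, which keeps zero-depth positions finite, is also an assumption.

```python
import math

def parse_samtools_depth(lines):
    """Yield (position, depth) pairs from `samtools depth -a` output lines.

    Each line is assumed to hold: reference name, 1-based position, depth.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield int(fields[1]), int(fields[2])

def log10_depth(depths, pseudocount=1):
    """Return (position, log10(depth + pseudocount)) for each position.

    The pseudocount (an assumption) avoids log10(0) at uncovered positions.
    """
    return [(pos, math.log10(depth + pseudocount)) for pos, depth in depths]

lines = ["MN908947.3\t1\t0\n", "MN908947.3\t2\t9\n", "MN908947.3\t3\t999\n"]
print(log10_depth(parse_samtools_depth(lines)))
```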

_depth_by_position_negative_control.pdf

The negative control samples are extracted from _depth_by_position.pdf into a separate file for quick review. Coverage of any amplicon in a negative control may indicate contamination.

_amplicon_depth_by_ct.pdf

The _amplicon_depth_by_ct.pdf plot shows a per-amplicon tiled view of the impact of the cycle threshold (Ct) on coverage. Samples with high Ct values (e.g. > 30) tend to have lower coverage.
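The same trend can also be checked numerically. This minimal sketch assumes you have each sample's Ct value and mean depth, for example from the qpcr_ct and mean_sequencing_depth columns of _summary_qc.tsv. The helper `median_depth_by_ct` is hypothetical, and the cutoff of 30 simply mirrors the example Ct value given for this plot.

```python
from statistics import median

def median_depth_by_ct(samples, ct_cutoff=30.0):
    """Compare coverage of low-Ct and high-Ct samples.

    `samples` is a list of (ct, mean_depth) tuples. Samples with no Ct
    value (None) are skipped. Returns the median mean depth of each group,
    or None for a group with no samples.
    """
    low, high = [], []
    for ct, depth in samples:
        if ct is None:
            continue
        (high if ct > ct_cutoff else low).append(depth)
    return {
        "ct_at_or_below_cutoff": median(low) if low else None,
        "ct_above_cutoff": median(high) if high else None,
    }

samples = [(20.1, 1000.0), (25.4, 800.0), (32.0, 50.0), (35.2, 10.0), (None, 5.0)]
print(median_depth_by_ct(samples))
```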

_amplicon_covered_fraction.pdf

This plot provides a per-amplicon view of the fraction of each amplicon covered in each sample.
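Below is a sketch of one way to compute a covered fraction. It assumes BED-style half-open amplicon coordinates and counts a position as covered when its depth reaches `min_depth`. The helper `amplicon_covered_fraction` and its default threshold of 1 are hypothetical, and ncov-tools may use a different depth cutoff.

```python
def amplicon_covered_fraction(depth_by_pos, start, end, min_depth=1):
    """Return the fraction of positions in [start, end) with depth >= min_depth.

    `depth_by_pos` maps 0-based genome position to depth; positions
    missing from the mapping are treated as depth 0.
    """
    length = end - start
    if length <= 0:
        raise ValueError("amplicon end must be greater than start")
    covered = sum(1 for pos in range(start, end) if depth_by_pos.get(pos, 0) >= min_depth)
    return covered / length

depths = {0: 5, 1: 0, 2: 3, 3: 10}
print(amplicon_covered_fraction(depths, 0, 4))
print(amplicon_covered_fraction(depths, 0, 4, min_depth=5))
```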

_amplicon_coverage_heatmap.pdf

This heatmap gives an overall view of amplicon coverage. The horizontal axis shows the amplicon IDs and the vertical axis lists all samples in the run. The colour of each tile represents the log10 of the mean coverage of that amplicon in that sample.
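The values behind such a heatmap can be assembled as a sample-by-amplicon matrix. In this minimal sketch, the input layout `{sample: {amplicon_id: [per-base depths]}}`, the helper name and the pseudocount of 1 are all assumptions. Amplicons are kept in first-seen order so that numbered IDs are not sorted as text. Amplicons missing for a sample become NaN.

```python
import math
from statistics import mean

def coverage_heatmap_matrix(coverage, pseudocount=1):
    """Build rows = samples, columns = amplicons, cells = log10(mean depth + pseudocount).

    `coverage` maps sample -> {amplicon_id: list of per-base depths}.
    Returns (samples, amplicons, matrix).
    """
    samples = sorted(coverage)
    # dict.fromkeys keeps the first-seen order of amplicon IDs without duplicates
    amplicons = list(dict.fromkeys(a for per_sample in coverage.values() for a in per_sample))
    matrix = []
    for sample in samples:
        row = []
        for amplicon in amplicons:
            depths = coverage[sample].get(amplicon)
            row.append(math.log10(mean(depths) + pseudocount) if depths else math.nan)
        matrix.append(row)
    return samples, amplicons, matrix
```

Each row of the returned matrix can then be passed to any plotting library that draws heatmaps from a 2D array.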

Reports

When the all_qc_analysis step of the pipeline completes, the following reports are written to the qc_reports directory:

_summary_qc.tsv
_ambiguous_position_report.tsv
_mixture_report.tsv
_negative_control_report.tsv

_summary_qc.tsv

The summary QC file aggregates per-sample metrics drawn from several pipeline output files. It contains the following columns:

sample:                     the name of the sample
run_name:                   the name of the run containing the sample
num_consensus_snvs:         the number of SNVs found in the consensus sequence
num_consensus_n:            the number of Ns in the consensus sequence
num_consensus_iupac:        the number of IUPAC ambiguity codes (not including N) in the consensus sequence
num_variants_snvs:          the number of SNVs found in the iVar variants/VCF file
num_variants_indel:         the number of indels found in the iVar variants/VCF file
num_variants_indel_triplet: the number of indels in the iVar variants/VCF file whose length is a multiple of 3
mean_sequencing_depth:      the mean sequencing depth across all bases
median_sequencing_depth:    the median sequencing depth across all bases
qpcr_ct:                    the qPCR cycle threshold (Ct) as defined in metadata.yaml
collection_date:            the collection date (yyyy-mm-dd) as defined in metadata.yaml
num_weeks:                  the collection date expressed as the number of weeks since Jan 1, 2020
scaled_variants_snvs:       the number of SNVs adjusted for genome completeness (num_variants_snvs / genome_completeness)
genome_completeness:        the fraction of the genome covered with respect to the reference sequence
qc_pass:                    the QC label(s) assigned to the sample
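The two derived columns can be reproduced from their definitions. This sketch keeps num_weeks as a fractional number of weeks; whether ncov-tools rounds it is not stated here. It also returns None when genome_completeness is 0, because the ratio is undefined there. The function names are hypothetical.

```python
from datetime import date

REFERENCE_DATE = date(2020, 1, 1)

def num_weeks(collection_date):
    """Weeks elapsed between Jan 1, 2020 and a yyyy-mm-dd collection date.

    Assumption: the value is left fractional rather than rounded.
    """
    return (date.fromisoformat(collection_date) - REFERENCE_DATE).days / 7

def scaled_variants_snvs(num_variants_snvs, genome_completeness):
    """num_variants_snvs / genome_completeness, or None if completeness is 0."""
    if genome_completeness <= 0:
        return None
    return num_variants_snvs / genome_completeness

print(num_weeks("2020-01-15"))
print(scaled_variants_snvs(10, 0.5))
```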

The labels in qc_pass are assigned using the following criteria:

INCOMPLETE_GENOME -- genome_completeness < 0.5
PARTIAL_GENOME -- genome_completeness < 0.9
POSSIBLE_FRAMESHIFT_INDELS -- num_variants_indel - num_variants_indel_triplet > 0
EXCESS_AMBIGUITY -- num_consensus_iupac > 5
EXCESS_VARIANTS -- scaled_variants_snvs > num_weeks * 0.75 + 15

If none of the above apply, the sample is classified as PASS.
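These criteria translate directly into code. This is a sketch, not ncov-tools' implementation, and it makes two assumptions. First, the two completeness labels are mutually exclusive, so a genome below 0.5 is labelled INCOMPLETE_GENOME only. Second, several labels are joined with commas. `qc_labels` is a hypothetical name, and `row` is a dict keyed by the _summary_qc.tsv column names.

```python
def qc_labels(row):
    """Return the qc_pass value for one summary row (a dict of column values)."""
    labels = []
    completeness = row["genome_completeness"]
    # Assumption: only the stricter completeness label is applied.
    if completeness < 0.5:
        labels.append("INCOMPLETE_GENOME")
    elif completeness < 0.9:
        labels.append("PARTIAL_GENOME")
    if row["num_variants_indel"] - row["num_variants_indel_triplet"] > 0:
        labels.append("POSSIBLE_FRAMESHIFT_INDELS")
    if row["num_consensus_iupac"] > 5:
        labels.append("EXCESS_AMBIGUITY")
    if row["scaled_variants_snvs"] > row["num_weeks"] * 0.75 + 15:
        labels.append("EXCESS_VARIANTS")
    # Assumption: multiple labels are comma-joined.
    return ",".join(labels) if labels else "PASS"

row = {"genome_completeness": 0.99, "num_variants_indel": 1, "num_variants_indel_triplet": 1,
       "num_consensus_iupac": 0, "scaled_variants_snvs": 20, "num_weeks": 20}
print(qc_labels(row))
```

With num_weeks = 20, the EXCESS_VARIANTS threshold is 20 * 0.75 + 15 = 30, so 20 scaled SNVs do not trigger it.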

_ambiguous_position_report.tsv

The ambiguous position report lists genomic positions and the number of samples with an ambiguous base called at each position. The file is tab-separated and has three columns:

position: the genomic position of the base
count:    the number of samples with an ambiguous base at this position
alleles:  the alleles found at this position

Multiple samples with ambiguous bases at the same position should be further investigated for possible contamination.
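A report of this shape can be derived from consensus sequences. This minimal sketch assumes the sequences are already in reference coordinates: they all have the same length, so index i is genome position i + 1. It treats the non-N IUPAC ambiguity codes as ambiguous and counts each sample at most once per position. `ambiguous_position_report` is a hypothetical helper.

```python
from collections import defaultdict

# IUPAC ambiguity codes, excluding N (as in num_consensus_iupac)
IUPAC_AMBIGUOUS = set("RYSWKMBDHV")

def ambiguous_position_report(consensus):
    """Return sorted (position, count, alleles) rows from {sample: sequence}.

    Positions are 1-based. `count` is the number of samples with an
    ambiguous base at the position; `alleles` lists the codes seen there.
    """
    counts = defaultdict(int)
    alleles = defaultdict(set)
    for sequence in consensus.values():
        for position, base in enumerate(sequence.upper(), start=1):
            if base in IUPAC_AMBIGUOUS:
                counts[position] += 1
                alleles[position].add(base)
    return [(pos, counts[pos], ",".join(sorted(alleles[pos]))) for pos in sorted(counts)]

print(ambiguous_position_report({"s1": "ACGRT", "s2": "ACGYT", "s3": "ACGTT"}))
```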

_mixture_report.tsv

The mixture report lists samples that may be contaminated, based on the variants in the alleles.tsv file together with the pileup files. The script that generates the report only considers mutations that recur in 2 or more samples; singletons are discarded. The report has eight columns:

sample_a:              the sample being analyzed
sample_b:              the sample being compared against sample_a
variants_checked:      the number of variants common to sample_a and sample_b that were analyzed
variants_mixed:        the number of variants showing a potential mixture with another sample
read_support_allele_a: the number of reads supporting allele A
read_support_allele_b: the number of reads supporting allele B
mixture_fraction:
mixed_sites:           the position, mutation and mixture fraction of each mixed site, as a comma-separated list when there are multiple sites

Variants shared by two or more samples are compared, and their allelic fractions are determined.
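The allelic-fraction step can be sketched for a single sample pair. Everything here is an assumption for illustration. Each site is given as (position, mutation, reads supporting allele A, reads supporting allele B), and the mixture fraction is taken as B / (A + B). The `min_total` and `min_fraction` thresholds are hypothetical values, not ncov-tools defaults.

```python
def find_mixed_sites(sites, min_total=10, min_fraction=0.1):
    """Return (variants_checked, mixed) for one sample pair.

    `sites` holds (position, mutation, reads_a, reads_b) tuples. A site is
    checked only if it has at least `min_total` supporting reads, and is
    reported as mixed when reads_b / (reads_a + reads_b) >= min_fraction.
    """
    checked = 0
    mixed = []
    for position, mutation, reads_a, reads_b in sites:
        total = reads_a + reads_b
        if total < min_total:
            continue  # too few reads to estimate a fraction
        checked += 1
        fraction = reads_b / total
        if fraction >= min_fraction:
            mixed.append((position, mutation, round(fraction, 3)))
    return checked, mixed

sites = [(241, "C241T", 90, 10), (3037, "C3037T", 99, 1), (23403, "A23403G", 2, 3)]
print(find_mixed_sites(sites))
```

In this example, the third site is skipped for having only 5 reads. Of the two sites checked, only the first reaches the 0.1 fraction.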

_negative_control_report.tsv

The negative control report provides information on amplicons detected in samples where no viral template should be present. It helps determine whether cross-contamination may have occurred. The report has the following columns:

file:                    the amplicon_base_coverage.bed file for the sample
qc:                      the classification, either WARN or PASS
genome_covered_bases:    the number of genome bases covered by detected amplicons
genome_total_bases:      the total number of bases in the genome
genome_covered_fraction: the fraction of the genome detected in the negative control
amplicons_detected:      the IDs of the detected amplicons

If a negative control shows coverage of any amplicon, it should be inspected for contamination from a positive sample.
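The per-control summary can be sketched as follows. Several details are assumptions: the input layout (BED-style half-open amplicon intervals with a precomputed covered fraction), the `min_covered_fraction` cutoff for calling an amplicon detected, and the helper name. Overlapping amplicons are merged so that shared bases are counted only once.

```python
def negative_control_qc(amplicons, genome_total_bases, min_covered_fraction=0.5):
    """Summarize one negative control.

    `amplicons` is a list of (amplicon_id, start, end, covered_fraction)
    tuples with half-open [start, end) coordinates. An amplicon counts as
    detected when covered_fraction >= min_covered_fraction (an assumption).
    """
    detected = [a for a in amplicons if a[3] >= min_covered_fraction]
    covered_positions = set()
    for _amplicon_id, start, end, _fraction in detected:
        covered_positions.update(range(start, end))  # union handles overlaps
    covered = len(covered_positions)
    return {
        "qc": "WARN" if detected else "PASS",
        "genome_covered_bases": covered,
        "genome_total_bases": genome_total_bases,
        "genome_covered_fraction": covered / genome_total_bases,
        "amplicons_detected": ",".join(a[0] for a in detected),
    }

amps = [("amp1", 0, 100, 0.9), ("amp2", 80, 200, 0.6), ("amp3", 300, 400, 0.1)]
print(negative_control_qc(amps, 1000))
```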
