ewong347 edited this page Apr 30, 2020 · 51 revisions

Validation experiments:

| Run | Assay Type | Release Date | Published consensus | Differences | Acceptable? |
| --- | --- | --- | --- | --- | --- |
| SRR11177792 | RNA-Seq | 2020-02-25 | MT072688 | sam2conseq reports extra GGTTTATAC at 5' end, and extra AGTGC (run of ?) poly-A at 3' end; all else identical | |
| SRR10903401 | RNA-Seq | 2020-01-18 | MN988669.1 | sam2conseq reports extra A at 5' end, and poly-A tail | |
| SRR10903402 | RNA-Seq | 2020-01-18 | MN988668.1 | sam2conseq reports poly-A tail (21 nt + ?) | |
| SRR10971381 | RNA-Seq | 2020-01-27 | MN908947 | Published sequence has 7 nt reported as ? by sam2conseq (6193 A, 6592 A, 7030 T, 29864 G, 29867 T, 29868 G, 29870 C) and an extra 3' ? | |
| SRR11241254 | RNA-Seq | 2020-03-04 | MT163716.1 or EPI_ISL_413025 | MT163716: 3 nt diff, poly-A tail absent (see #7); EPI_ISL_413025: same issues as MT163716, plus a large deletion | Y |
| SRR11241255 | RNA-Seq | 2020-03-04 | MT163717.1 | 3 nt differences (29861 A->T, 29862 A->G, 29864 A->C), extra ? on 3' end | Y |
| SRR11247075 | RNA-Seq | 2020-03-05 | MT163721.1 | Stretches of N's and covid seqs between 0-578; poly-A tail differences: 29851 A->N, 29852 A->N, 29855 A->T, 29856 A->G, 29858 A->C; generally low coverage throughout (n<50) except positions 29870-29903; otherwise same | |
| SRR11247076 | RNA-Seq | 2020-03-05 | MT163720.1 | "TGAC" at position 29870-29874 present in sam2conseq and published sequence; published sequence missing nucleotides at positions 0-174; otherwise all same. Published consensus sequences for SRR11247075 and SRR11247076 do not report nucleotides with low (n<10) coverage | |
| SRR11247077 | RNA-Seq | 2020-03-05 | MT163719.1 | Published consensus seq missing characters 0-578; sam2conseq start of poly-A tail has "GAATGA", similar to reference (NC_045512): 29864 A->G, 29867 A->T, 29868 A->G, 29870 A->C | |
| SRR11247078 | RNA-Seq | 2020-03-05 | MT163718.1 | Some differences at 5' end: T->A (1), T->G (3), A->G (4), T->A (5), T->A (6), S->C (20); depth <10, but no ambiguity. Grep for sequence preceding position 20 does not show any ambiguity in the raw FASTQ file | |
| SRR11278090 | RNA-Seq | 2020-03-09 | WA13-UW9 | Can't find the published consensus sequence | |
| SRR11278091 | RNA-Seq | 2020-03-09 | EPI_ISL_413563 | Exact match except 3' end has extra "?" | Y |
| SRR11278092 | RNA-Seq | 2020-03-09 | WA11-UW7 | Many sporadic N's in the sam2conseq output. For these N's, the highest-frequency base (in the sam2conseq-generated frequency CSV) is the base called in the reported consensus sequence. sam2conseq also reports more bases on the 5' end where the published consensus sequence has "?" | |
| SRR11278164 | RNA-Seq | 2020-03-09 | WA18-UW14 | T->A (1), T->A (6); otherwise same | |
| SRR11278165 | RNA-Seq | 2020-03-09 | WA17-UW13 | 5' end discrepancies, same as SRR11278092; 29860 A->G, 29863 G->A, 29866 A->T, 29867 A->G, 29869 A->C | |
| SRR11278166 | RNA-Seq | 2020-03-09 | WA16-UW12 | No differences | |
| SRR11278167 | RNA-Seq | 2020-03-09 | WA15-UW11 | sam2conseq missing an A at position 1; reported gap | |
| SRR11278168 | RNA-Seq | 2020-03-09 | WA14-UW10 | sam2conseq has 85 additional nt at 5' end, more A's at 3' end. Differences at 3' end: 29861 A->G, 29864 C->G, 29868 C->G | |
| SRR11140744 | WGS | 2020-02-21 | EPI_ISL_408670 (2019-nCoV/USA-WI1/2020) | sam2conseq has longer poly-A tail on 3' end | |
| SRR11140746 | WGS | 2020-02-21 | EPI_ISL_408670 (2019-nCoV/USA-WI1/2020) | Published has 3 extra A's at 3' end | |
| SRR11140748 | WGS | 2020-02-21 | EPI_ISL_408670 (2019-nCoV/USA-WI1/2020) | Same sample as above | |
| SRR11140750 | WGS | 2020-02-21 | EPI_ISL_408670 (2019-nCoV/USA-WI1/2020) | Same sample as above | |
| SRR11092057 | RNA-Seq | 2020-02-15 | MN996528.1 | Failure to map; BLASTed sequences, found human/eukaryotic sequences | |
| SRR11092058 | RNA-Seq | 2020-02-15 | MN996527.1 | Mix of human and synthetic DNA, failure to map | |
| SRR11092064 | RNA-Seq | 2020-02-15 | MN996531.1 | Mix of human and synthetic DNA, failure to map | |
| SRR11085733 | RNA-Seq | 2020-02-13 | MN611525 | Non-human, failure to map | |
| SRR11085736 | RNA-Seq | 2020-02-13 | MN611522 | Non-human, failure to map | |
| SRR11085737 | RNA-Seq | 2020-02-13 | MN611521 | Non-human, failure to map | |
| SRR11085738 | RNA-Seq | 2020-02-13 | MN611520 | Non-human, failure to map | |
| SRR11085740 | RNA-Seq | 2020-02-13 | MN611518 | Non-human, failure to map | |
| SRR11085741 | RNA-Seq | 2020-02-13 | MN611517 | Non-human, failure to map | |
| SRR11085797 | RNA-Seq | 2020-02-13 | MN996532.1 | Non-human, failure to map | |
| SRR11314339 | RNA-Seq | 2020-03-17 | MT192765 | 3' tail has several N's | |
| SRR11092056 | RNA-Seq | 2020-02-15 | MN996530 | Low coverage, sample is mostly human DNA (known issue) | Y |
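
The per-position differences in the table (for example `29861 A->T`) can be listed with a short script. The following is a minimal sketch, not part of sam2conseq: `list_differences` is a hypothetical helper, and it assumes the two sequences have already been aligned to the same coordinates and padded to equal length. The direction of the arrow (published -> sam2conseq) and the 1-based default numbering are also assumptions, since the table does not state either convention.

```python
def list_differences(published, generated, offset=1):
    """Return 'position from->to' strings for every mismatching position.

    Hypothetical helper: assumes both sequences are already aligned to the
    same coordinates and have equal length. `offset` sets the number given
    to the first base (1 for 1-based coordinates, 0 for 0-based).
    """
    if len(published) != len(generated):
        raise ValueError("sequences must be aligned to equal length")

    diffs = []
    for i, (pub, gen) in enumerate(zip(published.upper(), generated.upper())):
        if pub != gen:
            # Assumed direction: published base -> sam2conseq base
            diffs.append(f"{i + offset} {pub}->{gen}")
    return diffs


if __name__ == "__main__":
    # Toy sequences only; not real consensus data
    for line in list_differences("ACGTAAAA", "ACGTATGA"):
        print(line)
```

An empty list corresponds to a "No differences" row. Length mismatches, such as extra 5' bases or a longer poly-A tail, would need to be trimmed or padded before calling the helper.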