[palmDB] Generate palmprint-qc SQL tables for palmdb v2 #153
Overview
Underlying the Open Virome is the use of palmDB (https://github.com/ababaian/palmdb) as a reference database. Each of the palmprint calls was performed by a tool: either palmscan v1 or v2, or palm_annot. In addition, the sequence QC table for each sequence will be used to compare basic sequence statistics for each palmprint.
Background / Context
A sub-set of palmDB calls are currently "False Positives"; for example, a sequence that is RdRp but whose selected palmprint is not a well-formed palmprint as defined by motifs A, B, and C. For a complete set of TP/FP categories, see: #154
Hypothesis
The simple histogram distribution of scores / statistical values will be significantly different (t-test) between positive control and negative control palmprint sequences. The difference in these values will define a "space" by which to interpret unknown sequences and define a False Positive Rate for each palmDB sequence based on its scores.
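As a rough illustration of the comparison described in this hypothesis, the sketch below computes a Welch's t statistic (the unequal-variance variant of the t-test, chosen here as an assumption since the issue does not say which variant is intended) and an empirical False Positive Rate for a given score. The function names and all score values are hypothetical and are not real palmDB results. A p-value would additionally need the t-distribution CDF, for instance from `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def empirical_fpr(score, negative_scores):
    """Fraction of negative controls that score at least as high as `score`."""
    hits = sum(1 for s in negative_scores if s >= score)
    return hits / len(negative_scores)


# Hypothetical scores, for illustration only.
positive = [41.0, 38.5, 44.2, 39.9, 42.7]
negative = [12.3, 18.1, 9.7, 21.4, 15.0]

t, df = welch_t(positive, negative)
print(f"Welch t = {t:.2f}, df = {df:.1f}")
print(f"Empirical FPR at score 20.0: {empirical_fpr(20.0, negative)}")
```

With real control sets, the same empirical rate could be evaluated per palmprint score to place each palmDB sequence in the "space" between the two control distributions.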
Experiment
For each palmprint in palmDB v2, run palmscan2 and palm_annot and store the results in a defined (standardized) SQL table.
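One possible shape for such a standardized table is sketched below, using an in-memory SQLite database as a stand-in for the real SQL server. The table name `palmprint_qc`, its columns, the helper `store_results`, and the inserted values are all assumptions made for illustration; the issue does not define the actual schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the actual palmDB layout.
SCHEMA = """
CREATE TABLE palmprint_qc (
    palm_id TEXT NOT NULL,  -- palmprint identifier
    tool    TEXT NOT NULL CHECK (tool IN ('palmscan2', 'palm_annot')),
    metric  TEXT NOT NULL,  -- name of the score or statistic reported
    value   REAL,           -- numeric value of that metric
    PRIMARY KEY (palm_id, tool, metric)
);
"""


def store_results(conn, rows):
    """Insert (palm_id, tool, metric, value) tuples into palmprint_qc."""
    conn.executemany(
        "INSERT INTO palmprint_qc (palm_id, tool, metric, value) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real server
conn.executescript(SCHEMA)
store_results(conn, [
    ("u1", "palmscan2", "score", 35.2),   # made-up values
    ("u1", "palm_annot", "score", 0.91),
])
n_rows = conn.execute("SELECT COUNT(*) FROM palmprint_qc").fetchone()[0]
print(n_rows)
```

The long (palm_id, tool, metric, value) layout is one design choice among several: it lets both tools share a single table even if they report different statistics, at the cost of needing a pivot to view one row per palmprint.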
Controls
Positive Controls: A sub-set of ICTV-confirmed RdRp sequences. These are certainly "RdRp", and when placed in a Multiple Sequence Alignment (i.e. Wolf 18) their motifs A, B, and C will align to one another. This set is limited in terms of highly divergent RdRp, as those are less likely to be in the "reference" or "known" virome set, but it will provide a space of algorithm scores which are "very high quality".
Negative Controls: In the course of experimental analysis, there have been many instances of FP hits coming up, either Not-RdRp or Not-Palmprint. A separate issue will be needed to aggregate as many examples of these sequences as possible into one set.
Expected Outcome
The results of palmscan and palm_annot analysis for each palmprint in palmDB2 will be stored in a relational database hosted on the logan SQL server.
Open Questions
No response
References
No response