Emily Jane McTavish edited this page Sep 10, 2018 · 4 revisions

Notes on internal data structures:

OTU_dict:

Type: Dictionary
Keys: otuids
OTU ids are unique internal identifiers. If the tree is from Open Tree, the OTU ids are reused from Open Tree, in the format otu123.
New tips are assigned newly minted OTU identifiers, in the format otuPS123.
Each OTU has a dictionary:
Required keys - every entry in the otu dict should have these keys:
'^physcraper:status': 'original' if read in from the original tree, 'query' if a new tip, or one of several other possible messages for deleted tips.
'^physcraper:last_blasted': date the sequence was last blasted; defaults to "1900/01/01" if never previously blasted
Optional keys - generated by physcraper, read in from Open Tree, or pulled down with blast results:
'^ncbi:gi': gi number of the sequence from NCBI
'^ncbi:accession': accession number of the sequence from NCBI
'^ncbi:title': title of the blast record from NCBI
'^ncbi:taxon': NCBI taxon id number
'^ot:ottTaxonName': Human readable string for the taxon from the Open Tree taxonomy
'^ot:ottId': OpenTree taxonomy Id number
'^ot:originalLabel': Original string read in as tip label
'^user:TaxonName': Taxon name as read in from csv input
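
As a rough illustration of the layout described above, the sketch below builds a small OTU dictionary by hand and checks it against the two required keys. It uses plain Python dicts only and does not call physcraper. The ids, names, and accession are made-up placeholders, the use of integers for taxon ids is an assumption, and the helper names `missing_required_keys` and `never_blasted` are hypothetical.

```python
# Keys that every entry in the OTU dict should carry, per the notes above.
REQUIRED_KEYS = ("^physcraper:status", "^physcraper:last_blasted")

# Hand-built example; all values are placeholders, not real records.
otu_dict = {
    "otu123": {  # tip read in from the original tree
        "^physcraper:status": "original",
        "^physcraper:last_blasted": "1900/01/01",
        "^ot:ottId": 1000001,  # assumption: ids stored as integers
        "^ot:ottTaxonName": "Example taxon",
        "^ot:originalLabel": "Example_taxon_voucher1",
    },
    "otuPS1": {  # new tip added from blast results
        "^physcraper:status": "query",
        "^physcraper:last_blasted": "1900/01/01",
        "^ncbi:accession": "XX000000",
        "^ncbi:taxon": 2000002,
    },
}


def missing_required_keys(otus):
    """Return {otu_id: [missing required keys]} for incomplete entries."""
    problems = {}
    for otu_id, entry in otus.items():
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            problems[otu_id] = missing
    return problems


def never_blasted(otus):
    """Return the sorted OTU ids whose last-blasted date is still the default."""
    return sorted(
        otu_id
        for otu_id, entry in otus.items()
        if entry.get("^physcraper:last_blasted") == "1900/01/01"
    )
```

Calling `missing_required_keys(otu_dict)` on a well-formed dictionary returns an empty dict, which makes it usable as a quick sanity check after reading in a tree.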

Id dictionaries, contained in the class IdDicts:

ott_to_ncbi = {Open Tree Taxon id:Ncbi taxon id}
ncbi_to_ott = {Ncbi taxon id:Open Tree Taxon id}
ott_to_name = {Open Tree Taxon id:Open Tree human readable name}
gi_ncbi_dict = {gi number:ncbi taxon id} only generated for sequences that have been found in blast results.
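
The four mappings above can be chained to go from a gi number to a human readable name. The sketch below shows this with plain dicts standing in for the attributes of IdDicts; it does not construct the class itself. All ids are placeholders, `ncbi_to_ott` is assumed here to be the exact inverse of `ott_to_ncbi`, and `name_for_gi` is a hypothetical helper.

```python
# Plain-dict stand-ins for the IdDicts mappings; all ids are placeholders.
ott_to_ncbi = {1000001: 2000002}
# Assumption: the reverse mapping is the exact inverse of ott_to_ncbi.
ncbi_to_ott = {ncbi_id: ott_id for ott_id, ncbi_id in ott_to_ncbi.items()}
ott_to_name = {1000001: "Example taxon"}
# Only filled in for sequences that have appeared in blast results.
gi_ncbi_dict = {3000003: 2000002}


def name_for_gi(gi):
    """Resolve gi -> NCBI taxon id -> OTT id -> name, or None if any step is missing."""
    ncbi_id = gi_ncbi_dict.get(gi)
    ott_id = ncbi_to_ott.get(ncbi_id)
    return ott_to_name.get(ott_id)
```

Using `.get` at each step means a gi number that has not been seen in blast results resolves to `None` rather than raising a `KeyError`.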
