Metazoan genomes
Suppose genome.list is a list of Metazoan assembly ids.
Let genome/<asm>/<asm>.prot be the protein FASTA file for each assembly <asm> in this list.
Download https://busco-data.ezlab.org/v5/data/lineages/metazoa_odb10.2021-02-24.tar.gz and unpack it into the directory metazoa_odb10/.
cat metazoa_odb10/hmms/* > hmms
$TT/genetics/hmmAddCutoff hmms metazoa_odb10/scores_cutoff GA hmm-univ.LIB
rm hmms
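As a quick sanity check on the library built above, the number of profiles in hmm-univ.LIB can be compared with the number of files in metazoa_odb10/hmms/. The sketch below relies on the HMMER3 text format, where each profile has one NAME line; it assumes (not confirmed by this page) that hmmAddCutoff keeps that format. The function name count_hmms is hypothetical.

```shell
# Hypothetical helper: count profiles in an HMMER3 text library.
# ASSUMPTION: the library is in HMMER3 text format, one "NAME" line per profile.
count_hmms() {
  grep -c '^NAME ' "$1"
}
```

For example, `count_hmms hmm-univ.LIB` should match `ls metazoa_odb10/hmms | wc -l` if no profile was lost.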
For each assembly <asm> run
$TT/genetics/prots2hmm_univ.sh genome/<asm>/<asm> hmm-univ.LIB 1 <asm>.log
which will create files
genome/<asm>/<asm>.univ
genome/<asm>/<asm>.prot-univ
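The per-assembly step above can be wrapped in a loop over genome.list. This is a minimal sketch: the function name run_univ_all is hypothetical, and it assumes genome.list holds one assembly id per line and that $TT is set as in the commands on this page. The arguments passed to prots2hmm_univ.sh are copied unchanged from the command above.

```shell
# Hypothetical helper: run prots2hmm_univ.sh for every assembly in a list file.
# ASSUMPTION: the list file has one assembly id per line.
run_univ_all() {
  list=$1
  while IFS= read -r asm; do
    [ -n "$asm" ] || continue   # skip empty lines
    # stdin is redirected so the script cannot consume the rest of the list
    "$TT/genetics/prots2hmm_univ.sh" "genome/$asm/$asm" hmm-univ.LIB 1 "$asm.log" < /dev/null || {
      echo "prots2hmm_univ.sh failed for $asm" >&2
      return 1
    }
  done < "$list"
}
```

Usage: `run_univ_all genome.list`. The loop stops at the first failing assembly so that the failure is not overlooked.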
Remove from genome.list every assembly whose number of universal proteins (in file <asm>.prot-univ) is below 850.
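The filtering step could be scripted roughly as follows. The format of <asm>.prot-univ is not described on this page, so the sketch assumes one line per universal protein; if the file is laid out differently, the counting command needs to be changed. The function name filter_assemblies is hypothetical.

```shell
# Hypothetical helper: print the assemblies that have at least <min> universal proteins.
# ASSUMPTION: genome/<asm>/<asm>.prot-univ contains one line per universal protein.
filter_assemblies() {
  list=$1   # file with one assembly id per line
  min=$2    # minimum number of universal proteins to keep an assembly
  while IFS= read -r asm; do
    [ -n "$asm" ] || continue
    f="genome/$asm/$asm.prot-univ"
    if [ ! -f "$f" ]; then
      echo "missing $f, dropping $asm" >&2
      continue
    fi
    n=$(grep -c '' "$f")   # number of lines in the file
    if [ "$n" -ge "$min" ]; then
      printf '%s\n' "$asm"
    fi
  done < "$list"
}
```

Usage: `filter_assemblies genome.list 850 > genome.list.new && mv genome.list.new genome.list`.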
Get the standard script to compute dissimilarities for Metazoa:
$TT/phylogeny/distTree_inc_init_stnd.sh inc genome/Metazoa "" "" "" ""
Create the file with pairs of assemblies:
$TT/list2pairs genome.list > pairs
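To estimate the amount of work before computing dissimilarities, the expected number of pairs can be derived from the size of the list. The sketch assumes that list2pairs emits each unordered pair of assemblies exactly once, one pair per line, which gives n(n-1)/2 pairs for n assemblies; this behaviour is an assumption, not something stated on this page. The function name expected_pairs is hypothetical.

```shell
# Hypothetical helper: number of unordered pairs among the ids in a list file.
# ASSUMPTION: list2pairs writes every unordered pair once, so the count is n*(n-1)/2.
expected_pairs() {
  n=$(grep -c . "$1")          # count non-empty lines
  echo $(( n * (n - 1) / 2 ))
}
```

Under the same assumption, `expected_pairs genome.list` can be compared with `grep -c . pairs`.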
Compute the dissimilarities:
inc/pairs2dissim.sh pairs "" dissim log
(The file pairs can be split into parts, inc/pairs2dissim.sh can be run on each part separately, and the resulting dissim files can then be concatenated.)
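The split-and-concatenate approach could look like the sketch below. It uses the POSIX `split` utility and runs the parts one after another; in practice each part could be submitted to a different machine or job slot. The function name dissim_in_parts, the PAIRS2DISSIM variable, and the per-part output and log names are hypothetical, and passing a separate log name per part is an assumption about how the fourth argument is used.

```shell
# Path to the script generated by distTree_inc_init_stnd.sh; overridable for testing.
PAIRS2DISSIM=${PAIRS2DISSIM:-inc/pairs2dissim.sh}

# Hypothetical helper: split the pairs file, process each part, concatenate the results.
# ASSUMPTION: the 4th argument of pairs2dissim.sh may differ between parts.
dissim_in_parts() {
  pairs=$1   # input file with pairs of assemblies
  chunk=$2   # number of pairs per part
  out=$3     # concatenated dissimilarity file
  split -l "$chunk" "$pairs" "$pairs.part."
  : > "$out"                               # start with an empty output file
  for part in "$pairs".part.*; do
    sfx=${part##*.}                        # split suffix: aa, ab, ...
    "$PAIRS2DISSIM" "$part" "" "$out.$sfx" "log.$sfx" || return 1
    cat "$out.$sfx" >> "$out"
  done
}
```

Usage: `dissim_in_parts pairs 1000 dissim`, where the part size 1000 is only an example value.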
Convert the dissimilarity file dissim into the Data Master format:
$TT/dm/pairs2dm dissim 1 "cons" 6 -distance > data.dm
Build the distance tree:
$TT/phylogeny/makeDistTree -threads 5 -data data -dissim_attr "cons" -variance linExp \
-optimize -subgraph_iter_max 10 -noqual -output_tree tree