Prepare input files
Georgios Koutsovoulos edited this page Jul 22, 2024 · 25 revisions
- [Optional] If you obtained the annotation with MetaEuk, you need to rename your output files with `metaeuk_rename.py`
Decide on the taxonomic Ingroup and the taxonomic groups to exclude (EGP). For example, to find proteins of non-metazoan origin in plant-parasitic nematodes of the genus Meloidogyne (suborder Tylenchina, taxid=6300), set Ingroup to 33208 (Metazoa) and EGP to 6300.
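As a rough illustration of what the two settings mean, the sketch below sorts a hit into one of three bins from its taxonomic lineage. This is one reading of the Ingroup/EGP description, not AvP's actual code: the function name `classify_hit`, the bin labels, and the truncated lineages are all hypothetical.

```python
# Taxids taken from the example above.
INGROUP = 33208  # Metazoa
EGP = 6300       # Tylenchina (the group to exclude)


def classify_hit(lineage, ingroup=INGROUP, egp=EGP):
    """Classify one hit from the set of taxids in its lineage.

    Assumed logic (not taken from AvP's source):
    - a hit inside the excluded group is ignored,
    - any other hit inside the Ingroup counts as an ingroup hit,
    - everything else is a potential donor.
    """
    if egp in lineage:
        return "excluded"
    if ingroup in lineage:
        return "ingroup"
    return "donor"


# Illustrative, truncated lineages (sets of NCBI taxids).
examples = {
    "Tylenchina hit": {2759, 33208, 6231, 6300},
    "human hit": {2759, 33208, 9606},
    "bacterial hit": {2},
}

for label, lineage in examples.items():
    print(label, "->", classify_hit(lineage))
```

The exclusion test comes first because Tylenchina sits inside Metazoa; checking the Ingroup first would count close relatives of the query as ingroup hits.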
- Similarity file
  - Using NR
    `blastp -query [proteins.fa] -db nr -outfmt '6 std staxids' -seg no -evalue 1e-5 -out [similarity.out]`
  - Using other databases
    `diamond blastp -q [proteins.fa] -d [db.fasta.dmnd] --evalue 1e-5 --max-target-seqs 500 --outfmt 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore staxids --out [similarity.out]`
- Create `groups.yaml` file (a sample file can be found under `AvP/depot/`)
  - See groups.yaml
- AI features file (2 choices)
  - If you want to use the AHS metric or hgt_local_score or both, use `calculate_ai.py`. It will create a file called `*_ai.out` (described below)
    `calculate_ai.py -i [similarity.out] -x groups.yaml`
  - Otherwise, use the Alienness webserver (ongoing) (the file needed is called `*_Alieness_FEATURES.xls`) (only works for NR)
- Create `config.yaml` file (a sample file can be found under `AvP/depot/`)
  - See config.yaml
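Both similarity-search commands above request the 12 standard tabular fields plus `staxids`, so every row of the similarity file should have 13 tab-separated columns. A minimal sanity check could look like the sketch below. The function name `check_similarity_lines` is hypothetical, and the assumption that `staxids` is a semicolon-separated list of numeric taxids should be checked against your own output.

```python
EXPECTED_COLUMNS = 13  # 12 standard tabular fields + staxids


def check_similarity_lines(lines):
    """Yield (line_number, message) for rows that do not look like
    13-column tabular output with taxids in the last column."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != EXPECTED_COLUMNS:
            yield number, f"expected {EXPECTED_COLUMNS} columns, found {len(fields)}"
            continue
        staxids = fields[12]
        # Assumption: taxids are numeric and separated by ';'.
        if not all(part.isdigit() for part in staxids.split(";")):
            yield number, f"staxids is not a list of taxids: {staxids!r}"


def check_similarity_file(path):
    """Run the check on a file and return the list of problems."""
    with open(path) as handle:
        return list(check_similarity_lines(handle))
```

Calling `check_similarity_file("similarity.out")` returns an empty list when every row passes. Rows flagged for a missing taxid usually indicate a database built without taxonomy information.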
`calculate_ai.py` produces the tab-delimited file `*_ai.out`.
Column | Description | Article |
---|---|---|
1 | Gene Name | |
2 | Best Donor String | |
3 | Best Ingroup String | |
4 | Alien Index (AI) | Gladyshev et al., 2008 |
5 | HGT index | Boschetti et al., 2012 |
6 | Number of blast hits | |
7 | AHS score | Koutsovoulos et al., 2022 |
8 | outg_pct | Li et al., 2022 |
Columns 2 and 3 contain a colon-delimited string describing the best donor hit and the best ingroup hit, respectively:
`Gene_hit_name:Position_in_blast_list:Identity:E_value:Bitscore`
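For downstream scripts, the hit string can be split back into its five fields. The sketch below is a minimal parser under stated assumptions: the `Hit` type and `parse_hit` function are hypothetical names, the position is assumed to be an integer, and the other three values are assumed to parse as floats. It splits from the right so that a hit name containing `:` stays intact. It does not handle whatever placeholder the file uses when no hit exists.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    name: str        # Gene_hit_name
    position: int    # Position_in_blast_list
    identity: float  # Identity
    evalue: float    # E_value
    bitscore: float  # Bitscore


def parse_hit(text):
    """Parse 'Gene_hit_name:Position_in_blast_list:Identity:E_value:Bitscore'.

    The last four fields are taken from the right, so colons inside
    the hit name are preserved.
    """
    name, position, identity, evalue, bitscore = text.rsplit(":", 4)
    return Hit(name, int(position), float(identity), float(evalue), float(bitscore))


# Made-up example string, for illustration only.
hit = parse_hit("example_protein_1:3:45.2:1e-30:120.5")
print(hit.name, hit.position, hit.evalue)
```

A string with fewer than five fields raises `ValueError`, which makes malformed rows easy to spot.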