This repository contains a deep learning model for identifying partial viral protein sequences in metagenomic data. It also provides the data and the environment needed to run the query application.
This work was published in Briefings in Bioinformatics (2025): VirDetect-AI: a residual and convolutional neural network–based metagenomic tool for eukaryotic viral protein identification. Zárate A, Díaz-González L, Taboada B. https://doi.org/10.1093/bib/bbaf001
Extra supplementary data for VirDetect-AI can be downloaded from Zenodo: https://zenodo.org/doi/10.5281/zenodo.13328820
There are two ways to test VirDetect-AI: through a Google Colab notebook, or locally by installing a predefined conda environment.
1.- Download the notebook Notebook_api_VirDetect-AI.ipynb located in the Notebook_VirDetect-AI folder in this repository.
2.- Run the notebook Notebook_api_VirDetect-AI.ipynb on Google Colab (with a GPU runtime) or in Jupyter. Only FASTA input is accepted. The output is generated and saved in the outputs folder, which is a temporary folder in Google Drive ([content]), so remember to download your results.
- Clone the repository locally (or download the whole repository manually):
git clone https://github.com/alyzart22/VirDetect-AI.git
- Create the GPU environment:
conda env create --file ./API_VirDetect-AI/enviroments/virdetect-ai_gpu.yml
Activate the environment:
conda activate virdetect-ai_gpu
Run this line in the console so that the shared libraries in the environment's lib/ folder are on the library search path:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
Run this line to check that TensorFlow detects the GPU:
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
Example of expected output:
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
- Alternatively, for a CPU-only setup, create the CPU environment:
conda env create --file ./VirDetect-AI/enviroments/virdetect-ai_cpu.yml
Activate the environment:
conda activate virdetect-ai_cpu
- Download the VirDetect-AI model file model.h5 from the following link and place it inside the /API_VirDetect-AI/ folder: Link to download model.h5
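Before running a query, it can help to confirm that model.h5 actually sits in the expected folder and is a real HDF5 file (a failed download sometimes saves an HTML page under the same name). The following is a minimal sketch, not part of VirDetect-AI: the helper name `find_model` is hypothetical, and it assumes you call it from the repository root so that the folder is `API_VirDetect-AI` relative to the current directory.

```python
from pathlib import Path

# HDF5 files (the format behind Keras .h5 models) normally begin with this
# 8-byte signature at offset 0.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_model(folder="API_VirDetect-AI", filename="model.h5"):
    """Return the path to the model file, or raise if it is missing or malformed.

    Hypothetical helper for illustration; not part of the VirDetect-AI code base.
    """
    path = Path(folder) / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download model.h5 and place it in {folder}/"
        )
    # Read only the first 8 bytes and compare them with the HDF5 signature.
    with path.open("rb") as fh:
        if fh.read(8) != HDF5_SIGNATURE:
            raise ValueError(
                f"{path} does not start with the HDF5 signature; "
                "the download may be incomplete or an HTML page"
            )
    return path
```

For example, `find_model()` returns the path when the file is in place, and raises a descriptive error otherwise, which is quicker to diagnose than a failure deep inside model loading.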
- In this section you can try VirDetect-AI with your own metagenomic data.
In the command below, you can replace the hepadna.fasta file with your own FASTA file. The command accepts 3 arguments:
- hepadna.fasta – the query file containing the amino acid sequences.
- 40 – the kmer_stride (recommended range: 20–60).
- 0 – the execution mode:
  - Mode 0 (default): accepts input sequences of ≥ 300 amino acids.
  - Mode 1: accepts input sequences of > 255 amino acids.
Remember to run this command from inside the /VirDetect-AI/API_VirDetect-AI/ directory:
python ./api_virdetect-ai.py ./hepadna.fasta 40 0
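If your own FASTA file mixes short and long sequences, you may want to see in advance which ones satisfy the length rule of the chosen mode (≥ 300 amino acids for mode 0, > 255 for mode 1), and roughly how the kmer_stride changes the number of k-mers per sequence. The sketch below is a hypothetical pre-flight helper, not part of VirDetect-AI. The k-mer window length is a made-up parameter here (the actual length used by the model is not stated in this README), and `count_windows` counts only full windows, ignoring however the tool may handle a leftover tail.

```python
# Length rules taken from the mode descriptions above.
MODE_ACCEPTS = {
    0: lambda n: n >= 300,  # Mode 0: sequences of >= 300 amino acids
    1: lambda n: n > 255,   # Mode 1: sequences of > 255 amino acids
}


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_mode(records, mode=0):
    """Split records into (kept, dropped) lists according to the mode's length rule."""
    accepts = MODE_ACCEPTS[mode]
    kept, dropped = [], []
    for header, seq in records:
        (kept if accepts(len(seq)) else dropped).append((header, seq))
    return kept, dropped


def write_fasta(records, path, width=60):
    """Write (header, sequence) pairs to a FASTA file, wrapping sequence lines."""
    with open(path, "w") as out:
        for header, seq in records:
            out.write(f">{header}\n")
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")


def count_windows(seq_len, window, stride):
    """Number of full windows of length `window` taken every `stride` residues.

    `window` is a hypothetical parameter, not the tool's documented k-mer length.
    """
    if seq_len < window:
        return 0
    return (seq_len - window) // stride + 1
```

For example, `kept, dropped = split_by_mode(read_fasta(open("my_query.fasta")), mode=0)` followed by `write_fasta(kept, "my_query_mode0.fasta")` produces a file you could pass as the first argument (the file names are placeholders). `count_windows` illustrates the trade-off behind the recommended 20–60 range: a smaller stride yields more overlapping k-mers per sequence, and therefore more predictions and a longer run.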
- The output consists of 6 pie charts and 3 CSV reports: predictions by k-mer, predictions by sequence, and unknown sequences.
If you use VirDetect-AI, please cite this paper: Alida Zárate, Lorena Díaz-González, Blanca Taboada, VirDetect-AI: a residual and convolutional neural network–based metagenomic tool for eukaryotic viral protein identification, Briefings in Bioinformatics, Volume 26, Issue 1, January 2025, bbaf001, https://doi.org/10.1093/bib/bbaf001
Ali Zárate - [email protected]
Project Link: https://github.com/alyzart22/VirDetect-AI
This research was partially supported by grant PAPIIT-DGAPA-IN230523, awarded to Blanca Taboada.