VIGOR4 (Viral Genome ORF Reader) is a Java application to predict protein sequences encoded in viral genomes.
VIGOR4 determines the protein coding sequences by sequence similarity searching against curated viral protein databases.
This project is funded by the National Institute Of Allergy And Infectious Diseases of the National Institutes of Health under Award Number U19AI110819 and Contract Number HHSN272201400028C and is a collaboration between Northrop Grumman Health IT, J. Craig Venter Institute, and Vecna Technologies.
Vigor4 uses the VIGOR_DB project which currently has databases for the following viruses:
- Influenza (A & B for human, avian, and swine, and C for human)
- SARS-CoV-2
- West Nile Virus
- Zika Virus
- Chikungunya Virus
- Eastern Equine Encephalitis Virus
- Respiratory Syncytial Virus
- Rotavirus
- Enterovirus
- Lassa Mammarenavirus
mvn -DskipTests clean package
unzip target/vigor-4.0.VERSION.zip -d INSTALL_DIRECTORY
# set reference database path, exonerate path and temporary directory
vi INSTALL_DIRECTORY/vigor-4.0.VERSION/config/vigor.ini
INSTALL_DIRECTORY/vigor-4.0.VERSION/bin/vigor4 -i VIRUS.fasta -o OUTPUT_DIRECTORY -d VIRUS_DB
Vigor4 uses 'Maven' to build & package the program
VIGOR4 runs on Linux environment
JAVA8 or above
Exonerate2.2.0
Vigor Viral database (Available at VIGOR_DB)
Refer to INSTALL.md for detailed instructions
usage: vigor4 -i inputfasta -o outputprefix -d refdb
Named Arguments:
-h, --help show this help message and exit
-i <input fasta>, --input-fasta <input fasta>
path to fasta file of genomic sequences to be
annotated
-o <output prefix>, --output-prefix <output prefix>
prefix for outputfile files, e.g. if the output
prefix is /mydir/anno VIGOR will create output
files /mydir/anno.tbl, /mydir/anno.stats, etc.
-c MIN_COVERAGE, --min-coverage MIN_COVERAGE
minimum coverage of reference product (0-100)
required to report a gene, by default coverage is
ignored
-P <parameter=value~~...~~parameter=value>, --parameter <parameter=value~~...~~parameter=value>
~~ separated list of VIGOR parameters to override
default values. Use --list-config-parameters to
see settable parameters.
-v, --verbose verbose logging (default=terse)
--list-config-parameters
list available configuration parameters and exit
--config-file CONFIG_FILE
config file to use
--reference-database-path REFERENCE_DATABASE_PATH
reference database path
--virus-config VIRUSSPECIFICCONFIG
Path to virus specific configuration
--virus-config-path VIRUSSPECIFICCONFIGPATH
Path to directory containing virus specific
config files.
--overwrite-output overwrite existing output files if they exist
--temporary-directory TEMPORARYDIRECTORY
Root directory to use for temporary directories
reference database:
-d <ref db>, --reference-database <ref db>
specify the reference database to be used, (-D is
a synonym for this option)
locus tag usage:
-l, --no-locus-tags do NOT use locus_tags in TBL file output
(incompatible with -L)
-L [<locus_tag_prefix>], --locus-tags [<locus_tag_prefix>]
USE locus_tags in TBL file output (incompatible
with -l). If no prefix is provided, the prefix
"vigor_" will be used.
outputprefix.rpt - summary of program results
outputprefix.cds - fasta file of predicted CDSs
outputprefix.pep - fasta file of predicted proteins
outputprefix.tbl - predicted features in GenBank tbl format
outputprefix.aln - alignment of predicted protein to reference, and
reference protein to genome
outputprefix.gff3 - predicted features in GFF3 format
-0, --circular complete circular genome (allows gene to span
origin). This feature is currently unimplemented
-f {0,1,2}, --frameshift-sensitivity {0,1,2}
frameshift sensitivity, 0=ignore frameshifts,
1=normal (default), 2=sensitive.
-m, --ignore-reference-requirements
ignore reference match requirements
(coverage/identity/similarity), sometimes useful
when running VIGOR to evaluate raw contigs and
rough draft sequences
-x <ref_id,...,ref_id>, --ignore-refID <ref_id,...,ref_id>
comma separated list of reference sequence IDs to
ignore (useful when debugging a reference
database). Not currently implemented
Name | Description |
---|---|
flua | Influenza A |
flub | Influenza B |
fluc | Influenza C |
lassa | Lassa Mammarenavirus |
rsv | Respiratory syntactical virus (RSV) |
rtva | Rotavirus A |
rtvb | Rotavirus B |
rtvc | Rotavirus C |
rtvf | Rotavirus F |
rtvg | Rotavirus G |
sapo | Sapovirus |
veev | Alphaviruses (VEEV/EEEV) |
wnvI | West Nile Virus - Lineage I |
wnvII | West Nile Virus - Lineage II |
zikv | Zika virus |