Complete reference for all bioinfo-tools commands and options.
These options are available for the main bioinfo-tools command:
bioinfo-tools [--help] [--version] <command> [<args>]
- -h, --help - Show help message and exit
- --version - Show program version number and exit
- extract-cds - Filter CDS features from GenBank files
- extract-proteins - Extract amino acid sequences
- extract-genes - Extract nucleotide sequences
- blast - Run BLAST searches
Filter CDS (Coding Sequences) from GenBank files, keeping only genes in the provided list.
bioinfo-tools extract-cds -i <input_folder> -g <genes_file> -o <output_folder>
- --input-folder, -i DIR - Input folder containing GenBank files (.gbk, .gb, .genbank)
- --genes-list, -g FILE - File containing gene names (one per line)
- --output-folder, -o DIR - Output folder for filtered GenBank files
# Basic usage
bioinfo-tools extract-cds -i genbank_files/ -g genes.txt -o filtered_output/
# With long options
bioinfo-tools extract-cds --input-folder genbank_files/ \
    --genes-list genes.txt \
    --output-folder filtered_output/
genes.txt:
dnaA
rpoB
recA
gyrA
Output: filtered GenBank files containing only the specified genes and the source features.
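A gene list in the one-name-per-line layout shown above can be checked from a script before a run. The following is a minimal sketch, not the tool's own parser: `load_gene_list` is a hypothetical helper, and skipping blank lines and dropping repeated names are assumptions about what a tidy list should look like.

```python
from pathlib import Path


def load_gene_list(path):
    """Read gene names from a one-name-per-line file (hypothetical helper)."""
    genes = []
    seen = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        name = line.strip()
        # Assumption: blank lines carry no meaning and repeats are redundant.
        if not name or name in seen:
            continue
        seen.add(name)
        genes.append(name)
    return genes
```

Calling `load_gene_list("genes.txt")` on the example file would return the four gene names in file order.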
Extract CDS translations (amino acid sequences) from GenBank files, organized by gene name.
bioinfo-tools extract-proteins -i <input_folder> -g <genes_file> -o <output_folder>
- --input-folder, -i DIR - Input folder containing GenBank files
- --genes-list, -g FILE - File containing gene names (one per line)
- --output-folder, -o DIR - Output folder for amino acid FASTA files
# Basic usage
bioinfo-tools extract-proteins -i genbank_files/ -g genes.txt -o proteins_output/
# With long options
bioinfo-tools extract-proteins --input-folder genbank_files/ \
    --genes-list genes.txt \
    --output-folder proteins_output/
Output structure:
proteins_output/
├── dnaA/
│   ├── genome1.fasta
│   └── genome2.fasta
├── rpoB/
│   ├── genome1.fasta
│   └── genome2.fasta
└── recA/
    └── genome2.fasta
Each FASTA file contains the amino acid sequence for that gene from that genome.
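Because the output is grouped as one folder per gene with one FASTA file per genome, it is easy to spot genes that were not found in every genome (recA in the layout above). The sketch below is one possible way to do that; `genomes_per_gene` and `missing_genomes` are hypothetical helper names, and the code assumes every result file ends in `.fasta` as in the example tree.

```python
from pathlib import Path


def genomes_per_gene(output_folder):
    """Map each gene subfolder to the sorted genome names found inside it."""
    inventory = {}
    for gene_dir in sorted(Path(output_folder).iterdir()):
        if gene_dir.is_dir():
            # Path.stem drops the ".fasta" suffix: genome1.fasta -> genome1
            inventory[gene_dir.name] = sorted(f.stem for f in gene_dir.glob("*.fasta"))
    return inventory


def missing_genomes(inventory):
    """Report, per gene, the genomes seen for other genes but not for this one."""
    all_genomes = set().union(*inventory.values())
    report = {}
    for gene, found in inventory.items():
        absent = sorted(all_genomes - set(found))
        if absent:
            report[gene] = absent
    return report
```

For the example tree, `missing_genomes(genomes_per_gene("proteins_output"))` would flag genome1 as absent for recA.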
Extract CDS nucleotide sequences from GenBank files, organized by gene name.
bioinfo-tools extract-genes -i <input_folder> -g <genes_file> -o <output_folder>
- --input-folder, -i DIR - Input folder containing GenBank files
- --genes-list, -g FILE - File containing gene names (one per line)
- --output-folder, -o DIR - Output folder for nucleotide FASTA files
# Basic usage
bioinfo-tools extract-genes -i genbank_files/ -g genes.txt -o genes_output/
# With long options
bioinfo-tools extract-genes --input-folder genbank_files/ \
    --genes-list genes.txt \
    --output-folder genes_output/
Output structure:
genes_output/
├── dnaA/
│   ├── genome1.fasta
│   └── genome2.fasta
├── rpoB/
│   ├── genome1.fasta
│   └── genome2.fasta
└── recA/
    └── genome2.fasta
Each FASTA file contains the nucleotide sequence for that gene from that genome.
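To inspect the extracted sequences, for example to check their lengths, a small FASTA reader is enough. This is a minimal sketch using only the standard library; `read_fasta` is a hypothetical helper, and nothing is assumed about the header text that bioinfo-tools writes beyond the leading `>` of the FASTA format.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
    # Emit the last record in the file.
    if header is not None:
        yield header, "".join(chunks)
```

Iterating over `read_fasta("genes_output/dnaA/genome1.fasta")` and printing `len(sequence)` for each record gives a quick length check per genome.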
Perform BLAST searches using multiple query and database files. Databases are automatically formatted.
bioinfo-tools blast -q <query_folder> -d <db_folder> -t <db_type> -b <blast_type> -e <evalue>
- --query-folder, -q DIR - Folder containing query FASTA files
- --db-folder, -d DIR - Folder containing database FASTA files
- --db-type, -t {nucl,prot} - Database type:
  - nucl - Nucleotide database
  - prot - Protein database
- --blast-type, -b TYPE - BLAST program to use:
  - blastn - Nucleotide vs nucleotide
  - blastp - Protein vs protein
  - blastx - Translated nucleotide vs protein
  - tblastn - Protein vs translated nucleotide
  - tblastx - Translated nucleotide vs translated nucleotide
- --evalue, -e FLOAT - E-value threshold (e.g., 1e-5, 0.001)
- --output-folder, -o DIR - Output folder for BLAST results (default: blast_outputs)
- --outfmt INT - BLAST output format (0-11; default: 6, tabular)
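The database type has to match what the chosen BLAST program searches against: blastn, tblastn and tblastx search nucleotide databases, while blastp and blastx search protein databases. The sketch below checks a combination before a run is launched. `check_blast_options` is a hypothetical helper, and whether bioinfo-tools performs a similar check itself is not stated here.

```python
# Database type each BLAST program searches against (standard BLAST+ behaviour).
REQUIRED_DB_TYPE = {
    "blastn": "nucl",
    "blastp": "prot",
    "blastx": "prot",
    "tblastn": "nucl",
    "tblastx": "nucl",
}


def check_blast_options(blast_type, db_type):
    """Raise ValueError if the program and database type cannot be combined."""
    if blast_type not in REQUIRED_DB_TYPE:
        raise ValueError(f"unknown BLAST program: {blast_type!r}")
    expected = REQUIRED_DB_TYPE[blast_type]
    if db_type != expected:
        raise ValueError(
            f"{blast_type} needs a {expected!r} database, got {db_type!r}"
        )
```

For instance, `check_blast_options("blastp", "nucl")` raises an error, while `check_blast_options("blastn", "nucl")` returns silently.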
# Basic nucleotide BLAST
bioinfo-tools blast -q queries/ -d databases/ -t nucl -b blastn -e 1e-5
# Protein BLAST with custom output
bioinfo-tools blast -q protein_queries/ -d protein_dbs/ -t prot -b blastp -e 0.001 -o my_results/
# With long options
bioinfo-tools blast --query-folder queries/ \
    --db-folder databases/ \
    --db-type nucl \
    --blast-type blastn \
    --evalue 1e-5 \
    --output-folder results/ \
    --outfmt 6
By default, BLAST outputs tabular format (outfmt 6):
query_id subject_id %identity alignment_length mismatches gap_opens ...
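Tabular results are tab-separated with no header row, so they can be loaded with the standard `csv` module. The sketch below assumes the result files use the default twelve BLAST+ columns for outfmt 6 (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); `read_blast_hits` and its `max_evalue` parameter are hypothetical names introduced for illustration.

```python
import csv

# Default column order of BLAST+ tabular output (outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_blast_hits(path, max_evalue=None):
    """Return hits as dicts, optionally keeping only those at or below max_evalue."""
    hits = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and outfmt 7 style comment lines
            hit = dict(zip(OUTFMT6_COLUMNS, row))
            # Convert the numeric fields most often used for filtering.
            hit["pident"] = float(hit["pident"])
            hit["evalue"] = float(hit["evalue"])
            hit["bitscore"] = float(hit["bitscore"])
            if max_evalue is None or hit["evalue"] <= max_evalue:
                hits.append(hit)
    return hits
```

The remaining columns are left as strings; convert them with `int()` where coordinates or lengths are needed.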
Each query-database combination produces a separate result file:
blast_outputs/
├── query1.fasta_database1.fasta_result.txt
├── query1.fasta_database2.fasta_result.txt
├── query2.fasta_database1.fasta_result.txt
└── query2.fasta_database2.fasta_result.txt
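Since every query file is paired with every database file, the expected result names can be derived from the two input folders and compared with what was actually written. This sketch follows the `<query>_<database>_result.txt` pattern visible in the listing above; the helper names are hypothetical, and it assumes the input files end in `.fasta` so that any index files created while formatting the databases are ignored.

```python
from itertools import product
from pathlib import Path


def expected_result_names(query_folder, db_folder):
    """List result file names following <query>_<database>_result.txt."""
    # Assumption: only *.fasta files count as queries or databases.
    queries = sorted(p.name for p in Path(query_folder).glob("*.fasta"))
    databases = sorted(p.name for p in Path(db_folder).glob("*.fasta"))
    return [f"{q}_{d}_result.txt" for q, d in product(queries, databases)]


def missing_results(query_folder, db_folder, output_folder="blast_outputs"):
    """Return expected result files that are absent from the output folder."""
    out = Path(output_folder)
    return [
        name
        for name in expected_result_names(query_folder, db_folder)
        if not (out / name).exists()
    ]
```

An empty list from `missing_results("queries", "databases")` means every query-database pair produced a file.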
Available --outfmt values:
- 0 - Pairwise
- 1 - Query-anchored showing identities
- 2 - Query-anchored no identities
- 3 - Flat query-anchored, show identities
- 4 - Flat query-anchored, no identities
- 5 - XML
- 6 - Tabular (default)
- 7 - Tabular with comment lines
- 8 - Text ASN.1
- 9 - Binary ASN.1
- 10 - Comma-separated values
- 11 - BLAST archive format
# Extract and process genes from multiple genomes
for dataset in dataset1 dataset2 dataset3; do
    bioinfo-tools extract-genes -i "${dataset}/genbank/" \
        -g genes.txt \
        -o "${dataset}/genes_output/"
done
# Complete workflow
# 1. Filter GenBank files
bioinfo-tools extract-cds -i raw_genbank/ -g important_genes.txt -o filtered_gbk/
# 2. Extract protein sequences
bioinfo-tools extract-proteins -i filtered_gbk/ -g important_genes.txt -o proteins/
# 3. Extract nucleotide sequences
bioinfo-tools extract-genes -i filtered_gbk/ -g important_genes.txt -o genes/
# 4. Run BLAST analysis
bioinfo-tools blast -q proteins/dnaA/ -d reference_dbs/ -t prot -b blastp -e 1e-10
# General help
bioinfo-tools --help
# Command-specific help
bioinfo-tools extract-cds --help
bioinfo-tools extract-proteins --help
bioinfo-tools extract-genes --help
bioinfo-tools blast --help
All commands return standard exit codes:
- 0 - Success
- 1 - Error (check log messages)
- 130 - Interrupted by user (Ctrl+C)
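When bioinfo-tools is driven from another program, the exit code is the simplest signal of how a run ended. The sketch below wraps a call with `subprocess.run` and translates the three documented codes; `describe_exit_code` and `run_bioinfo_tools` are hypothetical helper names, and any code outside the documented set is reported as undocumented rather than guessed at.

```python
import subprocess

# Exit codes documented for all bioinfo-tools commands.
EXIT_MEANINGS = {
    0: "success",
    1: "error (check log messages)",
    130: "interrupted by user (Ctrl+C)",
}


def describe_exit_code(code):
    """Translate an exit code into its documented meaning."""
    return EXIT_MEANINGS.get(code, f"undocumented exit code {code}")


def run_bioinfo_tools(args):
    """Run bioinfo-tools with an argument list and report how it ended."""
    # Requires bioinfo-tools on PATH; raises FileNotFoundError otherwise.
    completed = subprocess.run(["bioinfo-tools", *args])
    return completed.returncode, describe_exit_code(completed.returncode)
```

A call such as `run_bioinfo_tools(["extract-cds", "-i", "genbank_files/", "-g", "genes.txt", "-o", "filtered_output/"])` returns the code together with its description.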
All commands log timestamped progress messages, for example:
2024-01-01 12:00:00 - INFO - Gene Extractor started
2024-01-01 12:00:00 - INFO - Input folder: genbank_files/
2024-01-01 12:00:00 - INFO - Loaded 5 genes from genes.txt
2024-01-01 12:00:00 - INFO - Found 10 file(s) in genbank_files/
...
2024-01-01 12:00:05 - INFO - Gene Extractor finished successfully
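Log lines in this `timestamp - LEVEL - message` shape are straightforward to parse, for instance to measure how long a run took or to pull out non-INFO messages. This is a minimal sketch based only on the sample lines above: `parse_log_line` is a hypothetical helper, and it assumes that levels other than INFO are also written as uppercase words.

```python
import re
from datetime import datetime

# Matches lines shaped like "2024-01-01 12:00:00 - INFO - message".
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
    r" - (?P<level>[A-Z]+) - (?P<message>.*)$"
)


def parse_log_line(line):
    """Return (timestamp, level, message), or None if the line does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    timestamp = datetime.strptime(match["time"], "%Y-%m-%d %H:%M:%S")
    return timestamp, match["level"], match["message"]
```

Subtracting the timestamp of the "started" line from that of the "finished successfully" line gives the run duration as a `timedelta`.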
Currently, bioinfo-tools does not use environment variables for configuration. All settings are passed via command-line arguments.
- README.md - General overview and installation
- MIGRATION.md - Migrating from old scripts
- CONTRIBUTING.md - Adding new commands
- DEPLOYMENT.md - Deployment instructions