Command Reference

Complete reference for all bioinfo-tools commands and options.

Global Options

These options are available for the main bioinfo-tools command:

bioinfo-tools [--help] [--version] <command> [<args>]

Options

-h, --help - Show help message and exit
--version - Show program version number and exit

Available Commands

extract-cds - Filter CDS features from GenBank files
extract-proteins - Extract amino acid sequences
extract-genes - Extract nucleotide sequences
blast - Run BLAST searches

extract-cds

Filter CDS (coding sequence) features from GenBank files, keeping only the genes named in the provided list.

Usage

bioinfo-tools extract-cds -i <input_folder> -g <genes_file> -o <output_folder>

Required Arguments

--input-folder, -i DIR - Input folder containing GenBank files (.gbk, .gb, .genbank)
--genes-list, -g FILE - File containing gene names (one per line)
--output-folder, -o DIR - Output folder for filtered GenBank files

Examples

# Basic usage
bioinfo-tools extract-cds -i genbank_files/ -g genes.txt -o filtered_output/

# With long options
bioinfo-tools extract-cds --input-folder genbank_files/ \
                          --genes-list genes.txt \
                          --output-folder filtered_output/

Input Format

genes.txt:

dnaA
rpoB
recA
gyrA
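
If you build or check a gene list programmatically, the one-name-per-line format can be read with a few lines of Python. This is a minimal sketch, not the tool's own parser: skipping blank lines, trimming whitespace, and dropping duplicates are assumptions about sensible handling, and `parse_gene_list` and `load_gene_list` are hypothetical helper names.

```python
from pathlib import Path


def parse_gene_list(text: str) -> list[str]:
    """Return gene names from gene-list text, one name per line."""
    genes = []
    seen = set()
    for line in text.splitlines():
        name = line.strip()
        # Assumption: blank lines carry no gene name and repeated names are redundant.
        if not name or name in seen:
            continue
        seen.add(name)
        genes.append(name)
    return genes


def load_gene_list(path: str) -> list[str]:
    """Read a gene-list file such as genes.txt."""
    return parse_gene_list(Path(path).read_text(encoding="utf-8"))
```

Calling `load_gene_list("genes.txt")` on the file above would give the four names in file order.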

Output

Filtered GenBank files containing only the specified genes and source features.

extract-proteins

Extract CDS translations (amino acid sequences) from GenBank files, organized by gene name.

Usage

bioinfo-tools extract-proteins -i <input_folder> -g <genes_file> -o <output_folder>

Required Arguments

--input-folder, -i DIR - Input folder containing GenBank files
--genes-list, -g FILE - File containing gene names (one per line)
--output-folder, -o DIR - Output folder for amino acid FASTA files

Examples

# Basic usage
bioinfo-tools extract-proteins -i genbank_files/ -g genes.txt -o proteins_output/

# With long options
bioinfo-tools extract-proteins --input-folder genbank_files/ \
                               --genes-list genes.txt \
                               --output-folder proteins_output/

Output Structure

proteins_output/
├── dnaA/
│   ├── genome1.fasta
│   └── genome2.fasta
├── rpoB/
│   ├── genome1.fasta
│   └── genome2.fasta
└── recA/
    └── genome2.fasta

Each FASTA file contains the amino acid sequence for that gene from that genome.
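
In the tree above, recA has a file for genome2 only, which suggests that a gene folder only receives files for genomes where the gene was found. A short sketch for listing such gaps, assuming the layout shown (one subfolder per gene, one `.fasta` file per genome); the function names are hypothetical:

```python
from pathlib import Path


def genomes_per_gene(output_folder: str) -> dict[str, list[str]]:
    """Map each gene subfolder to the genome names that have a FASTA file in it."""
    table = {}
    for gene_dir in sorted(Path(output_folder).iterdir()):
        if gene_dir.is_dir():
            # The genome name is taken from the file name without its extension.
            table[gene_dir.name] = sorted(p.stem for p in gene_dir.glob("*.fasta"))
    return table


def missing_genomes(table: dict[str, list[str]]) -> dict[str, list[str]]:
    """For each gene, list genomes present elsewhere in the tree but absent for that gene."""
    all_genomes = set().union(*table.values())
    return {gene: sorted(all_genomes - set(found)) for gene, found in table.items()}
```

For the example tree, `missing_genomes(genomes_per_gene("proteins_output/"))` would report genome1 as missing for recA and nothing missing for dnaA and rpoB.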

extract-genes

Extract CDS nucleotide sequences from GenBank files, organized by gene name.

Usage

bioinfo-tools extract-genes -i <input_folder> -g <genes_file> -o <output_folder>

Required Arguments

--input-folder, -i DIR - Input folder containing GenBank files
--genes-list, -g FILE - File containing gene names (one per line)
--output-folder, -o DIR - Output folder for nucleotide FASTA files

Examples

# Basic usage
bioinfo-tools extract-genes -i genbank_files/ -g genes.txt -o genes_output/

# With long options
bioinfo-tools extract-genes --input-folder genbank_files/ \
                            --genes-list genes.txt \
                            --output-folder genes_output/

Output Structure

genes_output/
├── dnaA/
│   ├── genome1.fasta
│   └── genome2.fasta
├── rpoB/
│   ├── genome1.fasta
│   └── genome2.fasta
└── recA/
    └── genome2.fasta

Each FASTA file contains the nucleotide sequence for that gene from that genome.
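
A quick sanity check on extracted nucleotide sequences is to confirm that each length is a multiple of three, since a complete CDS consists of whole codons. The sketch below parses plain FASTA text and flags records that fail this check. It makes no assumption about the header format the tool writes, and the function names are hypothetical.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def incomplete_codons(records: dict[str, str]) -> list[str]:
    """Return headers whose sequence length is not a multiple of three."""
    return [h for h, seq in records.items() if len(seq) % 3 != 0]
```

A flagged record is not necessarily an error: partial CDS features at contig edges can legitimately have lengths that are not divisible by three.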

blast

Run BLAST searches for every combination of query file and database file. Databases are formatted automatically.

Usage

bioinfo-tools blast -q <query_folder> -d <db_folder> -t <db_type> -b <blast_type> -e <evalue> [-o <output_folder>] [--outfmt <format>]

Required Arguments

--query-folder, -q DIR - Folder containing query FASTA files
--db-folder, -d DIR - Folder containing database FASTA files
--db-type, -t {nucl,prot} - Database type:
  - nucl - Nucleotide database
  - prot - Protein database
--blast-type, -b TYPE - BLAST program to use:
  - blastn - Nucleotide vs nucleotide
  - blastp - Protein vs protein
  - blastx - Translated nucleotide vs protein
  - tblastn - Protein vs translated nucleotide
  - tblastx - Translated nucleotide vs translated nucleotide
--evalue, -e FLOAT - E-value threshold (e.g., 1e-5, 0.001)
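
The database type has to match the BLAST program: blastn, tblastn, and tblastx search nucleotide databases, while blastp and blastx search protein databases. The sketch below encodes that pairing so a mismatch can be caught before launching a run. Whether bioinfo-tools performs this check itself is not stated here, and the names `BLAST_PROGRAMS`, `required_db_type`, and `check_blast_args` are hypothetical.

```python
# Molecule type each BLAST program expects: (query type, database type).
BLAST_PROGRAMS = {
    "blastn": ("nucl", "nucl"),
    "blastp": ("prot", "prot"),
    "blastx": ("nucl", "prot"),
    "tblastn": ("prot", "nucl"),
    "tblastx": ("nucl", "nucl"),
}


def required_db_type(blast_type: str) -> str:
    """Return the --db-type value that fits the given --blast-type."""
    try:
        return BLAST_PROGRAMS[blast_type][1]
    except KeyError:
        raise ValueError(f"unknown BLAST program: {blast_type!r}") from None


def check_blast_args(blast_type: str, db_type: str) -> None:
    """Raise ValueError if the program and database type do not fit together."""
    expected = required_db_type(blast_type)
    if db_type != expected:
        raise ValueError(
            f"{blast_type} needs a {expected} database, got {db_type}"
        )
```

For example, `check_blast_args("blastx", "nucl")` raises, because blastx translates a nucleotide query and searches a protein database.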

Optional Arguments

--output-folder, -o DIR - Output folder for BLAST results (default: blast_outputs)
--outfmt INT - BLAST output format (0-11, default: 6 - tabular)

Examples

# Basic nucleotide BLAST
bioinfo-tools blast -q queries/ -d databases/ -t nucl -b blastn -e 1e-5

# Protein BLAST with custom output
bioinfo-tools blast -q protein_queries/ -d protein_dbs/ -t prot -b blastp -e 0.001 -o my_results/

# With long options
bioinfo-tools blast --query-folder queries/ \
                    --db-folder databases/ \
                    --db-type nucl \
                    --blast-type blastn \
                    --evalue 1e-5 \
                    --output-folder results/ \
                    --outfmt 6

Output Format

By default, results are written in BLAST tabular format (outfmt 6), with one hit per line and tab-separated columns:

query_id  subject_id  %identity  alignment_length  mismatches  gap_opens  ...
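
Tabular results are easy to load for filtering or sorting. The sketch below assumes the tool passes plain `-outfmt 6` to BLAST+ without a custom column list, in which case each line carries the twelve standard BLAST+ fields named in `OUTFMT6_COLUMNS`. If the tool customizes the columns, the list needs adjusting. `parse_outfmt6` is a hypothetical helper, not part of bioinfo-tools.

```python
# Standard BLAST+ column names for -outfmt 6 when no custom fields are requested.
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_COLUMNS = ("pident", "evalue", "bitscore")


def parse_outfmt6(text: str) -> list[dict]:
    """Parse tab-separated BLAST hits into a list of dicts with typed values."""
    hits = []
    for line in text.splitlines():
        # Skip blank lines and the comment lines that outfmt 7 adds.
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(OUTFMT6_COLUMNS, line.split("\t")))
        for key in INT_COLUMNS:
            hit[key] = int(hit[key])
        for key in FLOAT_COLUMNS:
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits
```

A typical follow-up is a filter such as `[h for h in hits if h["pident"] >= 90 and h["evalue"] <= 1e-10]`.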

Output Files

Each query-database combination produces a separate result file:

blast_outputs/
├── query1.fasta_database1.fasta_result.txt
├── query1.fasta_database2.fasta_result.txt
├── query2.fasta_database1.fasta_result.txt
└── query2.fasta_database2.fasta_result.txt
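
Because every query is paired with every database, a run with Q query files and D database files should produce Q × D result files. The sketch below builds the expected names so an incomplete run can be spotted. The `<query>_<database>_result.txt` pattern is inferred from the listing above, not from the tool's source, and the helper names are hypothetical.

```python
from itertools import product


def result_file_name(query_file: str, db_file: str) -> str:
    """Build a result file name following the pattern seen in the listing."""
    return f"{query_file}_{db_file}_result.txt"


def expected_results(query_files: list[str], db_files: list[str]) -> list[str]:
    """Return one result file name per query-database combination."""
    return [result_file_name(q, d) for q, d in product(query_files, db_files)]


def missing_results(existing: set[str], query_files: list[str], db_files: list[str]) -> list[str]:
    """List expected result files that are not among the existing file names."""
    return [name for name in expected_results(query_files, db_files) if name not in existing]
```

Building the names from the input file lists avoids splitting result names on underscores, which would be ambiguous when a query or database file name contains one.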

Supported Output Formats

0 - Pairwise
1 - Query-anchored showing identities
2 - Query-anchored no identities
3 - Flat query-anchored, show identities
4 - Flat query-anchored, no identities
5 - XML
6 - Tabular (default)
7 - Tabular with comment lines
8 - Text ASN.1
9 - Binary ASN.1
10 - Comma-separated values
11 - BLAST archive format

Common Patterns

Processing Multiple Datasets

# Extract gene sequences from several datasets
for dataset in dataset1 dataset2 dataset3; do
    bioinfo-tools extract-genes -i "${dataset}/genbank/" \
                                -g genes.txt \
                                -o "${dataset}/genes_output/"
done

Pipeline Example

# Complete workflow
# 1. Filter GenBank files
bioinfo-tools extract-cds -i raw_genbank/ -g important_genes.txt -o filtered_gbk/

# 2. Extract protein sequences
bioinfo-tools extract-proteins -i filtered_gbk/ -g important_genes.txt -o proteins/

# 3. Extract nucleotide sequences
bioinfo-tools extract-genes -i filtered_gbk/ -g important_genes.txt -o genes/

# 4. Run BLAST analysis
bioinfo-tools blast -q proteins/dnaA/ -d reference_dbs/ -t prot -b blastp -e 1e-10

Getting Help

# General help
bioinfo-tools --help

# Command-specific help
bioinfo-tools extract-cds --help
bioinfo-tools extract-proteins --help
bioinfo-tools extract-genes --help
bioinfo-tools blast --help

Exit Codes

All commands return standard exit codes:

0 - Success
1 - Error (check log messages)
130 - Interrupted by user (Ctrl+C)
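
When driving bioinfo-tools from a script, these codes can be used to decide whether to continue. A minimal sketch using Python's standard `subprocess` module; `describe_exit_code` and `run_step` are hypothetical helpers, and any code not listed above is reported as unexpected.

```python
import subprocess

# Meanings taken from the exit code list above.
EXIT_MEANINGS = {
    0: "success",
    1: "error (check log messages)",
    130: "interrupted by user (Ctrl+C)",
}


def describe_exit_code(code: int) -> str:
    """Translate an exit code into a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_step(argv: list[str]) -> int:
    """Run one command and return its exit code without raising on failure.

    subprocess.run raises FileNotFoundError if the executable is not on PATH.
    """
    completed = subprocess.run(argv, check=False)
    return completed.returncode
```

For example, `describe_exit_code(run_step(["bioinfo-tools", "extract-cds", "-i", "genbank_files/", "-g", "genes.txt", "-o", "filtered_output/"]))` returns a readable status for one pipeline step.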

Logging

All commands provide informative logging output:

2024-01-01 12:00:00 - INFO - Gene Extractor started
2024-01-01 12:00:00 - INFO - Input folder: genbank_files/
2024-01-01 12:00:00 - INFO - Loaded 5 genes from genes.txt
2024-01-01 12:00:00 - INFO - Found 10 file(s) in genbank_files/
...
2024-01-01 12:00:05 - INFO - Gene Extractor finished successfully
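
Log lines in this shape can be split into timestamp, level, and message for filtering or timing. The pattern below is inferred from the sample above only. If the real output includes milliseconds or a logger name, the expression needs adjusting. `parse_log_line` is a hypothetical helper.

```python
import re
from datetime import datetime

# Matches lines such as "2024-01-01 12:00:00 - INFO - Gene Extractor started".
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
    r" - (?P<level>[A-Z]+) - (?P<message>.*)$"
)


def parse_log_line(line: str):
    """Return (timestamp, level, message), or None if the line does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    timestamp = datetime.strptime(match["timestamp"], "%Y-%m-%d %H:%M:%S")
    return timestamp, match["level"], match["message"]
```

Subtracting the timestamps of the first and last parsed lines gives the wall-clock duration of a run.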

Environment Variables

Currently, bioinfo-tools does not use environment variables for configuration. All settings are passed via command-line arguments.
