Migration Guide

This guide helps users transition from the old standalone scripts to the new unified CLI interface.

Important Notice

⚠️ The old standalone Python scripts have been removed from the repository. All functionality is now available through the unified bioinfo-tools CLI. This guide shows you how to migrate your workflows.

What Changed?

The repository has been reorganized into a professional CLI tool called bioinfo-tools. All the functionality from the individual scripts is now available through subcommands.

Quick Reference

Old Script → New Command Mapping

Old Script                              New Command
cdsselector.py                          bioinfo-tools extract-cds
take genes into aminoacid.py            bioinfo-tools extract-proteins
take genes into nucleotides.py          bioinfo-tools extract-genes
blastall.py                             bioinfo-tools blast
ab1_to_blast.py                         bioinfo-tools convert-ab1
blastall_result.py                      bioinfo-tools process-blast-results
fastarename.py                          bioinfo-tools rename-fasta
get_mutations.py                        bioinfo-tools compare-proteins
pdb_downloader.py                       bioinfo-tools download-pdb
pepnucfunction.py                       bioinfo-tools generate-pgap-files
retrieve bacterial ribossomal rna.py    bioinfo-tools extract-rrna
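
When auditing an existing pipeline, the mapping above can be encoded as a lookup table. The sketch below is a hypothetical helper, not part of bioinfo-tools: the names `OLD_TO_NEW` and `find_old_scripts` are made up for illustration. It scans text for old script names and reports the replacement command for each hit.

```python
# Hypothetical migration helper; not part of the bioinfo-tools package.
# Keys are the old script file names, values are the new subcommands.
OLD_TO_NEW = {
    "cdsselector.py": "extract-cds",
    "take genes into aminoacid.py": "extract-proteins",
    "take genes into nucleotides.py": "extract-genes",
    "blastall.py": "blast",
    "ab1_to_blast.py": "convert-ab1",
    "blastall_result.py": "process-blast-results",
    "fastarename.py": "rename-fasta",
    "get_mutations.py": "compare-proteins",
    "pdb_downloader.py": "download-pdb",
    "pepnucfunction.py": "generate-pgap-files",
    "retrieve bacterial ribossomal rna.py": "extract-rrna",
}


def find_old_scripts(text):
    """Return (line_number, old_script, new_command) for each old script name found."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for old, new in OLD_TO_NEW.items():
            if old in line:
                hits.append((number, old, f"bioinfo-tools {new}"))
    return hits
```

Calling `find_old_scripts(open("pipeline.sh").read())` on a shell script (the file name is only an example) lists the lines that still need converting. The arguments themselves must still be updated by hand using the conversions below.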

Installation

Old Method (No Longer Available)

# Each script used individually - SCRIPTS REMOVED
# python cdsselector.py --input-folder ...

Note: The standalone scripts are no longer in the repository. You must use the new CLI.

New Method (Recommended)

# Install the package
pip install -e .

# Use the unified CLI
bioinfo-tools extract-cds --input-folder ...

Command Conversions

Extract CDS (formerly cdsselector.py)

Old:

python cdsselector.py --input-folder genbank_files/ \
                      --genes-list genes.txt \
                      --output-folder output/

New:

bioinfo-tools extract-cds --input-folder genbank_files/ \
                          --genes-list genes.txt \
                          --output-folder output/

Or with short options:

bioinfo-tools extract-cds -i genbank_files/ -g genes.txt -o output/

Extract Proteins (formerly take genes into aminoacid.py)

Old:

python "take genes into aminoacid.py" --input-folder genbank_files/ \
                                      --genes-list genes.txt \
                                      --output-folder proteins/

New:

bioinfo-tools extract-proteins --input-folder genbank_files/ \
                               --genes-list genes.txt \
                               --output-folder proteins/

Or with short options:

bioinfo-tools extract-proteins -i genbank_files/ -g genes.txt -o proteins/

Extract Genes (formerly take genes into nucleotides.py)

Old:

python "take genes into nucleotides.py" --input-folder genbank_files/ \
                                        --genes-list genes.txt \
                                        --output-folder genes/

New:

bioinfo-tools extract-genes --input-folder genbank_files/ \
                            --genes-list genes.txt \
                            --output-folder genes/

Or with short options:

bioinfo-tools extract-genes -i genbank_files/ -g genes.txt -o genes/

BLAST (formerly blastall.py)

Old:

python blastall.py queries/ databases/ nucl blastn 1e-5

New:

bioinfo-tools blast --query-folder queries/ \
                    --db-folder databases/ \
                    --db-type nucl \
                    --blast-type blastn \
                    --evalue 1e-5

Or with short options:

bioinfo-tools blast -q queries/ -d databases/ -t nucl -b blastn -e 1e-5

Additional improvements:

  • Named arguments instead of positional ones (clearer)
  • Optional output folder specification: -o or --output-folder
  • Optional output format specification: --outfmt 6
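
Because the old script took five positional arguments in a fixed order, converting existing invocations is mechanical. A minimal sketch, assuming the positional order shown in the "Old" example (query folder, database folder, database type, BLAST type, e-value). `convert_blast_args` is a hypothetical helper, not part of bioinfo-tools:

```python
# Hypothetical helper; the flag names are taken from the "New" example above.
BLAST_FLAGS = ["--query-folder", "--db-folder", "--db-type", "--blast-type", "--evalue"]


def convert_blast_args(old_args):
    """Turn blastall.py positional arguments into a bioinfo-tools blast argv list."""
    if len(old_args) != len(BLAST_FLAGS):
        raise ValueError(
            f"expected {len(BLAST_FLAGS)} positional arguments, got {len(old_args)}"
        )
    argv = ["bioinfo-tools", "blast"]
    # Pair each positional value with its named flag, preserving the old order.
    for flag, value in zip(BLAST_FLAGS, old_args):
        argv += [flag, value]
    return argv
```

The returned list can be passed straight to `subprocess.run`, which avoids shell quoting issues with folder names that contain spaces.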

Convert AB1 (formerly ab1_to_blast.py)

Old:

python ab1_to_blast.py sequence.ab1
# Output hardcoded to sequence.fastq

New:

bioinfo-tools convert-ab1 -i sequence.ab1 -o output.fastq

Improvements:

  • Explicit output filename control
  • No hardcoded filenames

Process BLAST Results (formerly blastall_result.py)

Old:

python blastall_result.py blast_results/ matrix_output.txt

New:

bioinfo-tools process-blast-results -i blast_results/ -o matrix_output.txt

Or with long options:

bioinfo-tools process-blast-results --input-folder blast_results/ --output matrix.txt

Rename FASTA (formerly fastarename.py)

Old:

python fastarename.py sequence.fasta

New:

bioinfo-tools rename-fasta -i sequence.fasta

Improvements:

  • Optional dry-run mode: --dry-run

Compare Proteins (formerly get_mutations.py)

Old:

# Edit the script to set the sequences:
#   original = "QEHDG"
#   engenheirada = "ARNDG"
# Then run it:
python get_mutations.py

New:

bioinfo-tools compare-proteins -o QEHDG -m ARNDG

Improvements:

  • Command-line arguments instead of editing script
  • Flexible sequences from command line

Download PDB (formerly pdb_downloader.py)

Old:

# Edit the script to set the PDB IDs:
#   for i in ["ID"]:
# Then run it:
python pdb_downloader.py

New:

bioinfo-tools download-pdb -i 1A00 -o structures/
# Or multiple: -i 1A00,2B00,3C00

Improvements:

  • Command-line arguments
  • Multiple PDB IDs supported
  • Output folder specification
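
If the PDB IDs previously lived in a Python list inside the script, they can be joined into the comma-separated form shown above. A small sketch: `pdb_download_argv` is a hypothetical helper, and skipping blank entries is an assumption made for tidiness, not documented CLI behavior.

```python
def pdb_download_argv(pdb_ids, output_folder):
    """Build the argv for one download-pdb call covering several PDB IDs."""
    # Drop surrounding whitespace and skip empty entries.
    ids = [pdb_id.strip() for pdb_id in pdb_ids if pdb_id.strip()]
    if not ids:
        raise ValueError("at least one PDB ID is required")
    return ["bioinfo-tools", "download-pdb", "-i", ",".join(ids), "-o", output_folder]
```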

Generate PGAP Files (formerly pepnucfunction.py)

Old:

python pepnucfunction.py input_folder/ output_folder/

New:

bioinfo-tools generate-pgap-files -i input_folder/ -o output_folder/

Or with long options:

bioinfo-tools generate-pgap-files --input-folder genbank/ --output-folder pgap/

Extract rRNA (formerly retrieve bacterial ribossomal rna.py)

Old:

python "retrieve bacterial ribossomal rna.py" input_folder/ output_folder/

New:

bioinfo-tools extract-rrna -i input_folder/ -o output_folder/

Or with long options:

bioinfo-tools extract-rrna --input-folder genbank/ --output-folder rrna/

What Are the Benefits?

1. Unified Interface

All tools are accessible through a single bioinfo-tools command, so there is no need to remember different script names.

2. Consistent Arguments

Most commands share the same argument patterns:

  • -i / --input-folder for input
  • -o / --output-folder for output
  • -g / --genes-list for gene lists

3. Better Help System

# Get general help
bioinfo-tools --help

# Get help for specific command
bioinfo-tools extract-cds --help

4. Package Installation

Install once and use anywhere:

pip install -e .   # run from the repository root
# Now available from any directory while this environment is active
bioinfo-tools extract-cds ...

5. Easy to Extend

Adding new commands is standardized and documented in CONTRIBUTING.md.

Backward Compatibility

⚠️ The old scripts have been removed from the repository. There is no backward compatibility with the standalone scripts. All users must migrate to the new bioinfo-tools CLI.

If you have old scripts in your workflows, you need to update them using the command conversions shown in this guide.

Using in Scripts/Pipelines

Shell Scripts

Old:

#!/bin/bash
python cdsselector.py --input-folder "$INPUT" --genes-list "$GENES" --output-folder "$OUTPUT"
python "take genes into aminoacid.py" --input-folder "$INPUT" --genes-list "$GENES" --output-folder "$AA_OUT"

New:

#!/bin/bash
bioinfo-tools extract-cds -i "$INPUT" -g "$GENES" -o "$OUTPUT"
bioinfo-tools extract-proteins -i "$INPUT" -g "$GENES" -o "$AA_OUT"

Python Scripts

Old:

import subprocess
subprocess.run(["python", "cdsselector.py", "--input-folder", input_dir, ...])

New:

import subprocess
subprocess.run(["bioinfo-tools", "extract-cds", "-i", input_dir, ...])

# Or import the command module and call it directly
import argparse
from bioinfo_tools.commands import extract_cds
args = argparse.Namespace(input_folder=input_dir, ...)
extract_cds.run(args)

Nextflow/Snakemake

Old Nextflow:

process extractCDS {
    script:
    """
    python ${projectDir}/cdsselector.py --input-folder ${input} --genes-list ${genes} --output-folder ${output}
    """
}

New Nextflow:

process extractCDS {
    conda 'bioconda::bioinfo-tools'  // When available on bioconda
    
    script:
    """
    bioinfo-tools extract-cds -i ${input} -g ${genes} -o ${output}
    """
}

Troubleshooting

"bioinfo-tools: command not found"

Solution: Install the package:

pip install -e .

Or use Python module syntax:

python -m bioinfo_tools extract-cds --help
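
In a Python pipeline, the same fallback can be applied automatically. A minimal sketch, assuming the `python -m bioinfo_tools` form shown above works in your environment. `cli_prefix` is a hypothetical helper name, and the `which` parameter exists only to make the function easy to test:

```python
import shutil
import sys


def cli_prefix(which=shutil.which):
    """Return the argv prefix for bioinfo-tools, falling back to module syntax."""
    if which("bioinfo-tools"):
        return ["bioinfo-tools"]
    # Entry point not on PATH: run the package with the current interpreter.
    return [sys.executable, "-m", "bioinfo_tools"]
```

For example, `subprocess.run(cli_prefix() + ["extract-cds", "--help"], check=True)` runs the help command through whichever form is available.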

Scripts still using old names

Solution: Update your scripts using the conversion table above.

Need both old and new?

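The old scripts are no longer in the repository (see Important Notice), so running the old and new tools side by side from the current codebase is not supported. Migrate to the new CLI using the command conversions in this guide.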

Questions?

If you have issues with migration:

  • Check the README.md for detailed usage
  • See CONTRIBUTING.md for development details
  • Open an issue on GitHub
  • Email: davijosuemarcon@gmail.com