
Probe generation

Smutin Daniil edited this page Feb 19, 2025 · 1 revision

Generation of nucleotide probes

This page describes how to use the tool to generate oligonucleotide probes.

Warning: This tool is under active development.


Table of Contents

  1. Algorithm Overview
  2. Algorithm Steps
  3. Preparation Step
  4. Main Usage
  5. Contributing
  6. License
  7. Contact

Algorithm Overview

PROBEst uses a wrapped evolutionary algorithm to generate and optimize nucleotide probes. The workflow combines Primer3 (candidate generation), BLASTn (alignment against the prepared databases) and a mutation module, and iteratively refines probes against user-defined criteria for universality and specificity. An AI-based correction step then evaluates the probe set and optimizes it further.
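To make the trade-off between universality and specificity concrete, here is a minimal sketch of one possible fitness score. It is an illustration, not PROBEst's actual scoring: the function name `probe_fitness`, the `penalty` weight and the linear form are all assumptions.

```python
from typing import Set


def probe_fitness(target_hits: Set[str], offtarget_hits: Set[str],
                  n_targets: int, n_offtargets: int,
                  penalty: float = 1.0) -> float:
    """Hypothetical score: reward target coverage, penalize off-target hits.

    target_hits    -- genomes from the universality set that the probe hits
    offtarget_hits -- genomes from the specificity set that the probe hits
    """
    if n_targets <= 0:
        raise ValueError("n_targets must be positive")
    # Universality: fraction of target genomes covered by the probe.
    universality = len(target_hits) / n_targets
    # Specificity penalty: fraction of off-target genomes also hit.
    off_rate = len(offtarget_hits) / n_offtargets if n_offtargets > 0 else 0.0
    return universality - penalty * off_rate
```

For example, a probe that hits 3 of 4 target genomes and 1 of 10 off-target genomes scores 0.75 - 0.1 = 0.65 with the default penalty. Raising `penalty` favours more specific probes at the cost of coverage.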


Algorithm Steps

  1. Select File for Probe Generation

    • Choose the primary file that will be used to generate the probe.
  2. Select Files for Universality Check

    • Identify and select files that will be used to assess the universality of the probe.
  3. Select Files for Specificity Check

    • Identify and select files that will be used to evaluate the specificity of the probe.
  4. Select Layouts

    • Determine the layouts that will be utilized during the probe generation process.
  5. Run Wrapped Evolutionary Algorithm

    • Execute the following steps within the evolutionary algorithm:

    a. Primer3 Generation

    • Generate candidate probes with Primer3.

    b. BLASTn Check

    • Align the candidate probes with BLASTn against the prepared databases to check where they bind.

    c. Parsing

    • Parse the results from the BLASTn check to extract relevant information.

    d. Mutation in Probe

    • Introduce mutations in the probe based on the parsed data to optimize performance.

    e. AI Corrections

    • Evaluate and optimize the probes with an AI-based scoring function.
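The loop in step 5 can be sketched as follows. This is a toy under stated assumptions: a caller-supplied `score` function stands in for the Primer3, BLASTn, parsing and AI evaluation steps, and `point_mutate`, `evolve` and all their parameters are hypothetical names. Uniform point mutation with elitist selection is a generic evolutionary choice, not necessarily what PROBEst's mutation module does.

```python
import random
from typing import Callable, List

BASES = "ACGT"


def point_mutate(probe: str, rate: float, rng: random.Random) -> str:
    """Replace each base, with probability `rate`, by a different random base."""
    mutated = []
    for base in probe:
        if rng.random() < rate:
            mutated.append(rng.choice([b for b in BASES if b != base]))
        else:
            mutated.append(base)
    return "".join(mutated)


def evolve(seeds: List[str], score: Callable[[str], float],
           generations: int = 20, pop_size: int = 20, keep: int = 5,
           rate: float = 0.05, seed: int = 0) -> List[str]:
    """Toy evolutionary loop: rank, keep the best probes, refill by mutation."""
    if not seeds:
        raise ValueError("at least one seed probe is required")
    rng = random.Random(seed)
    population = list(seeds)
    for _ in range(generations):
        # Stand-in for steps a-c and e: rank unique probes by the supplied score.
        unique = list(dict.fromkeys(population))
        parents = sorted(unique, key=score, reverse=True)[:keep]
        # Step d: refill the population with mutated copies of the best probes.
        children = [point_mutate(rng.choice(parents), rate, rng)
                    for _ in range(max(pop_size - len(parents), 0))]
        population = parents + children
    unique = list(dict.fromkeys(population))
    return sorted(unique, key=score, reverse=True)[:keep]
```

Because the best probes are always carried over (elitism), the top score never decreases between generations. With a toy score such as `lambda p: sum(b in "GC" for b in p)`, the population drifts toward GC-rich sequences; in a real run, the score would come from the alignment and evaluation steps.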

Preparation Step

pipeline.py requires pre-built BLASTn databases. To create the required true_base, false_base and contig_table, use the following script:

bash scripts/generator/prep_db.sh \
  -n {database_name} \
  -c {contig_name} \
  -t {tmp_dir} \
  [fasta_files]

Arguments:

  • -n {database_name}:
    Name of the output BLAST database (required).
  • -c {contig_name}:
    Output file to store contig names and their corresponding sequence headers (required).
  • -t {tmp_dir}:
    Temporary directory for intermediate files (optional, defaults to ./.tmp).
  • [fasta_files]:
    List of input FASTA files (gzipped or uncompressed).

This script prepares the necessary files for running the pipeline.py tool. Ensure that the input FASTA files are correctly formatted and accessible.
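The contig table links each sequence header to the input file (contig) it came from. Its exact layout is not documented on this page, so the sketch below assumes a two-column TSV (`contig`, `sequence_id`), names each contig after its file, and reads gzipped or plain FASTA as prep_db.sh does. `build_contig_table` and the naming rule in `contig_name` are hypothetical. The real script also builds the BLAST database, which this sketch does not attempt.

```python
import gzip
from pathlib import Path
from typing import List, TextIO


def open_fasta(path: str) -> TextIO:
    """Open a FASTA file as text, transparently handling .gz compression."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def contig_name(path: str) -> str:
    """Hypothetical naming rule: file name without .gz and FASTA extensions."""
    name = Path(path).name
    if name.endswith(".gz"):
        name = name[:-3]
    for ext in (".fasta", ".fna", ".fa"):
        if name.endswith(ext):
            return name[: -len(ext)]
    return name


def build_contig_table(fasta_files: List[str], out_tsv: str) -> int:
    """Write one 'contig<TAB>sequence_id' row per FASTA header; return row count."""
    rows = 0
    with open(out_tsv, "w") as out:
        for path in fasta_files:
            name = contig_name(path)
            with open_fasta(path) as handle:
                for line in handle:
                    if not line.startswith(">"):
                        continue
                    header = line[1:].strip()
                    if not header:
                        continue  # skip empty headers
                    # BLAST uses the first whitespace-delimited word as the ID.
                    out.write(f"{name}\t{header.split()[0]}\n")
                    rows += 1
    return rows
```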


Main Usage

PROBEst can be run using the following command:

python pipeline.py [-h] \
  -i {INPUT} \
  -tb {TRUE_BASE} \
  -fb [FALSE_BASE [FALSE_BASE ...]] \
  -c {CONTIG_TABLE} \
  -o {OUTPUT}
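When PROBEst is called from another Python program, it can help to assemble the argument list explicitly. The helper below is a hypothetical convenience wrapper, not part of PROBEst. It uses only the flags documented on this page, and the paths in the usage line are placeholders.

```python
import subprocess
import sys
from typing import List, Optional

ALGORITHMS = ("FISH", "primer")


def pipeline_command(input_fasta: str, true_base: str, false_bases: List[str],
                     contig_table: str, output: str,
                     threads: Optional[int] = None,
                     algorithm: Optional[str] = None) -> List[str]:
    """Build the argument list for pipeline.py from the documented flags."""
    cmd = [sys.executable, "pipeline.py", "-i", input_fasta, "-tb", true_base]
    if false_bases:
        # Several databases follow a single -fb flag.
        cmd += ["-fb", *false_bases]
    cmd += ["-c", contig_table, "-o", output]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if algorithm is not None:
        if algorithm not in ALGORITHMS:
            raise ValueError(f"algorithm must be one of {ALGORITHMS}")
        cmd += ["-a", algorithm]
    return cmd


def run_pipeline(cmd: List[str]) -> None:
    """Run pipeline.py, raising CalledProcessError if it exits with an error."""
    subprocess.run(cmd, check=True)
```

For example: `run_pipeline(pipeline_command("input.fasta", "db/true_base", ["db/false_base"], "contigs.tsv", "results", threads=8))`. Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.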

Key Arguments:

  • -i INPUT: Input FASTA file for probe generation.
  • -tb TRUE_BASE: BLASTn database of target sequences, used to adjust the probes.
  • -fb FALSE_BASE: One or more BLASTn databases of non-target sequences, used to test for non-specific binding.
  • -c CONTIG_TABLE: .tsv table mapping contig names to sequence headers (the -c output of prep_db.sh).
  • -o OUTPUT: Output path for results.
  • -t THREADS: Number of threads to use.
  • -a ALGORITHM: Algorithm for probe generation (FISH or primer).
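The parsing step (5c) and the contig table can be combined to find which genomes each probe hits. This sketch assumes BLASTn was run with tabular output (`-outfmt 6`, whose first three columns are `qseqid`, `sseqid` and `pident`) and that the contig table is a two-column TSV (`contig`, `sequence_id`). Both formats are assumptions about PROBEst's internals, and `load_contig_table` and `genomes_hit` are hypothetical names.

```python
import csv
from collections import defaultdict
from typing import Dict, Set


def load_contig_table(path: str) -> Dict[str, str]:
    """Map sequence ID -> contig/genome name (assumed columns: contig, seq_id)."""
    mapping: Dict[str, str] = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:
                mapping[row[1]] = row[0]
    return mapping


def genomes_hit(blast_tsv: str, seq_to_contig: Dict[str, str],
                min_pident: float = 100.0) -> Dict[str, Set[str]]:
    """For each probe (qseqid), collect genomes hit at or above min_pident."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    with open(blast_tsv, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and comment lines (as produced by -outfmt 7).
            if not row or row[0].startswith("#"):
                continue
            qseqid, sseqid, pident = row[0], row[1], float(row[2])
            if pident >= min_pident:
                # Fall back to the raw subject ID if it is not in the table.
                hits[qseqid].add(seq_to_contig.get(sseqid, sseqid))
    return dict(hits)
```

The resulting sets can be compared against the total number of genomes in each database to estimate universality and specificity. A stricter check would also compare the alignment length (column 4) with the probe length, so that partial matches are not counted as hits.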

For a full list of arguments, run:

python pipeline.py --help
