CORAL

Comparative Orthologous Read-based Analysis of Lineage Substitutions

CORAL is a tool for scalable extraction, detection, and analysis of point mutations across the evolutionary history of species. It simulates reads from each species' genome, aligns them to a shared reference genome, filters alignments by mapping quality, extracts unambiguous trinucleotide substitutions, and summarizes mutation rates and mutation spectra.

Reference

Preprint available at https://doi.org/10.64898/2026.02.02.703326

Pipeline overview

Installation

Requirements

Linux (or WSL2 on Windows)
Conda (Miniforge or Anaconda recommended)

Recommended installation

git clone https://github.com/asafpinhasitechnion/CORAL.git
cd CORAL
conda env create -f environment.yml
conda activate coral-env
pip install -e .

Verify installation

coral --help
samtools --version
bwa
datasets --version

The provided environment.yml installs all required dependencies, including:

Python 3.10
BWA (classic)
SAMtools
NCBI Datasets CLI
unzip
All required Python dependencies

Optional: PHYLIP (for phylogenetic inference)

PHYLIP is not required for the core pipeline.

Install it only if you use phylogenetic inference via coral run_multi or coral run_phylip:

conda install -c bioconda phylip

Quick start

Three-taxon pipeline (outgroup + two ingroups)

coral run_single \
  --outgroup Saccharomyces_mikatae_IFO_1815 GCF_947241705.1 \
  --species Saccharomyces_paradoxus GCF_002079055.1 \
            Saccharomyces_cerevisiae_S288C GCF_000146045.2 \
  --output ../test_output \
  --mapq 60 \
  --suffix test

This runs the full pipeline, including genome download, reference indexing, read simulation, alignment, mutation extraction, and summary table and plot generation.

Multi-species analysis (experimental)

coral run_multi \
  --species-list '[["Drosophila_melanogaster","GCF_000001215.4"],["Drosophila_sechellia","GCF_004382195.2"],["Drosophila_mauritiana","GCF_004382145.1"],["Drosophila_simulans","GCF_016746395.2"]]' \
  --outgroup Drosophila_simulans \
  --output ../test_output \
  --run-id drosophila_test \
  --mapq 60

Note: Multi-species mode is experimental and intended for exploratory analyses.
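
The --species-list argument is a JSON array of [name, accession] pairs, which is easy to mistype by hand. The sketch below builds the same command from a Python list. The helper name build_run_multi_command is hypothetical and not part of CORAL, and the check that the outgroup appears in the species list is an assumption drawn from the example above, not documented behaviour.

```python
import json
import shlex
from typing import List, Sequence, Tuple


def build_run_multi_command(
    species: Sequence[Tuple[str, str]],
    outgroup: str,
    output: str,
    run_id: str,
    mapq: int,
) -> str:
    """Return a shell-quoted `coral run_multi` command line (hypothetical helper)."""
    names = [name for name, _ in species]
    # Assumption: the outgroup is one of the listed species, as in the example above.
    if outgroup not in names:
        raise ValueError(f"outgroup {outgroup!r} is not in the species list")
    # Compact JSON: a list of [name, accession] pairs.
    species_json = json.dumps([list(pair) for pair in species], separators=(",", ":"))
    args: List[str] = [
        "coral", "run_multi",
        "--species-list", species_json,
        "--outgroup", outgroup,
        "--output", output,
        "--run-id", run_id,
        "--mapq", str(mapq),
    ]
    # shlex.join quotes the JSON so the shell passes it as a single argument.
    return shlex.join(args)


if __name__ == "__main__":
    drosophila = [
        ("Drosophila_melanogaster", "GCF_000001215.4"),
        ("Drosophila_sechellia", "GCF_004382195.2"),
        ("Drosophila_mauritiana", "GCF_004382145.1"),
        ("Drosophila_simulans", "GCF_016746395.2"),
    ]
    print(build_run_multi_command(
        drosophila, "Drosophila_simulans", "../test_output", "drosophila_test", 60
    ))
```

Generating the JSON with json.dumps avoids unbalanced brackets and quoting mistakes when the species list grows.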

Functional workflow

Step 1: Genome preparation

Download genomes by NCBI assembly accession
Index the reference genome for alignment

Step 2: Read simulation and alignment

Simulate FASTQ reads by sliding a window across genomes
Align simulated reads to the outgroup reference
Filter alignments by MAPQ and coverage
Optionally customize the aligner and its parameters
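
To make the sliding-window idea in Step 2 concrete, here is a minimal sketch of how error-free reads could be cut from a genome sequence and written as FASTQ. It is not CORAL's implementation: the function names, the read length of 150, the step of 75, the constant quality string, and the choice to skip windows containing N are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (read_id, bases, qualities)


def simulate_reads(
    name: str, sequence: str, read_length: int = 150, step: int = 75
) -> Iterator[Read]:
    """Yield reads from windows of `read_length` taken every `step` bases."""
    sequence = sequence.upper()
    last_start = len(sequence) - read_length
    # If the sequence is shorter than one read, the range is empty.
    for start in range(0, last_start + 1, step):
        bases = sequence[start:start + read_length]
        # Assumption: windows overlapping assembly gaps (N) are skipped.
        if "N" in bases:
            continue
        # Simulated reads carry no sequencing errors, so use a constant high quality.
        yield f"{name}:{start}", bases, "I" * read_length


def to_fastq(reads: Iterable[Read]) -> str:
    """Format reads as four-line FASTQ records."""
    return "".join(f"@{rid}\n{bases}\n+\n{quals}\n" for rid, bases, quals in reads)


if __name__ == "__main__":
    print(to_fastq(simulate_reads("chr1", "ACGTACGTAC", read_length=4, step=2)), end="")
```

Encoding the window start in the read ID keeps each read traceable to its source coordinate after alignment.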

Step 3: Mutation detection

Generate pileups from reference and aligned BAMs
Extract unambiguous trinucleotide substitutions
Optionally retain genomic positions
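
One way to read "unambiguous" in Step 3 is a parsimony rule over the three taxa: a substitution is assigned to a branch only when the focal taxon differs from the outgroup at the central base, the sister taxon still matches the outgroup, and both flanking bases agree in all three genomes. The sketch below implements that rule. It is an assumption for illustration, not a description of CORAL's actual criteria, and call_branch_substitution is a hypothetical name.

```python
from typing import Optional, Tuple

BASES = set("ACGT")


def call_branch_substitution(
    outgroup: str, focal: str, sister: str
) -> Optional[Tuple[str, str]]:
    """Return (ancestral_trinucleotide, derived_base) or None.

    Each argument is a 3-base string centred on the same aligned site.
    """
    trios = (outgroup, focal, sister)
    # Reject gaps, Ns, or malformed contexts.
    if any(len(t) != 3 or not set(t) <= BASES for t in trios):
        return None
    # Flanking bases must be identical in all three taxa.
    if not (outgroup[0] == focal[0] == sister[0]):
        return None
    if not (outgroup[2] == focal[2] == sister[2]):
        return None
    # Assumed parsimony rule: sister matches outgroup, focal carries the change.
    if sister[1] == outgroup[1] and focal[1] != outgroup[1]:
        return outgroup, focal[1]
    return None


if __name__ == "__main__":
    print(call_branch_substitution("ACG", "ATG", "ACG"))
```

Under this rule, sites where both ingroups differ from the outgroup are discarded, because the branch on which the change occurred cannot be determined from three taxa alone.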

Step 4: Normalization and analysis

Normalize mutation counts by underlying trinucleotide abundance
Collapse complementary strands into canonical spectra
Generate summary tables and visualizations
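
The normalization and strand-collapsing steps above can be sketched as follows. The convention used here, reverse-complementing any context whose central base is a purine so that every canonical context is centred on C or T, is a common one for mutation spectra but is assumed rather than taken from CORAL's code. The function names and input dictionaries are hypothetical.

```python
from collections import Counter
from typing import Dict, Mapping, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def canonical_context(context: str) -> str:
    """Map a trinucleotide to its pyrimidine-centred strand (assumed convention)."""
    return reverse_complement(context) if context[1] in "AG" else context


def canonical_mutation(context: str, alt: str) -> Tuple[str, str]:
    """Collapse (context, derived base) onto the pyrimidine-centred strand."""
    if context[1] in "AG":
        return reverse_complement(context), reverse_complement(alt)
    return context, alt


def normalized_spectrum(
    mutation_counts: Mapping[Tuple[str, str], int],
    context_counts: Mapping[str, int],
) -> Dict[Tuple[str, str], float]:
    """Divide collapsed mutation counts by collapsed trinucleotide abundance."""
    mutations: Counter = Counter()
    for (context, alt), n in mutation_counts.items():
        mutations[canonical_mutation(context, alt)] += n
    opportunities: Counter = Counter()
    for context, n in context_counts.items():
        opportunities[canonical_context(context)] += n
    # Contexts with zero abundance are left out to avoid division by zero.
    return {
        key: n / opportunities[key[0]]
        for key, n in mutations.items()
        if opportunities[key[0]] > 0
    }


if __name__ == "__main__":
    counts = {("ACG", "T"): 3, ("CGT", "A"): 1}
    abundance = {"ACG": 6, "CGT": 2}
    print(normalized_spectrum(counts, abundance))
```

In the usage example, CGT>A is the reverse complement of ACG>T, so the two entries collapse into one category whose count is divided by the combined abundance of ACG and CGT.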

Output overview

Each run produces a self-contained output directory containing:

Mutations/*_mutations.csv.gz – per-branch mutation lists
Mutations/*_mutations.json – trinucleotide mutation counts
Tables/*.tsv – normalized mutation spectra
Plots/*.png – diagnostic and summary plots

Mutation files are named:

<taxon1>__<taxon2>__<reference>__mutations.*

This indicates mutations inferred on the branch leading to taxon1 since divergence from taxon2, using reference as the outgroup genome.
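
Because the fields are separated by double underscores while taxon names use single underscores, the file name can be split back into its parts. The sketch below assumes that no taxon name itself contains a double underscore. parse_mutation_filename is a hypothetical helper, and the example file name is constructed from the quick-start taxa for illustration rather than copied from real output.

```python
from pathlib import Path
from typing import NamedTuple


class MutationFileName(NamedTuple):
    taxon1: str     # branch on which mutations were inferred
    taxon2: str     # taxon that taxon1 diverged from
    reference: str  # outgroup genome used as the reference


def parse_mutation_filename(path: str) -> MutationFileName:
    """Split '<taxon1>__<taxon2>__<reference>__mutations.*' into its fields."""
    name = Path(path).name
    parts = name.split("__")
    # Expect exactly three name fields followed by 'mutations.<extension>'.
    if len(parts) != 4 or not parts[3].startswith("mutations."):
        raise ValueError(f"unexpected mutation file name: {name!r}")
    return MutationFileName(*parts[:3])


if __name__ == "__main__":
    example = (
        "Mutations/Saccharomyces_paradoxus__Saccharomyces_cerevisiae_S288C"
        "__Saccharomyces_mikatae_IFO_1815__mutations.json"
    )
    print(parse_mutation_filename(example))
```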

See OUTPUT_FORMAT.md for full file format and naming conventions.

Documentation

tutorial.ipynb – command-line tutorial and examples
OUTPUT_FORMAT.md – output file structure and naming conventions

Citation

Details, benchmarking, and results are available in the preprint: https://doi.org/10.64898/2026.02.02.703326

The final reference will be updated upon publication.
