# Ghost Population Detection Pipeline

This repository contains a Snakemake-based workflow to detect unsampled "ghost" populations in genomic datasets using demographic inference, statistical testing, and model selection.
The pipeline integrates STRUCTURE, IMa3, ARGweaver, and custom analysis scripts to evaluate signals of ghost introgression in population genomics data.


## Key Features

- **STRUCTURE-based inference** of admixture models across multiple K values
- **Likelihood Ratio Tests (LRTs)** using IMa3 for model comparison
- **Bootstrap testing** and AIC/BIC model selection
- **ARGweaver-based coalescent simulations** and TMRCA distribution analysis
- **Multimodality tests** (e.g., Hartigan’s Dip Test) to detect non-standard coalescent patterns
- Fully automated with **Snakemake**
- Reproducible environments with **Conda**
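
As a rough illustration of the AIC/BIC model selection step listed above, the sketch below scores candidate models from their log-likelihoods. It is not taken from the pipeline's scripts: the function names, the model names (`no_ghost`, `ghost`), and the numbers are all hypothetical placeholders, and the only formulas assumed are the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def rank_models(models: dict, n_obs: int) -> list:
    """Return (name, AIC, BIC) tuples sorted by AIC, preferred model first."""
    scored = [
        (name, aic(lnl, k), bic(lnl, k, n_obs))
        for name, (lnl, k) in models.items()
    ]
    return sorted(scored, key=lambda row: row[1])


# Hypothetical values for illustration only, not pipeline output:
# model name -> (log-likelihood, number of free parameters)
example_models = {
    "no_ghost": (-1250.0, 4),
    "ghost": (-1240.0, 6),
}

if __name__ == "__main__":
    for name, a, b in rank_models(example_models, n_obs=200):
        print(f"{name}: AIC={a:.1f} BIC={b:.1f}")
```

With these made-up inputs the richer `ghost` model has the lower AIC and BIC, so it would be preferred; with real data the two criteria can disagree, because BIC penalizes extra parameters more heavily as the sample size grows.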


## Repository Structure

```text
ghost-pop-gen/
├── Snakefile                  # Main Snakemake pipeline
├── Snakefile.part3            # ARGweaver + modality analysis
├── environment.yml            # Main conda environment
├── config/                    # YAML config files and model specifications
│   ├── config.yaml
│   ├── model1.par
│   └── nested_models_2pop.txt
├── data/                      # Input files (FASTA, .u, .str)
│   ├── fasta/
│   ├── ima3_inputs_2pop/
│   ├── ima3_inputs_3pop/
│   └── structure_inputs/
├── envs/                      # Conda envs for specific tools
│   └── argweaver_py2.yaml
├── results/                   # Output files and visualizations
│   ├── structure_outputs/
│   ├── ima3/
│   ├── *.csv
│   └── *.png
├── scripts/                   # R, Python, and Bash helper scripts
├── software/                  # Compiled tools (e.g., ARGweaver)
└── README.md
```

## Installation and Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/Megmugure/ghost-pop-gen.git
   cd ghost-pop-gen
   ```

2. Create and activate the conda environment:

   ```bash
   conda env create -f environment.yml
   conda activate ghost-pop-gen
   ```

## Running the Pipeline

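To run the full workflow:

```bash
snakemake --cores 4
```

To perform a dry run, which lists the jobs that would run without executing them:

```bash
snakemake -n
```

To run the ARGweaver + modality testing component separately:

```bash
snakemake -s Snakefile.part3 --cores 4
```

To generate a DAG (workflow graph) as a PNG image, which requires Graphviz's `dot`:

```bash
snakemake --dag | dot -Tpng > dag.png
```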

## Example Analysis Commands

```bash
# Run Kolmogorov-Smirnov test on TMRCA values
Rscript scripts/KS_tests.R data/input.tmrca

# Run LRT test
python scripts/LRT_test.py results/lrt_values.txt

# Run STRUCTURE bootstrap LRT
bash scripts/bootstrap_test.sh data/structure_data.str
```
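
For readers unfamiliar with the likelihood ratio test mentioned above, here is a minimal standard-library sketch of the calculation. It does not reproduce `scripts/LRT_test.py`, whose input format and exact method are not described here; the function names `chi2_sf` and `lrt` and the example log-likelihoods are hypothetical. The sketch assumes the usual statistic 2(ln L_full - ln L_nested) compared against a chi-square distribution with integer degrees of freedom. When the nested model fixes a parameter at a boundary (for example, a migration rate at zero), that reference distribution may not apply and a mixture distribution is often used instead.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Start from a closed form of the regularized upper incomplete gamma Q(a, half).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # Q(1/2, z) = erfc(sqrt(z))
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def lrt(lnl_nested: float, lnl_full: float, df: int) -> tuple:
    """Return (statistic, p-value) for a nested-model likelihood ratio test."""
    # Clamp at zero in case numerical noise makes the nested model look better.
    stat = max(2.0 * (lnl_full - lnl_nested), 0.0)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for illustration only, not pipeline output.
    stat, p = lrt(lnl_nested=-1250.0, lnl_full=-1240.0, df=2)
    print(f"LRT statistic = {stat:.2f}, p = {p:.3g}")
```

The chi-square tail is computed by hand only to keep the sketch free of third-party dependencies; in an environment with SciPy available, `scipy.stats.chi2.sf` gives the same quantity.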

## Citing This Work

If you use this pipeline in your research, please cite:

(preprint link or DOI coming soon)

## License

This project is licensed under the MIT License. See the LICENSE file for full details.

## Author

Margaret Wanjiku, [email protected], GitHub: Megmugure
