
# Ghost Population Detection Pipeline

This repository contains a Snakemake-based workflow to detect unsampled "ghost" populations in genomic datasets using demographic inference, statistical testing, and model selection.
The pipeline integrates STRUCTURE, IMa3, ARGweaver, and custom analysis scripts to evaluate signals of ghost introgression in population genomics data.


## Key Features

- **STRUCTURE-based inference** of admixture models across multiple K values
- **Likelihood Ratio Tests (LRTs)** using IMa3 for model comparison
- **Bootstrap testing** and AIC/BIC model selection
- **ARGweaver-based coalescent simulations** and TMRCA distribution analysis
- **Multimodality tests** (e.g., Hartigan’s Dip Test) to detect non-standard coalescent patterns
- Fully automated with **Snakemake**
- Reproducible environments with **Conda**
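
The model-comparison statistics listed above (LRT, bootstrap p-value, AIC/BIC) can be sketched in standard-library Python. This is an illustrative sketch, not the code in `scripts/`. It assumes you already have a maximized log-likelihood and a parameter count for each model (for example, from IMa3 runs of nested models), and that bootstrap replicates of the LRT statistic were simulated under the null model. The model names, log-likelihoods and locus count in the usage section are made up.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the 1-df tail erfc(sqrt(x/2)) plus one correction term per extra pair of df
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def lrt(loglik_null: float, loglik_alt: float, df: int) -> tuple[float, float]:
    """Likelihood ratio statistic 2 * (lnL_alt - lnL_null) and its chi-square p-value."""
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    return stat, chi2_sf(stat, df)


def bootstrap_pvalue(observed: float, replicates: list[float]) -> float:
    """Parametric-bootstrap p-value: share of null replicates >= observed (+1 correction)."""
    exceed = sum(1 for r in replicates if r >= observed)
    return (exceed + 1) / (len(replicates) + 1)


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion; lower is better."""
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion for n observations; lower is better."""
    return k * math.log(n) - 2 * loglik


if __name__ == "__main__":
    # Hypothetical (lnL, number of parameters) per model, for illustration only
    models = {"no_ghost": (-1520.4, 5), "ghost": (-1512.9, 7)}
    n_loci = 200  # hypothetical number of loci used as n in BIC
    stat, p = lrt(models["no_ghost"][0], models["ghost"][0], df=2)
    print(f"LRT = {stat:.2f}, chi-square p = {p:.2e}")
    for name, (ll, k) in models.items():
        print(f"{name}: AIC = {aic(ll, k):.1f}, BIC = {bic(ll, k, n_loci):.1f}")
```

When the null model fixes a parameter at the edge of its range (for example, a migration rate of zero), the plain χ² reference distribution tends to be conservative. A 50:50 mixture of χ²₀ and χ²₁ is a common alternative for a single boundary parameter, and the parametric bootstrap avoids the asymptotic assumption altogether.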


## Repository Structure

```text
ghost-pop-gen/
├── Snakefile                  # Main Snakemake pipeline
├── Snakefile.part3            # ARGweaver + modality analysis
├── environment.yml            # Main conda environment
├── config/                    # YAML config files and model specifications
│   ├── config.yaml
│   ├── model1.par
│   └── nested_models_2pop.txt
├── data/                      # Input files (FASTA, .u, .str)
│   ├── fasta/
│   ├── ima3_inputs_2pop/
│   ├── ima3_inputs_3pop/
│   └── structure_inputs/
├── envs/                      # Conda envs for specific tools
│   └── argweaver_py2.yaml
├── results/                   # Output files and visualizations
│   ├── structure_outputs/
│   ├── ima3/
│   ├── *.csv
│   └── *.png
├── scripts/                   # R, Python, and Bash helper scripts
├── software/                  # Compiled tools (e.g., ARGweaver)
└── README.md
```

## Installation and Setup

Clone the repository:

```bash
git clone https://github.com/Megmugure/ghost-pop-gen.git
cd ghost-pop-gen
```

Create and activate the conda environment:

```bash
conda env create -f environment.yml
conda activate ghost-pop-gen
```

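## Running the Pipeline

To run the full workflow on 4 cores:

```bash
snakemake --cores 4
```

To perform a dry run (list the jobs that would be executed without running them):

```bash
snakemake -n
```

To run the ARGweaver and modality-testing component separately:

```bash
snakemake -s Snakefile.part3 --cores 4
```

To render the workflow DAG as an image (requires Graphviz `dot`):

```bash
snakemake --dag | dot -Tpng > dag.png
```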

## Example Analysis Commands

```bash
# Run Kolmogorov-Smirnov test on TMRCA values
Rscript scripts/KS_tests.R data/input.tmrca

# Run likelihood ratio test (LRT)
python scripts/LRT_test.py results/lrt_values.txt

# Run STRUCTURE bootstrap LRT
bash scripts/bootstrap_test.sh data/structure_data.str
```
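
The first command tests TMRCA distributions with a Kolmogorov-Smirnov test. The sketch below shows what a two-sample KS test computes, written in standard-library Python rather than the R used by `scripts/KS_tests.R`. It assumes two non-empty lists of TMRCA values to compare (for instance, two populations, or observed versus simulated values). The p-value uses the large-sample Kolmogorov approximation, so it is rough for small samples, and the TMRCA values in the usage section are hypothetical.

```python
import math
from bisect import bisect_right


def ks_two_sample(a: list[float], b: list[float]) -> tuple[float, float]:
    """Two-sample KS statistic D and its asymptotic (Kolmogorov) p-value."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    # The largest gap between the two empirical CDFs occurs at a pooled sample point.
    d = max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m) for v in xs + ys)
    # Scale D by the effective sample size and use the Kolmogorov limit:
    # Q(t) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 t^2)
    t = math.sqrt(n * m / (n + m)) * d
    if t < 0.2:
        # Q(t) is indistinguishable from 1 here, and the series converges slowly.
        return d, 1.0
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * t * t) for j in range(1, 101))
    return d, min(1.0, max(0.0, p))


if __name__ == "__main__":
    # Hypothetical TMRCA values (generations), for illustration only
    tmrca_pop1 = [1200.0, 1350.0, 1500.0, 1610.0, 1720.0, 1900.0]
    tmrca_pop2 = [1800.0, 2050.0, 2200.0, 2400.0, 2650.0, 2900.0]
    d, p = ks_two_sample(tmrca_pop1, tmrca_pop2)
    print(f"D = {d:.3f}, asymptotic p = {p:.3f}")
```

If SciPy is available, `scipy.stats.ks_2samp` computes the same D statistic and, for small samples, an exact p-value, which makes it a useful cross-check.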

## Citing This Work

If you use this pipeline in your research, please cite:

(preprint link or DOI coming soon)

## License

This project is licensed under the MIT License. See the `LICENSE` file for full details.

## Author

Margaret Wanjiku (GitHub: [Megmugure](https://github.com/Megmugure))
