This workflow accepts a set of genomic variants in PLINK (BED+BIM+FAM, PED+MAP or TPED+TFAM) or VCF/BCF format and generates a phylogenetic tree based on the identity-by-state (IBS) distance between samples. It uses bootstrapping to assess the stability of each node across repeated runs, and generates high-resolution plots using GraPhlAn.
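The sketch below is a minimal pure-Python illustration of the two ideas in the paragraph above. It assumes biallelic genotypes coded as alternate-allele counts (0, 1, 2), with None for missing calls. ibs_distance computes 1 minus the average fraction of alleles shared at sites genotyped in both samples. To our understanding, this is the quantity PLINK 1.9 reports with --distance 1-ibs. bootstrap_distances resamples variants with replacement to produce one distance matrix per replicate. Both function names are hypothetical, and this is not the workflow's own code. The pipeline delegates the distance calculation to PLINK, and building a tree from each matrix (and a consensus from the replicates) is left to PHYLIP, which is not shown here.

```python
import random
from itertools import combinations


def ibs_distance(genotypes):
    """Return the pairwise (1 - IBS) distance matrix between samples.

    genotypes: one list per sample of alternate-allele counts (0, 1 or 2),
    with None marking a missing call. Illustrative sketch, not PLINK itself.
    """
    n = len(genotypes)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = []
        for a, b in zip(genotypes[i], genotypes[j]):
            if a is None or b is None:
                continue  # skip sites missing in either sample
            # Alleles shared at this site: 2 (IBS2), 1 (IBS1) or 0 (IBS0), scaled to [0, 1]
            shared.append((2 - abs(a - b)) / 2)
        # Distance is 1 minus mean similarity; NaN if the pair shares no called sites
        d = 1 - sum(shared) / len(shared) if shared else float("nan")
        dist[i][j] = dist[j][i] = d
    return dist


def bootstrap_distances(genotypes, n_replicates, seed=None):
    """Yield one IBS distance matrix per replicate, resampling variants with replacement."""
    rng = random.Random(seed)
    n_variants = len(genotypes[0])
    for _ in range(n_replicates):
        # Draw n_variants column indices with replacement; samples are kept unchanged
        cols = [rng.randrange(n_variants) for _ in range(n_variants)]
        resampled = [[row[c] for c in cols] for row in genotypes]
        yield ibs_distance(resampled)
```

For three samples where the first two are identical, ibs_distance([[0, 1, 2, 0], [0, 1, 2, 0], [2, 1, 0, 2]]) returns [[0.0, 0.0, 0.75], [0.0, 0.0, 0.75], [0.75, 0.75, 0.0]]. Resampling variants rather than samples is the usual way to bootstrap a distance tree: each replicate keeps the same individuals, and a node's support is the fraction of replicate trees in which its clade reappears.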
To run the pipeline, you need the following software installed:
- GraPhlAn
- Python 2.7 with the packages colormap, xml and Biopython
- PLINK v1.90
- PHYLIP
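Before launching the workflow with locally installed tools, you can check whether the dependencies listed above are available by looking them up on PATH and trying to locate the Python modules. The sketch below makes several assumptions. It expects the executable names plink, graphlan.py, neighbor and consense; PHYLIP ships several programs, and which ones the workflow calls is a guess here. It also expects Biopython to be importable as Bio. The helper missing_dependencies is hypothetical. It is written for Python 3 and only inspects the interpreter that runs it, so checking the Python 2.7 environment requires repeating the module check with that interpreter.

```python
import importlib.util
import shutil

# Assumed executable names: PLINK 1.9 as `plink`, GraPhlAn's `graphlan.py`,
# and the PHYLIP programs `neighbor` and `consense` (hypothetical choice).
REQUIRED_EXECUTABLES = ("plink", "graphlan.py", "neighbor", "consense")
# Import names of the Python packages listed above (Biopython imports as `Bio`).
REQUIRED_MODULES = ("colormap", "xml", "Bio")


def missing_dependencies(executables=REQUIRED_EXECUTABLES, modules=REQUIRED_MODULES):
    """Return (executables not found on PATH, modules the current interpreter cannot find)."""
    missing_exes = [exe for exe in executables if shutil.which(exe) is None]
    missing_mods = [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing_exes, missing_mods


if __name__ == "__main__":
    exes, mods = missing_dependencies()
    print("missing executables:", ", ".join(exes) or "none")
    print("missing Python modules:", ", ".join(mods) or "none")
```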
The pipeline can be run as a regular Nextflow workflow, for example:
nextflow run RenzoTale88/nf-PhyloTree --infile $PWD/mybedfile --ftype bed --spp cow --outfolder $PWD/ibstree --bootstrap 100
The workflow runs with Docker by default, but it can also be run with Singularity (-profile singularity) or Conda/Mamba (-profile conda or -profile mamba).
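The example command combines pipeline parameters (double dash, such as --bootstrap) with Nextflow's own options (single dash, such as -profile). When launching runs from a script, a small helper such as the hypothetical nextflow_command below keeps the two apart. It uses only the parameters and profiles named in this README, and leaving profile unset keeps the Docker default. It is a sketch under those assumptions, not part of the workflow.

```python
import shlex

# Profiles named in this README; None means no -profile flag (Docker default).
KNOWN_PROFILES = (None, "singularity", "conda", "mamba")


def nextflow_command(infile, ftype, spp, outfolder, bootstrap, profile=None):
    """Build the argument list for one nf-PhyloTree run (hypothetical helper)."""
    if profile not in KNOWN_PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    # Pipeline parameters use a double dash
    cmd = [
        "nextflow", "run", "RenzoTale88/nf-PhyloTree",
        "--infile", infile,
        "--ftype", ftype,
        "--spp", spp,
        "--outfolder", outfolder,
        "--bootstrap", str(bootstrap),
    ]
    # Nextflow's own options use a single dash
    if profile is not None:
        cmd += ["-profile", profile]
    return cmd


if __name__ == "__main__":
    # Print the equivalent shell command instead of running it
    print(shlex.join(nextflow_command("/data/mybedfile", "bed", "cow", "/data/ibstree", 100,
                                      profile="singularity")))
```

Passing the returned list to subprocess.run(cmd, check=True) would launch the run. Building a list instead of a single string avoids shell quoting problems with paths that contain spaces.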
If you use this workflow, please cite:
Talenti et al., ‘Continent-Wide Genomic Analysis of the African Buffalo (Syncerus caffer)’.
Please also cite the tools it relies on:
- Felsenstein, ‘PHYLIP - Phylogeny Inference Package (Version 3.2)’.
- Asnicar et al., ‘Compact Graphical Representation of Phylogenetic Data and Metadata with GraPhlAn’.
- Chang et al., ‘Second-Generation PLINK: Rising to the Challenge of Larger and Richer Datasets’.