Flanders is a modular pipeline and toolkit for scalable fine-mapping and colocalization of genetic association signals across large-scale datasets and multiple traits. Implemented in Nextflow and mostly R, it separates computationally intensive fine-mapping from downstream colocalization, so that fine-mapping results can be reused and each step can be run independently.
Before running the pipeline, ensure you have the following installed:
- Nextflow (v24.04+)
- For environment management, one of Docker, Singularity, or Conda (matching the `-profile` option used when launching the pipeline)

Run fine-mapping followed by colocalization, starting from GWAS summary statistics:

```bash
nextflow run Biostatistics-Unit-HT/Flanders -r 1.0 \
  -profile [docker|singularity|conda] \
  --summarystats_input /path/to/input_table.tsv \
  --run_colocalization true \
  --finemap_id my_finemap_run \
  --coloc_id my_coloc_run \
  -w ./work -resume
```

Run colocalization only, starting from an existing fine-mapping AnnData file:

```bash
nextflow run Biostatistics-Unit-HT/Flanders -r 1.0 \
  -profile [docker|singularity|conda] \
  --coloc_h5ad_input /path/to/finemapping_output.h5ad \
  --run_colocalization true \
  --coloc_id my_coloc_run \
  -w ./work -resume
```

Run the test profile:

```bash
nextflow run Biostatistics-Unit-HT/Flanders -r 1.0 -profile test,singularity -w ./work
```

Flanders separates the fine-mapping and colocalization process into two distinct steps:

| Input | Description |
|---|---|
| GWAS summary statistics | .tsv/.csv (optionally gzipped) |
| LD reference panel | PLINK-format reference panel (.bed/.bim/.fam) — preferably from the same sample population used in the GWAS |
| Metadata and GWAS-specific parameters table | .tsv file listing GWAS summary statistics paths and trait-specific parameters |
- Munging of GWAS summary statistics
- Format harmonization and imputation of missing information (e.g. missing allele frequency calculated from the LD reference panel)
- Optional liftover to GRCh38
- Optional restriction of the analysis to a specified list of chromosomes
  - ⚠️ In addition to autosomes, chromosomes X and Y are also accepted.
- Alphabetical ordering of alleles, so that the first allele in alphabetical order is the effect allele (effect sizes are flipped and allele frequencies inverted where needed)
- Conversion of SNP IDs to the Flanders internal coding `"chr"CHR:POS:EA:NEA`, where EA is the first allele in alphabetical order
  - ⚠️ This differs from common REF/ALT conventions and allows robust variant matching between multiple GWAS summary statistics and the LD reference panel.
- Identifies genomic regions containing significantly associated SNPs using Locusbreaker, an in-house algorithm that delimits each association peak based on the distance between the end of one peak and the start of the next.
  Locusbreaker first selects all SNPs below a given p-value threshold (suggested value 1×10⁻⁶, set in the `p_thresh2` column of the Metadata and GWAS-specific parameters table), identifying groups of SNPs that are positionally close to each other.
  If two consecutive selected SNPs are closer to each other than a set distance threshold (suggested value 250 kb, set in the `hole` column of the Metadata and GWAS-specific parameters table), they are grouped into the same locus; if they are further apart than the threshold, the gap marks the boundary between two peaks.
  Loci containing at least one significant SNP (suggested value 5×10⁻⁸, set in the `p_thresh1` column of the Metadata and GWAS-specific parameters table) are retained, and their boundaries are extended by 100 kb to fully capture the shape of the association peak.
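
The locus definition described above can be sketched in a few lines of Python. This is an illustrative approximation, not the Locusbreaker source: the function name, the input format (a list of `(position, p-value)` tuples for one chromosome), the choice of the lowest p-value SNP as sentinel, and the assumption that the 100 kb extension is applied on both sides are all hypothetical. The parameter names mirror the `p_thresh1`, `p_thresh2` and `hole` columns.

```python
def locus_breaker(snps, p_thresh1=5e-8, p_thresh2=1e-6, hole=250_000, pad=100_000):
    """Sketch of distance-based locus definition for a single chromosome.

    snps: iterable of (position, p_value) tuples.
    Returns a list of (start, end, sentinel_position, sentinel_p_value).
    """
    # Step 1: keep SNPs below the looser threshold, sorted by position.
    selected = sorted((pos, p) for pos, p in snps if p < p_thresh2)
    if not selected:
        return []

    # Step 2: start a new group whenever the gap between two consecutive
    # selected SNPs exceeds `hole` (treatment of an exact tie is an assumption).
    groups, current = [], [selected[0]]
    for prev, cur in zip(selected, selected[1:]):
        if cur[0] - prev[0] > hole:
            groups.append(current)
            current = []
        current.append(cur)
    groups.append(current)

    # Step 3: retain groups with at least one SNP below the stricter
    # threshold and extend their boundaries by `pad` on each side.
    loci = []
    for group in groups:
        sentinel_pos, sentinel_p = min(group, key=lambda s: s[1])
        if sentinel_p < p_thresh1:
            start = max(1, group[0][0] - pad)
            end = group[-1][0] + pad
            loci.append((start, end, sentinel_pos, sentinel_p))
    return loci


example_snps = [
    (1_000_000, 1e-9),   # significant SNP
    (1_100_000, 5e-7),   # suggestive, 100 kb away: same locus
    (1_500_000, 2e-7),   # suggestive, 400 kb away: separate group, not retained
    (3_050_000, 0.2),    # above p_thresh2: ignored
]
print(locus_breaker(example_snps))
```

In this toy input only the first group contains a SNP below `p_thresh1`, so a single locus is returned, spanning from 100 kb before its first selected SNP to 100 kb after its last one.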
- For each genomic region, fine-mapping is performed using SuSiE-RSS, with LD calculated from the input PLINK files
  - ⚠️ Whenever possible, in-sample LD is strongly recommended (especially for molecular omics phenotypes, where the explained variance can be very large).
  - ⚠️ Only SNPs shared between the GWAS summary statistics and the LD reference panel are used for fine-mapping; all other SNPs are discarded. Loci with no SNP overlap between the GWAS summary statistics and the LD reference panel are reported in `NOT_FINEMAPPED_no_variants_from_locus_in_LD_ref.tsv`.
  - ⚠️ Loci fully or partially overlapping the HLA region (GRCh38: chr6:28,510,120-33,480,577) are excluded from fine-mapping. The HLA region is characterized by extremely high variant density, long-range linkage disequilibrium and complex haplotype patterns, which can bias statistical fine-mapping methods and reduce confidence in the inferred causal variants.
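
The two exclusion rules above (HLA overlap and missing SNP overlap with the LD reference panel) amount to an interval check and a set intersection. The sketch below illustrates them under stated assumptions: the function names, the returned status labels and the order in which the checks are applied are hypothetical and do not reflect the pipeline's internal code. Only the HLA coordinates are taken from the text.

```python
# GRCh38 HLA coordinates as quoted in the text above.
HLA = ("6", 28_510_120, 33_480_577)


def normalize_chrom(chrom):
    """Strip an optional 'chr' prefix so that 'chr6' and '6' compare equal."""
    chrom = str(chrom)
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def overlaps_hla(chrom, start, end):
    """True if the locus fully or partially overlaps the HLA region."""
    hla_chrom, hla_start, hla_end = HLA
    return normalize_chrom(chrom) == hla_chrom and start <= hla_end and end >= hla_start


def triage_locus(chrom, start, end, gwas_snp_ids, ld_snp_ids):
    """Return a (status, shared_snps) pair for one locus.

    The status labels are hypothetical names used only in this sketch.
    """
    if overlaps_hla(chrom, start, end):
        return "skip_hla", []
    # Only SNPs present in both the GWAS and the LD panel can be fine-mapped.
    shared = sorted(set(gwas_snp_ids) & set(ld_snp_ids))
    if not shared:
        return "skip_no_ld_overlap", []
    return "finemap", shared


print(triage_locus("chr6", 33_400_000, 33_600_000, ["chr6:33450000:A:G"], ["chr6:33450000:A:G"]))
print(triage_locus("chr1", 100, 200, ["chr1:150:A:G", "chr1:160:C:T"], ["chr1:150:A:G"]))
```

Matching on SNP ID only works if both inputs use the same ID convention, which is what the harmonized `"chr"CHR:POS:EA:NEA` coding provides.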
- Log approximate Bayes factors (lABFs) and metadata for the 99% credible sets are stored in an AnnData object (.h5ad).
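
Because credible sets from different studies are later matched by SNP ID, the allele-ordering and ID conventions applied during munging determine whether two datasets refer to a variant in the same way. The following sketch illustrates that convention for a single variant. The function name and argument list are hypothetical, and it assumes the effect size is a signed beta and the frequency refers to the effect allele.

```python
def harmonize_variant(chrom, pos, effect_allele, other_allele, beta, eaf):
    """Return (snp_id, beta, eaf) with the alphabetically first allele as effect allele."""
    chrom = str(chrom)
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]  # avoid producing 'chrchr1'
    ea, nea = effect_allele.upper(), other_allele.upper()
    if ea > nea:
        # Swap the alleles: the effect size changes sign and the
        # effect allele frequency becomes its complement.
        ea, nea = nea, ea
        beta = -beta
        eaf = 1.0 - eaf
    return f"chr{chrom}:{pos}:{ea}:{nea}", beta, eaf


# The same variant reported with opposite effect alleles in two studies
# ends up with the same ID and the same direction of effect.
study_a = harmonize_variant("1", 12345, "T", "C", 0.12, 0.25)
study_b = harmonize_variant("chr1", 12345, "C", "T", -0.12, 0.75)
print(study_a)
print(study_b)
```

String comparison is used for the ordering, so the sketch also accepts multi-character alleles, although how the pipeline itself treats indels is not specified here.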
| Input | File description |
|---|---|
| Fine-mapping AnnData | An .h5ad file containing lABFs and metadata of credible sets (output from the fine-mapping step) |
- Builds a guide table listing all pairs of credible sets that share at least one SNP (credible sets cannot colocalize without sharing at least one SNP).
⚠️ If no credible sets share at least one SNP, no colocalization is performed and an empty guide table is produced.
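
The pairing rule can be illustrated with a small sketch. It assumes credible sets are available as a mapping from a credible-set name to its SNP IDs. The function name, the input structure and the output columns are hypothetical, and the all-against-all loop is chosen for clarity rather than speed.

```python
from itertools import combinations


def build_guide_table(credible_sets):
    """List pairs of credible sets sharing at least one SNP.

    credible_sets: dict mapping a credible-set name to an iterable of SNP IDs.
    Returns a list of (cs_a, cs_b, n_shared) tuples; empty if nothing overlaps.
    """
    sets = {name: set(snps) for name, snps in credible_sets.items()}
    pairs = []
    for a, b in combinations(sorted(sets), 2):
        n_shared = len(sets[a] & sets[b])
        if n_shared > 0:
            pairs.append((a, b, n_shared))
    return pairs


example_cs = {
    "traitA_cs1": ["chr1:100:A:G", "chr1:200:C:T"],
    "traitB_cs1": ["chr1:200:C:T", "chr1:300:A:C"],
    "traitC_cs1": ["chr2:50:G:T"],
}
print(build_guide_table(example_cs))
```

Here only the first two credible sets share a SNP, so they form the single pair that would be passed on to colocalization.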
- Performs pairwise colocalization for each pair of credible sets listed in the guide table using iCOLOC, a framework that extends traditional Bayes-factor colocalization analysis by imputing the lABFs of SNPs outside the credible sets to the minimum lABF value in the locus.
  The iCOLOC approach:
  - Significantly reduces storage requirements, since only the exact lABF values of credible-set SNPs are saved in the AnnData object
  - Improves colocalization accuracy compared to traditional coloc by reducing false positives caused by two causal SNPs in strong LD.
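
To make the imputation idea concrete, the sketch below fills in the lABFs of SNPs outside a credible set with a per-trait minimum and then combines the two lABF vectors with the standard approximate-Bayes-factor colocalization formulas. It is a simplified illustration rather than the iCOLOC implementation: the function names, the way the locus SNP list and the minimum lABF are passed in, and the prior values (the defaults of the `coloc` R package) are assumptions.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def impute_labf(cs_labf, locus_snps, min_labf):
    """Use exact lABFs for credible-set SNPs and `min_labf` for all others."""
    return [cs_labf.get(snp, min_labf) for snp in locus_snps]


def coloc_posteriors(l1, l2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from two aligned lABF vectors."""
    ls1, ls2 = logsumexp(l1), logsumexp(l2)
    ls12 = logsumexp([a + b for a, b in zip(l1, l2)])
    # H3 (two distinct causal SNPs): sum1 * sum2 minus the same-SNP terms.
    diff = 1.0 - math.exp(ls12 - ls1 - ls2)
    if diff > 0:
        lh3 = math.log(p1) + math.log(p2) + ls1 + ls2 + math.log(diff)
    else:
        lh3 = -math.inf  # e.g. a single-SNP locus
    log_h = [
        0.0,                   # H0: no association
        math.log(p1) + ls1,    # H1: trait 1 only
        math.log(p2) + ls2,    # H2: trait 2 only
        lh3,                   # H3: both traits, different SNPs
        math.log(p12) + ls12,  # H4: both traits, shared SNP
    ]
    denom = logsumexp(log_h)
    return {f"PPH{i}": math.exp(h - denom) for i, h in enumerate(log_h)}


locus_snps = ["snp_a", "snp_b", "snp_c", "snp_d"]
trait1 = impute_labf({"snp_a": 12.0, "snp_b": 10.5}, locus_snps, min_labf=-1.0)
shared = impute_labf({"snp_a": 11.0, "snp_c": 3.0}, locus_snps, min_labf=-0.5)
distinct = impute_labf({"snp_c": 11.0, "snp_d": 9.0}, locus_snps, min_labf=-0.5)

pp_shared = coloc_posteriors(trait1, shared)
pp_distinct = coloc_posteriors(trait1, distinct)
print({k: round(v, 4) for k, v in pp_shared.items()})
print({k: round(v, 4) for k, v in pp_distinct.items()})
```

When both credible sets are dominated by the same SNP, the posterior mass concentrates on H4; when their top SNPs differ, H3 is favored instead. Only the credible-set lABFs and one minimum value per trait are needed as input, which is the storage saving described above.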
| Output Type | Description |
|---|---|
| `gwas_and_loci_tables/*_dataset_aligned.tsv.gz` | Harmonized (and optionally lifted over) GWAS summary statistics |
| `gwas_and_loci_tables/*_loci.tsv` | Boundaries of identified association regions and GWAS summary statistics for the sentinel SNP |
| `finemapping_exceptions/` | Multiple tables reporting loci that were not fine-mapped with the standard procedure, or not fine-mapped at all |
| `finemapping/*_susie_finemap.rds` (optional) | Individual RDS files for each fine-mapped locus |
| `anndata/*.h5ad` | AnnData object with lABFs, credible set (CS) metadata and SNP annotations resulting from fine-mapping |
| `coloc/coloc_guide_table.csv` | Colocalization analysis guide table, listing all colocalization tests performed |
| `coloc/*_colocalization.table.*.tsv` | Colocalization analysis results (all, filtered by PPH4 threshold, and filtered by PPH3 threshold) |
Developed by the Biostatistics and Genome Analysis Units at Human Technopole