A versatile tool for detecting selective sweeps across a range of ages, strengths, starting allele frequencies, and degrees of completeness.
Flex-sweep trains a convolutional neural network (CNN) to classify genomic loci as sweep or neutral regions. The workflow starts by simulating data under an appropriate demographic model, using estimates for mutation rate, recombination rate, sweep strength, sweep age, starting allele frequency (softness), and ending allele frequency (completeness). Flex-sweep runs these simulations with discoal; however, simulated data in ms format generated by any method can be provided instead.
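For readers supplying their own simulations, the following is a minimal sketch of reading ms-format text in Python. It assumes the standard ms layout (a `//` line starting each replicate, then `segsites:`, `positions:`, and one 0/1 string per haplotype). The function name `parse_ms` and the example data are hypothetical and are not part of Flex-sweep.

```python
from typing import List, Tuple

# Tiny hand-written example in ms format (illustrative data only).
EXAMPLE = """ms 2 2 -t 3.0
12345 67890 11111

//
segsites: 3
positions: 0.1000 0.4000 0.9000
010
110

//
segsites: 2
positions: 0.2500 0.7500
01
10
"""


def parse_ms(text: str) -> List[Tuple[List[float], List[str]]]:
    """Return one (positions, haplotypes) pair per replicate."""
    replicates = []
    positions = None
    haplotypes = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("//"):
            # A '//' line opens a new replicate; store the previous one.
            if haplotypes is not None:
                replicates.append((positions, haplotypes))
            positions, haplotypes = [], []
        elif haplotypes is None:
            # Header lines (command, seeds) before the first replicate.
            continue
        elif line.startswith("positions:"):
            positions = [float(x) for x in line.split()[1:]]
        elif line and set(line) <= {"0", "1"}:
            # Haplotype rows are strings of ancestral (0) and derived (1) alleles.
            haplotypes.append(line)
    if haplotypes is not None:
        replicates.append((positions, haplotypes))
    return replicates


if __name__ == "__main__":
    for pos, haps in parse_ms(EXAMPLE):
        print(len(haps), "haplotypes,", len(pos), "segregating sites")
```

A check like this can confirm that files from another simulator have the expected number of haplotypes and sites before they are passed to Flex-sweep.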
Flex-sweep can be run entirely within the Singularity container, without downloading this repository.
https://zenodo.org/record/7860595
Other requirements for plotting:
- python3
- R
- R libraries:
++ ggplot2
++ tidyverse
++ viridis
++ yardstick
Flex-sweep creates the following directory structure:
outputDir
|
-- training_data
    |
    -- neutral (includes a file for each simulation)
        |
        -- stats
            |
            -- bins
            -- ...
    -- sweep (includes a file for each simulation)
        |
        -- stats
            |
            -- ...
-- fvs (includes neutral.fv and sweep.fv)
-- model (includes model files, history, model test predictions)
    |
    -- predictions
-- data_windows (includes a separate file and subdirectory for each window, and a fv for each window)
    |
    -- window_subdirectories
        |
        -- stats
            |
            -- ...
-- classification (will include a predictions file)
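As a quick sanity check after a run, the layout above can be verified with a few lines of Python. This is a sketch only: it assumes the nesting implied by the `|` markers (for example, `stats` inside `neutral` and `sweep`, and `predictions` as a directory inside `model`), and the helper name `missing_subdirs` is hypothetical rather than something Flex-sweep provides.

```python
import tempfile
from pathlib import Path

# Subdirectories named in the layout above. "window_subdirectories" is a
# placeholder for per-window directory names, so it is not checked here.
EXPECTED_SUBDIRS = [
    "training_data/neutral/stats",
    "training_data/sweep/stats",
    "fvs",
    "model/predictions",
    "data_windows",
    "classification",
]


def missing_subdirs(output_dir):
    """Return the expected subdirectories that do not exist under output_dir."""
    root = Path(output_dir)
    return [rel for rel in EXPECTED_SUBDIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Demonstration on a temporary directory with only part of the layout.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "training_data/neutral/stats").mkdir(parents=True)
        print(missing_subdirs(tmp))
```

An empty list means every expected directory is present; directories such as `classification` will only appear once the corresponding stage has run.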
An example configuration file is provided.
It is recommended to train Flex-sweep using simulations generated with a wide range of mutation rates, recombination rates, sweep strengths, sweep ages, swept allele starting frequencies, and swept allele ending frequencies. These should be chosen based on reasonable estimates that reflect your species.
Choose a demographic model that represents reasonable expectations for your population.
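To illustrate what drawing a parameter across a bounded range can look like, here is a minimal sketch that samples from a normal distribution and rejects draws outside the lower and upper bounds. Rejection sampling is an assumption made for this example; Flex-sweep's own sampling may differ. The function name and the numeric values are placeholders, not recommendations for any species.

```python
import random


def sample_bounded_normal(mean, sd, lower, upper, rng, max_tries=10000):
    """Draw from Normal(mean, sd), redrawing until lower <= x <= upper."""
    if lower >= upper:
        raise ValueError("lower bound must be smaller than upper bound")
    for _ in range(max_tries):
        x = rng.gauss(mean, sd)
        if lower <= x <= upper:
            return x
    # Bounds far from the mean can make acceptance very unlikely.
    raise RuntimeError("no draw fell within the bounds")


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the demonstration is repeatable
    # Placeholder values for a hypothetical per-site rate parameter.
    draws = [sample_bounded_normal(1e-8, 5e-9, 1e-9, 2e-8, rng) for _ in range(5)]
    for value in draws:
        print(f"{value:.3e}")
```

Wide bounds relative to the standard deviation keep most draws; narrow bounds concentrate the simulations and reduce the variety the classifier sees during training.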
- changed order of statistics in feature vector (was: [iHS, nsl, iSAFE, DIND, hapDAF-o, hapDAF-s, highfreq, lowfreq, Sratio, HAF, H12], now: [DIND, HAF, hapDAF-o, iSAFE, highfreq, hapDAF-s, nsl, Sratio, lowfreq, iHS, H12]) reflecting new analyses in revision
- upload new singularity (apptainer) image to zenodo with new statistic order
- upload new pre-trained models with new statistic order
- simulation config file can now take upper and lower bounds for normal distributions of parameters
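The change in statistic order noted above can be expressed as an index permutation. The sketch below uses the two orders exactly as listed and treats a feature vector as one value per statistic, which is a simplifying assumption: real `.fv` files may hold several columns per statistic, in which case the same permutation would need to be applied per block of columns. The names `PERMUTATION` and `reorder` are hypothetical.

```python
# Statistic orders as given in the changelog entry.
OLD_ORDER = ["iHS", "nsl", "iSAFE", "DIND", "hapDAF-o", "hapDAF-s",
             "highfreq", "lowfreq", "Sratio", "HAF", "H12"]
NEW_ORDER = ["DIND", "HAF", "hapDAF-o", "iSAFE", "highfreq", "hapDAF-s",
             "nsl", "Sratio", "lowfreq", "iHS", "H12"]

# For each slot in the new order, the index of that statistic in the old order.
PERMUTATION = [OLD_ORDER.index(name) for name in NEW_ORDER]


def reorder(values_in_old_order):
    """Rearrange one value per statistic from the old order to the new order."""
    if len(values_in_old_order) != len(OLD_ORDER):
        raise ValueError("expected one value per statistic")
    return [values_in_old_order[i] for i in PERMUTATION]


if __name__ == "__main__":
    # Reordering the old names themselves should reproduce the new order.
    print(reorder(OLD_ORDER) == NEW_ORDER)
```

Because models are trained on a fixed column order, feature vectors and pre-trained models from before and after this change should not be mixed without such a reordering.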