Versatile tool for detecting selective sweeps with a variety of ages, strengths, starting allele frequencies, and completeness.

lauterbur/Flex-sweep

Flex-sweep

Flex-sweep trains a convolutional neural network (CNN) to classify genomic loci as sweep or neutral regions. The workflow starts with simulating data under an appropriate demographic model, using estimates of the mutation rate, recombination rate, and sweep strength, age, starting allele frequency (softness), and ending allele frequency (completeness). Flex-sweep runs these simulations with discoal, but simulated data in ms format generated by any method can be provided instead.
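For readers preparing their own simulations, the sketch below shows what ms-format text looks like and how it can be read in Python. It is only an illustration: `parse_ms` is a hypothetical helper, not part of Flex-sweep, and it assumes the standard ms layout (a `//` line opening each replicate, followed by `segsites:`, `positions:`, and one 0/1 string per haplotype).

```python
from typing import List, Tuple

Replicate = Tuple[List[float], List[str]]


def parse_ms(text: str) -> List[Replicate]:
    """Split ms-format text into (positions, haplotypes) pairs, one per replicate."""
    replicates: List[Replicate] = []
    positions: List[float] = []
    haplotypes: List[str] = []
    in_replicate = False

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("//"):
            # A new replicate begins; store the previous one, if any.
            if in_replicate:
                replicates.append((positions, haplotypes))
            positions, haplotypes, in_replicate = [], [], True
        elif not in_replicate or not line:
            # Skip the header (command line, seeds) and blank lines.
            continue
        elif line.startswith("segsites:"):
            continue
        elif line.startswith("positions:"):
            positions = [float(x) for x in line.split()[1:]]
        else:
            # Every remaining line is one haplotype of 0/1 characters.
            haplotypes.append(line)

    if in_replicate:
        replicates.append((positions, haplotypes))
    return replicates


# A tiny hand-written example with one replicate of four haplotypes.
example = """ms 4 1 -t 5
1 2 3

//
segsites: 3
positions: 0.1 0.5 0.9
010
110
001
000
"""
replicates = parse_ms(example)
```

Each element of `replicates` pairs the site positions with the haplotype strings for one simulation, which is a convenient way to confirm sample size and number of segregating sites before handing files to Flex-sweep.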

Flex-sweep can be run entirely within the Singularity container, without downloading this repository.

The Singularity container for running Flex-sweep, along with two pre-trained models, is available at:

https://zenodo.org/record/7860595

Other requirements for plotting:

  • python3
  • R
  • R libraries:
      • ggplot2
      • tidyverse
      • viridis
      • yardstick

Flex-sweep creates the following directory structure:

outputDir
      |
      -- training_data
              |
              -- neutral (includes a file for each simulation)
                      |
                      -- stats
                              |
                              -- bins
                              -- ...
              -- sweep (includes a file for each simulation)
                      |
                      -- stats
                              |
                              -- ...
      -- fvs (includes neutral.fv and sweep.fv)
      -- model (includes model files, history, model test predictions)
              |
              -- predictions
      -- data_windows (includes a separate file and subdirectory for each window, and a fv for each window)
              |
              -- window_subdirectories
                      |
                      -- stats
                              |
                              -- ...
      -- classification (will include a predictions file)
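As a quick sanity check after a run, the layout above can be compared against what is actually on disk. The following is a minimal sketch, not part of Flex-sweep: `missing_subdirs` and `EXPECTED_SUBDIRS` are hypothetical names, and the list only covers the fixed subdirectories named in the tree (per-window subdirectories are not checked).

```python
import tempfile
from pathlib import Path

# Fixed subdirectories taken from the directory tree in this README.
EXPECTED_SUBDIRS = [
    "training_data/neutral/stats",
    "training_data/sweep/stats",
    "fvs",
    "model/predictions",
    "data_windows",
    "classification",
]


def missing_subdirs(output_dir):
    """Return the expected subdirectories that do not exist under output_dir."""
    root = Path(output_dir)
    return [sub for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]


# Demonstration on a temporary directory that only contains "fvs".
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "fvs").mkdir()
    demo_missing = missing_subdirs(tmp)
```

An empty list means every fixed subdirectory is present; anything returned points to a stage (simulation, training, or classification) that has not written its output yet.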

Configuration file for making simulation array

An example configuration file is provided.

Train Flex-sweep on simulations that span a wide range of mutation rates, recombination rates, sweep strengths, sweep ages, swept allele starting frequencies, and swept allele ending frequencies. Choose these ranges from reasonable estimates for your species.

Choose a demographic model that represents reasonable expectations for your population.
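To make the idea of drawing each simulation from a range concrete, here is a minimal sketch of per-simulation parameter sampling using uniform draws and normal draws restricted to lower and upper bounds. It does not reproduce the Flex-sweep configuration file format or its sampling code; the function names, parameter names, and every numeric value are placeholders chosen for illustration, not recommendations.

```python
import random


def draw_bounded_normal(rng, mean, sd, lower, upper, max_tries=10_000):
    """Draw from a normal distribution, rejecting values outside [lower, upper]."""
    for _ in range(max_tries):
        value = rng.gauss(mean, sd)
        if lower <= value <= upper:
            return value
    raise ValueError("no draw fell inside [lower, upper]; check the bounds")


def draw_simulation_parameters(rng):
    """Return one hypothetical parameter set for a single sweep simulation."""
    return {
        # Placeholder values only; replace with estimates for your species.
        "mutation_rate": draw_bounded_normal(rng, 1e-8, 5e-9, 1e-9, 5e-8),
        "recombination_rate": draw_bounded_normal(rng, 1e-8, 5e-9, 1e-9, 5e-8),
        "selection_coefficient": rng.uniform(0.001, 0.1),
        "sweep_age_generations": rng.uniform(0, 5000),
        "start_frequency": rng.uniform(0.0, 0.1),
        "end_frequency": rng.uniform(0.2, 1.0),
    }


rng = random.Random(42)  # fixed seed so the draws are reproducible
params = [draw_simulation_parameters(rng) for _ in range(3)]
```

Drawing a fresh parameter set for every simulation, rather than fixing one value per parameter, is what exposes the classifier to the variety of sweeps it is meant to detect.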

Changelog 24 April 2023

  • changed order of statistics in feature vector (was: [iHS, nsl, iSAFE, DIND, hapDAF-o, hapDAF-s, highfreq, lowfreq, Sratio, HAF, H12], now: [DIND, HAF, hapDAF-o, iSAFE, highfreq, hapDAF-s, nsl, Sratio, lowfreq, iHS, H12]) reflecting new analyses in revision

  • uploaded new Singularity (Apptainer) image to Zenodo with new statistic order

  • uploaded new pre-trained models with new statistic order

  • simulation config file can now take upper and lower bounds for normal distributions of parameters
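The change in statistic order means that values stored in the old order do not line up with the new image and models. The sketch below shows the index mapping between the two orders listed above. It assumes, purely for illustration, a flat list with exactly one value per statistic; real Flex-sweep feature vectors may hold many values per statistic, so treat `reorder_statistics` as a hypothetical helper rather than a converter for `.fv` files.

```python
# Statistic orders copied from the changelog entry above.
OLD_ORDER = ["iHS", "nsl", "iSAFE", "DIND", "hapDAF-o", "hapDAF-s",
             "highfreq", "lowfreq", "Sratio", "HAF", "H12"]
NEW_ORDER = ["DIND", "HAF", "hapDAF-o", "iSAFE", "highfreq", "hapDAF-s",
             "nsl", "Sratio", "lowfreq", "iHS", "H12"]

# For each slot in the new order, the position of that statistic in the old order.
OLD_TO_NEW_INDEX = [OLD_ORDER.index(name) for name in NEW_ORDER]


def reorder_statistics(old_values):
    """Rearrange one value per statistic from the old order into the new order."""
    if len(old_values) != len(OLD_ORDER):
        raise ValueError("expected exactly one value per statistic")
    return [old_values[i] for i in OLD_TO_NEW_INDEX]


# Feeding the old names through the mapping yields the new names.
reordered_names = reorder_statistics(OLD_ORDER)
```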

See the wiki for full documentation.
