Genotype Likelihoods

Samuel Hamann edited this page Apr 29, 2021 · 4 revisions

Estimate genotype likelihoods using ANGSD. See ANGSD's tutorial page for full details on this method.

Basic Usage

To run this method, use the following command:

angsd-wrapper Genotypes Genotype_Config

where Genotype_Config is the full path to the configuration file for genotype likelihood estimation.
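
As a minimal sketch of the invocation above, the function below checks that the configuration path is absolute and that the file exists before handing it to angsd-wrapper. The name `run_genotypes` and the dry-run fallback are assumptions made for illustration; neither is part of ANGSD-wrapper.

```shell
# Hypothetical helper (not part of ANGSD-wrapper): validate the config path,
# then run the Genotypes method.
run_genotypes() {
    config="$1"

    # This page asks for the full path to Genotype_Config.
    case "$config" in
        /*) ;;
        *)  echo "error: '$config' is not a full (absolute) path" >&2
            return 1 ;;
    esac

    if [ ! -f "$config" ]; then
        echo "error: configuration file '$config' not found" >&2
        return 2
    fi

    if command -v angsd-wrapper >/dev/null 2>&1; then
        angsd-wrapper Genotypes "$config"
    else
        # Dry run when angsd-wrapper is not on PATH: print the command only.
        echo "angsd-wrapper Genotypes $config"
    fi
}
```

Calling `run_genotypes` with a relative path returns 1 and a missing file returns 2, so a mistake in the path can be told apart from a failure inside ANGSD itself.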

Input files

All inputs should be specified in Genotype_Config.

Common Variables

This method uses the following variables from Common_Config:

Variable Function
SAMPLE_LIST (GROUP_SAMPLES on dev) A list of samples to be used in calculations
SAMPLE_INBREEDING (GROUP_INBREEDING on dev) A list of inbreeding coefficients, where each line corresponds to a line in SAMPLE_LIST (GROUP_SAMPLES on dev)
ANC_SEQ Path to ancestral sequence
REF_SEQ Path to reference sequence
PROJECT Name given to all outputs in ANGSD-wrapper
SCRATCH Place to store files, the full path is SCRATCH/PROJECT/GenotypeLikelihoods
REGIONS Limit the scope of ANGSD-wrapper to certain regions
UNIQUE_ONLY Use uniquely mapped reads only
MIN_BASEQUAL Minimum base quality score
MIN_IND Minimum number of individuals needed to use this site
GT_LIKELIHOOD Estimate genotype likelihoods
MIN_MAPQ Minimum read mapping quality
N_CORES Number of cores to use; do not set this above the limits of your system
DO_MAJORMINOR Estimate major/minor alleles
DO_GENO Call genotypes and setup the output
DO_MAF Calculate per-site frequencies
DO_POST Estimate the posterior genotype probability
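
Because each line of SAMPLE_INBREEDING must correspond to a line of SAMPLE_LIST, a quick line-count comparison can catch a mismatch before ANGSD runs. The following is a sketch only: `check_sample_files` is a hypothetical helper, not something ANGSD-wrapper provides, and it compares counts without verifying that the lines are in matching order.

```shell
# Hypothetical helper: confirm that two list files have the same number of lines.
# Arguments: path to the sample list, path to the inbreeding coefficient list.
check_sample_files() {
    sample_list="$1"
    inbreeding="$2"

    for f in "$sample_list" "$inbreeding"; do
        if [ ! -f "$f" ]; then
            echo "error: '$f' not found" >&2
            return 2
        fi
    done

    # awk counts a final line even when it lacks a trailing newline.
    n_samples=$(awk 'END { print NR }' "$sample_list")
    n_coeffs=$(awk 'END { print NR }' "$inbreeding")

    if [ "$n_samples" -ne "$n_coeffs" ]; then
        echo "error: $n_samples samples but $n_coeffs inbreeding coefficients" >&2
        return 1
    fi
    echo "ok: $n_samples samples with matching inbreeding coefficients"
}
```

Blank lines are counted like any other line, so stray empty lines at the end of either file will show up as a mismatch.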

Method-Specific Variables

The only method-specific variable is DO_GLF, a flag that sets the output format of the genotype likelihoods:

  1. binary all log genotype likelihood
  2. beagle genotype likelihood format
  3. beagle binary format
  4. text output of all log genotype likelihoods

In the config file, this would look something like:

DO_GLF=3  # Sets to beagle binary format

The main consideration for ANGSD-wrapper is that the Admixture method needs DO_GLF=2 in order to use NGSAdmix, which is why the example configuration files set it to 2.
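
The four DO_GLF values and the Admixture requirement can be sketched as a pair of small checks. Both function names are hypothetical and only restate what this page says; they are not part of ANGSD-wrapper.

```shell
# Hypothetical helper: describe a DO_GLF value using the list on this page.
describe_do_glf() {
    case "$1" in
        1) echo "binary all log genotype likelihood" ;;
        2) echo "beagle genotype likelihood format" ;;
        3) echo "beagle binary format" ;;
        4) echo "text output of all log genotype likelihoods" ;;
        *) echo "error: DO_GLF must be 1, 2, 3, or 4 (got '$1')" >&2
           return 1 ;;
    esac
}

# Hypothetical helper: succeed only if the value suits the Admixture method,
# which this page says requires DO_GLF=2.
glf_ok_for_admixture() {
    [ "$1" = "2" ]
}
```

For example, `glf_ok_for_admixture "$DO_GLF" || echo "Admixture needs DO_GLF=2"` would flag a configuration that is not usable for Admixture.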

Method Parameters

The parameters for this method can be adjusted as needed; the defaults were chosen to work well in general use:

Parameter Function
POST_CUTOFF Floor limit for the posterior probability
SNP_PVAL P-value cutoff for calling SNPs
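
Since POST_CUTOFF is a floor on a probability and SNP_PVAL is a p-value, both should lie between 0 and 1. The sketch below checks that before a run. The helper names are hypothetical, and the range check is a general sanity test rather than a rule taken from ANGSD-wrapper.

```shell
# Hypothetical helper: succeed if $1 is a plain or scientific-notation
# number within [0, 1].
is_probability() {
    awk -v v="$1" 'BEGIN {
        if (v !~ /^[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/) exit 1
        exit !(v + 0 >= 0 && v + 0 <= 1)
    }'
}

# Hypothetical helper: validate both method parameters.
# Arguments: POST_CUTOFF value, SNP_PVAL value.
check_genotype_params() {
    post_cutoff="$1"
    snp_pval="$2"

    if ! is_probability "$post_cutoff"; then
        echo "error: POST_CUTOFF='$post_cutoff' is not a number in [0, 1]" >&2
        return 1
    fi
    if ! is_probability "$snp_pval"; then
        echo "error: SNP_PVAL='$snp_pval' is not a number in [0, 1]" >&2
        return 1
    fi
}
```

A call such as `check_genotype_params "$POST_CUTOFF" "$SNP_PVAL"` returns 0 when both values pass and 1 otherwise.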

Output files

Naming Scheme Contents
PROJECT_SFSOut.[format based on DO_GLF flag].gz Genotype likelihoods
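
Combining the naming scheme above with the SCRATCH/PROJECT/GenotypeLikelihoods output directory, the following sketch lists whatever output files a run produced. `list_genotype_outputs` is a hypothetical helper; it uses a glob for the format-dependent part of the name instead of assuming a specific extension.

```shell
# Hypothetical helper: list genotype likelihood outputs for a project.
# Arguments: SCRATCH and PROJECT, as set in Common_Config.
list_genotype_outputs() {
    out_dir="$1/$2/GenotypeLikelihoods"

    if [ ! -d "$out_dir" ]; then
        echo "error: output directory '$out_dir' not found" >&2
        return 2
    fi

    found=1
    for f in "$out_dir/$2"_SFSOut.*.gz; do
        # An unmatched glob stays literal, so confirm the file exists.
        [ -e "$f" ] || continue
        echo "$f"
        found=0
    done

    # 0 if at least one output file was listed, 1 if none were found.
    return "$found"
}
```

Running `list_genotype_outputs "$SCRATCH" "$PROJECT"` after the method finishes prints one path per output file.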