Genotype Likelihoods
Estimate genotype likelihoods using ANGSD. Please see ANGSD's tutorial page for full details on this method.
To run this method, use the following command:

    angsd-wrapper Genotypes Genotype_Config

where `Genotype_Config` is the full path to the configuration file for genotype likelihood estimation. All inputs should be specified in `Genotype_Config`.
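Because the command expects a full path, a relative path can be resolved before it is passed in. The following is a minimal sketch; the helper name `full_path` and the location `./Genotype_Config` are hypothetical and only serve as an illustration.

```shell
# Resolve a possibly relative path to a full (absolute) path.
# full_path is a hypothetical helper, not part of ANGSD-wrapper.
full_path() {
    local dir base
    # cd runs inside the command substitution, so the caller's
    # working directory is left unchanged.
    dir="$(cd "$(dirname "$1")" && pwd)" || return 1
    base="$(basename "$1")"
    printf '%s/%s\n' "$dir" "$base"
}

# Hypothetical config location; replace with your own file.
CONFIG="./Genotype_Config"

# Print the command that would be run, rather than running it.
if CONFIG_PATH="$(full_path "$CONFIG" 2>/dev/null)"; then
    echo "angsd-wrapper Genotypes $CONFIG_PATH"
fi
```

The sketch only echoes the command, so it can be tried without ANGSD-wrapper installed.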
This method also uses variables from `Common_Config`; those that are used are listed below:
Variable | Function |
---|---|
SAMPLE_LIST (GROUP_SAMPLES on dev) | A list of samples to be used in calculations |
SAMPLE_INBREEDING (GROUP_INBREEDING on dev) | A list of inbreeding coefficients, where each line here corresponds to a line in SAMPLE_LIST (GROUP_SAMPLES on dev) |
ANC_SEQ | Path to ancestral sequence |
REF_SEQ | Path to reference sequence |
PROJECT | Name given to all outputs in ANGSD-wrapper |
SCRATCH | Place to store files, the full path is `SCRATCH/PROJECT/GenotypeLikelihoods` |
REGIONS | Limit the scope of ANGSD-wrapper to certain regions |
UNIQUE_ONLY | Use uniquely mapped reads only |
MIN_BASEQUAL | Minimum base quality score |
MIN_IND | Minimum number of individuals needed to use this site |
GT_LIKELIHOOD | Estimate genotype likelihoods |
MIN_MAPQ | Minimum base mapping quality |
N_CORES | Number of cores to use, please do not set above the limits of your system |
DO_MAJORMINOR | Estimate major/minor alleles |
DO_GENO | Call genotypes and setup the output |
DO_MAF | Calculate per-site frequencies |
DO_POST | Estimate the posterior genotype probability |
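Since each line of the inbreeding file corresponds to a line of the sample list, a quick line-count comparison can catch a mismatch before a run. This is a minimal sketch; the function names are hypothetical and the check is not something ANGSD-wrapper is known to perform itself.

```shell
# Count lines in a file; grep -c '' also counts a final line
# that lacks a trailing newline.
count_lines() {
    grep -c '' "$1"
}

# Hypothetical sanity check: the sample list and the inbreeding
# coefficients file should have the same number of lines.
check_inbreeding_matches() {
    local samples="$1" inbreeding="$2"
    local n_samples n_inbreeding
    n_samples="$(count_lines "$samples")"
    n_inbreeding="$(count_lines "$inbreeding")"
    if [ "$n_samples" -ne "$n_inbreeding" ]; then
        echo "Mismatch: $n_samples samples but $n_inbreeding inbreeding coefficients" >&2
        return 1
    fi
    echo "OK: $n_samples samples, $n_inbreeding inbreeding coefficients"
}
```

It would be called with the two paths set in `Common_Config`, for example `check_inbreeding_matches "$SAMPLE_LIST" "$SAMPLE_INBREEDING"`.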
The only variable specific to this method is a flag for the desired output format. `DO_GLF` determines whether the output genotype likelihoods are written as:

1. binary all log genotype likelihood
2. beagle genotype likelihood format
3. beagle binary format
4. text output of all log genotype likelihoods

In the config file, this would look something like

    DO_GLF=3 # Sets to beagle binary format
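To make the value-to-format correspondence explicit, here is a small sketch that turns a `DO_GLF` value into its description. It assumes the four formats map to the values 1 through 4 in the order listed, which is consistent with `DO_GLF=3` selecting the beagle binary format; the function name `describe_glf` is hypothetical.

```shell
# Map a DO_GLF value to the output format it selects.
# Assumes values 1-4 follow the order of the list above.
describe_glf() {
    case "$1" in
        1) echo "binary all log genotype likelihood" ;;
        2) echo "beagle genotype likelihood format" ;;
        3) echo "beagle binary format" ;;
        4) echo "text output of all log genotype likelihoods" ;;
        *) echo "unknown DO_GLF value: $1" >&2
           return 1 ;;
    esac
}

DO_GLF=3
describe_glf "$DO_GLF"   # prints: beagle binary format
```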
The primary consideration for ANGSD-wrapper is that the Admixture method requires `DO_GLF=2`, because NGSadmix takes beagle-format genotype likelihoods as input. This is why the example configuration files use that setting.
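A guard like the following could be run before starting an Admixture analysis to confirm the likelihoods were written in beagle format. It is only a sketch: the function name `require_beagle_glf` is hypothetical, and it assumes `DO_GLF` has already been read from the config file into the shell.

```shell
# Fail early unless DO_GLF is set to 2 (beagle genotype likelihood format).
require_beagle_glf() {
    if [ "${DO_GLF:-}" != "2" ]; then
        echo "DO_GLF=${DO_GLF:-unset}, but Admixture needs DO_GLF=2" >&2
        return 1
    fi
}

DO_GLF=2
require_beagle_glf && echo "DO_GLF is suitable for Admixture"
```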
The parameters for this method can be tweaked as necessary; the defaults have been chosen to work well in the general case:
Parameter | Function |
---|---|
POST_CUTOFF | Floor limit for the posterior probability |
SNP_PVAL | P-value cutoff for calling SNPs |
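For orientation, the variables above plausibly correspond to ANGSD command-line options. The sketch below is an assumption about that mapping, not ANGSD-wrapper's actual code. The ANGSD option names are real, but every value is an illustrative placeholder rather than a recommended or default setting, and the function `build_angsd_args` is hypothetical. The command is printed, not executed.

```shell
# Placeholder values for illustration only (not defaults or recommendations).
GT_LIKELIHOOD=1
DO_GLF=2
DO_MAJORMINOR=1
DO_MAF=1
DO_GENO=32
DO_POST=1
POST_CUTOFF=0.95
SNP_PVAL=1e-6
UNIQUE_ONLY=0
MIN_BASEQUAL=20
MIN_MAPQ=30
MIN_IND=1
N_CORES=4

# Assumed mapping from config variables to ANGSD options.
# printf repeats the '%s ' format once per argument.
build_angsd_args() {
    printf '%s ' \
        -GL "$GT_LIKELIHOOD" \
        -doGlf "$DO_GLF" \
        -doMajorMinor "$DO_MAJORMINOR" \
        -doMaf "$DO_MAF" \
        -doGeno "$DO_GENO" \
        -doPost "$DO_POST" \
        -postCutoff "$POST_CUTOFF" \
        -SNP_pval "$SNP_PVAL" \
        -uniqueOnly "$UNIQUE_ONLY" \
        -minQ "$MIN_BASEQUAL" \
        -minMapQ "$MIN_MAPQ" \
        -minInd "$MIN_IND" \
        -P "$N_CORES"
}

# Show the resulting command line without running ANGSD.
echo "angsd $(build_angsd_args)"
```

Printing the command first makes it easy to see how a change to `POST_CUTOFF` or `SNP_PVAL` would alter the call before committing to a long run.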
The output files are named as follows:

Naming Scheme | Contents |
---|---|
`PROJECT_SFSOut.[format based on DO_GLF flag].gz` | Genotype likelihoods |