
14. Heterozygosity Calculation

George Pacheco edited this page Aug 4, 2021 · 1 revision

Based on Dataset I and using ANGSD--v0.931, we calculate the percentage of heterozygous genotypes of each sample.

Generates a .bed file based on the .mafs file:
zcat ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.mafs.gz | cut -f1,2 | tail -n +2 | awk '{print $1"\t"$2-1"\t"$2}' | bedtools merge -i - > ~/data/Pigeons/PBGP/PBGP--Analyses/Miscellaneous/HeterozygosityCalc/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.bed
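As a rough illustration of the conversion above, the sketch below runs the same `cut`, `tail`, and `awk` steps on a tiny invented table that mimics the first two columns of a .mafs file. The sample rows and variable names are assumptions made for this example. The final `awk` is only a stand-in for `bedtools merge` on sorted input (it fuses intervals that touch or overlap), so the sketch runs without bedtools installed.

```shell
# Toy .mafs-like input (invented): header line, then chromosome and 1-based position.
toy_mafs=$(printf 'chromo\tposition\nscf1\t10\nscf1\t11\nscf1\t15\nscf2\t3\n')

# Same conversion as the pipeline: drop the header, emit 0-based half-open intervals.
toy_bed=$(printf '%s\n' "$toy_mafs" | cut -f1,2 | tail -n +2 \
  | awk '{print $1"\t"$2-1"\t"$2}')

# Stand-in for `bedtools merge` on sorted input: fuse intervals that touch or overlap.
toy_merged=$(printf '%s\n' "$toy_bed" | awk 'BEGIN { OFS = "\t" }
  NR > 1 && $1 == cur_chr && $2 <= cur_end { if ($3 > cur_end) cur_end = $3; next }
  NR > 1 { print cur_chr, cur_start, cur_end }
  { cur_chr = $1; cur_start = $2; cur_end = $3 }
  END { if (NR > 0) print cur_chr, cur_start, cur_end }')
printf '%s\n' "$toy_merged"
```

With this toy input, the adjacent positions 10 and 11 on scf1 collapse into a single interval, while the isolated positions stay as one-base intervals.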
Creates a .pos file based on this new .bed:
awk '{print $1"\t"($2+1)"\t"$3}' ~/data/Pigeons/PBGP/PBGP--Analyses/Miscellaneous/HeterozygosityCalc/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.bed > ~/data/Pigeons/PBGP/PBGP--Analyses/Miscellaneous/HeterozygosityCalc/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.pos
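A minimal sketch of what this step does to the coordinates, using two invented intervals: the start column is shifted by one, so each line goes from a 0-based, end-exclusive BED interval to a 1-based, inclusive start and end.

```shell
# Toy merged BED intervals (invented): 0-based start, exclusive end.
toy_bed=$(printf 'scf1\t9\t11\nscf1\t14\t15\n')

# Same conversion as the pipeline: add one to the start column only.
toy_pos=$(printf '%s\n' "$toy_bed" | awk '{print $1"\t"($2+1)"\t"$3}')
printf '%s\n' "$toy_pos"
```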
Indexes the .pos file created above:
angsd sites index ~/data/Pigeons/PBGP/PBGP--Analyses/Miscellaneous/HeterozygosityCalc/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.pos
Computes the site allele frequency likelihoods (.saf files) of each sample:
parallel --plus --dryrun angsd -i {} -anc ~/data/Pigeons/Reference/DanishTumbler_Dovetail_ReRun.fasta -ref ~/data/Pigeons/Reference/DanishTumbler_Dovetail_ReRun.fasta -sites ~/data/Pigeons/PBGP/PBGP--Analyses/Miscellaneous/HeterozygosityCalc/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.pos -rf ~/data/Pigeons/Reference/DanishTumbler_Dovetail_ReRun_ChrGreater1kb.id -GL 1 -doSaf 1 -fold 1 -remove_bads 1 -uniqueOnly 1 -baq 1 -C 50 -minMapQ 30 -minQ 20 -out ~/data/Pigeons/PBGP/PBGP--Analyses/Miscellaneous/HeterozygosityCalc/{/...} :::: ~/data/Pigeons/PBGP/PBGP--Analyses/Lists/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.list | xsbatch -R --max-array-jobs 60 -c 1 --time 10-00 --mem-per-cpu 6024 -J HetCalc --
Estimates the site frequency spectrum of each sample with realSFS, saving it as a .het file:
parallel --plus "realSFS {} > ~/data/Pigeons/PBGP/PBGP--Analyses/Miscellaneous/HeterozygosityCalc/{/..}.het" ::: ~/data/Pigeons/PBGP/PBGP--Analyses/Miscellaneous/HeterozygosityCalc/*.saf.idx
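The output name in the command above relies on the `{/..}` replacement string, which, assuming GNU parallel's `--plus` behaviour, expands to the input's basename with its last two extensions removed. The pure-shell sketch below reproduces that naming for one hypothetical input path; the path and variable names are invented for illustration.

```shell
# Hypothetical input path, for illustration only.
saf_path="/tmp/HeterozygosityCalc/SampleA_GBS.saf.idx"

# Pure-shell equivalent of what {/..} is assumed to yield.
base=$(basename "$saf_path")   # SampleA_GBS.saf.idx
stem=${base%.*}                # drop .idx
stem=${stem%.*}                # drop .saf
het_name="${stem}.het"
printf '%s\n' "$het_name"
```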
Calculates the percentage of heterozygous SITES:
fgrep '.' *.het | tr ":" " " | awk '{print $1"\t"$3/($2+$3)*100}' | gawk '{match($1,/(GBS|WGS|WGS\-GBS)/,lol);print $1"\t"$2"\t"lol[1]}' | sort -k 1,1gr | awk '{split($0,a,"_"); print $1"\t"a[1]"\t"$2"\t"$3}' > ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--Miscellaneous/HeterozygosityCalc/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.Heterozygosity.txt
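The percentage itself comes from the first `awk` in the pipeline, which treats each .het file as holding two values and divides the second by their sum. The sketch below reproduces only that arithmetic on two invented .het files; the file names and counts are assumptions, `grep -F` stands in for `fgrep`, and the annotation and sorting steps are left out.

```shell
# Toy data only: file names and counts are invented for illustration.
tmpdir=$(mktemp -d)
printf '9900.000000 100.000000\n' > "$tmpdir/SampleA_GBS.het"
printf '9750.000000 250.000000\n' > "$tmpdir/SampleB_WGS.het"

# Same arithmetic as the pipeline: second value / (first + second) * 100.
het_table=$(cd "$tmpdir" && grep -F '.' *.het | tr ':' ' ' \
  | awk '{printf "%s\t%.2f\n", $1, $3/($2+$3)*100}')
printf '%s\n' "$het_table"
rm -r "$tmpdir"
```

Here 100 out of 10000 sites gives 1.00 and 250 out of 10000 gives 2.50, one line per file.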
These results were plotted using the Rscript below:
