Skip to content

Commit

Permalink
update
Browse files Browse the repository at this point in the history
  • Loading branch information
Cloufield committed Oct 16, 2024
1 parent 5a9e97e commit 16744dd
Show file tree
Hide file tree
Showing 3 changed files with 88 additions and 27 deletions.
76 changes: 54 additions & 22 deletions 30_phasing/Phasing.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,25 +26,12 @@ You could read the following methods if you are interested.

## How to do phasing

In most of the cases, phasing is just a pre-step of imputation, and we do not care about how the phasing goes. But there are several considerations, like reference-based or reference-free, large and small sample size, rare variants' cutoff. There is no single method that could best fit all cases.

In most of the cases, phasing is just a pre-step of imputation, and we do not care about how the phasing goes.
But there are several considerations, like reference-based or reference-free, large and small sample size, rare variants cutoff.
There is no single method that could best fit all cases.

Here I show one example using EAGLE2.

```sh
eagle \
--vcf=target.vcf.gz \
--geneticMapFile=genetic_map_hg19_withX.txt.gz \
--chrom=19 \
--outPrefix=target.eagle \
--numThreads=10
```

!!! quote
Loh, P. R., Danecek, P., Palamara, P. F., Fuchsberger, C., A Reshef, Y., K Finucane, H., ... & L Price, A. (2016). Reference-based phasing using the Haplotype Reference Consortium panel. Nature genetics, 48(11), 1443-1448.
We will show two examples:

- Reference-based phasing using SHAPEIT2
- Cohort-based phasing using Eagle2

## Phasing using SHAPEIT2

Expand Down Expand Up @@ -82,10 +69,11 @@ plink2 \
Then, it is needed to check the alignment of SNPs between your genotype file and the reference file.

```
outputhaps=./1KG.EAS.chr22.phased.haps
outputsample=./1KG.EAS.chr22.phased.sample
outputlog=./1KG.EAS.chr22.phased.log
outputlogcheck=./1KG.EAS.chr22.check
out=./1KG.JPT.chr22.phased.shapeit2.reference_based
outputhaps=${out}.haps
outputsample=${out}.sample
outputlog=${out}
outputlogcheck=${out}.check
geneticmap=~/tools/shapeit2/shapeit.v2.904.3.10.0-693.11.6.el7.x86_64/reference/ALL.integrated_phase1_SHAPEIT_16-06-14.nomono/genetic_map_chr22_combined_b37.txt
inputrefhap=~/tools/shapeit2/shapeit.v2.904.3.10.0-693.11.6.el7.x86_64/reference/ALL.integrated_phase1_SHAPEIT_16-06-14.nomono/ALL.chr22.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.nomono.haplotypes.gz
Expand Down Expand Up @@ -123,7 +111,7 @@ shapeit --input-bed ${inputbedchr22} \
This command will generate a haplotype file `.hap` and a sample file `.sample`. We need to convert the files to VCF, compress and index the VCF for downstream analysis.

```
outputvcf=./1KG.EAS.chr22.phased.vcf
outputvcf=${out}.vcf
shapeit \
-convert \
Expand Down Expand Up @@ -162,11 +150,55 @@ NA19085 NA19086 NA19087 NA19088 NA19089 NA19090 NA19091
1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1
```

## Phasing using Eagle

Next, we will use Eagle2 as an example for cohort-based phasing.

!!! quote
Loh, P. R., Danecek, P., Palamara, P. F., Fuchsberger, C., A Reshef, Y., K Finucane, H., ... & L Price, A. (2016). Reference-based phasing using the Haplotype Reference Consortium panel. Nature genetics, 48(11), 1443-1448.

Download Eagle from [Eagle2 website](https://alkesgroup.broadinstitute.org/Eagle/). Unzip and add it to your environment. We also need genetic maps, which are also available on [Eagle2 website](https://alkesgroup.broadinstitute.org/Eagle/)

Cohort-based phasing (without reference) using eagle2. Eagle2

```
inputbedchr22=./sample_data.chr22.clean
geneticmap=~/tools/eagle/genetic_map_hg19_withX.txt.gz
out=./1KG.JPT.chr22.phased.eagle2.cohort_based
eagle \
--bfile=${inputbedchr22} \
--geneticMapFile=${geneticmap} \
--outPrefix=${out} \
--maxMissingPerSnp=1 \
--maxMissingPerIndiv=1 \
--numThreads=4 \
--chrom=22
```

Use shapeit2 to convert `.hap` and `.sample` to VCF

```
outputhaps=${out}.haps.gz
outputsample=${out}.sample
outputvcf=${out}.vcf
shapeit \
-convert \
--input-haps ${outputhaps} ${outputsample} \
--output-vcf ${outputvcf}
bgzip ${outputvcf} && \
tabix -p vcf ${outputvcf}.gz
```


```
##fileformat=VCFv4.1
##fileDate=16102024_17h46m51s
##source=SHAPEIT2.v904
##log_file=shapeit_16102024_17h46m51s_16aef2b1-de11-4c78-b8a6-f15bf1c2d14b.log
##FORMAT=<ID=GT,Number=1,Type=String,Description="Phased Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA18939 NA18940 NA18941 NA18942 NA18943 NA18944NA18945 NA18946 NA18947 NA18948 NA18949 NA18950 NA18951 NA18952 NA18953 NA18954 NA18956 NA18957 NA18959 NA18960 NA18961NA18962 NA18964 NA18965 NA18966 NA18967 NA18968 NA18969 NA18970 NA18971 NA18972 NA18973 NA18974 NA18975 NA18976 NA18977NA18978 NA18979 NA18980 NA18981 NA18982 NA18983 NA18984 NA18985 NA18986 NA18987 NA18988 NA18989 NA18990 NA18991 NA18992NA18993 NA18994 NA18995 NA18997 NA18998 NA18999 NA19000 NA19001 NA19002 NA19003 NA19004 NA19005 NA19006 NA19007 NA19009NA19010 NA19011 NA19012 NA19054 NA19055 NA19056 NA19057 NA19058 NA19059 NA19060 NA19062 NA19063 NA19064 NA19065 NA19066NA19067 NA19068 NA19070 NA19072 NA19074 NA19075 NA19076 NA19077 NA19078 NA19079 NA19080 NA19081 NA19082 NA19083 NA19084NA19085 NA19086 NA19087 NA19088 NA19089 NA19090 NA19091
22 16051453 22:16051453:A:C C A . PASS . GT 1|1 1|1 1|1 1|1 1|1 1|1 0|1 1|1 1|1 1|1 1|1 1|0 1|0 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 0|1 1|1 0|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 0|1 1|1 0|1 1|1 1|1 1|1 1|1 1|1 0|1 1|1 1|0 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 0|1 0|1 1|1 1|1 1|1 0|1 1|1 1|1 1|0 1|1 1|1 1|1 1|1 1|1 1|1 1|0 1|1 1|0 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1
```
28 changes: 28 additions & 0 deletions 30_phasing/phasing_eagle2.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
#!/bin/bash
inputbed=../04_Data_QC/sample_data.clean
jptsample=../01_Dataset/JPT.sample
inputbedchr22=./sample_data.chr22.clean

geneticmap=~/tools/eagle/genetic_map_hg19_withX.txt.gz
out=./1KG.JPT.chr22.phased.eagle2.cohort_based

eagle \
--bfile=${inputbedchr22} \
--geneticMapFile=${geneticmap} \
--outPrefix=${out} \
--maxMissingPerSnp=1 \
--maxMissingPerIndiv=1 \
--numThreads=4 \
--chrom=22

outputhaps=${out}.haps.gz
outputsample=${out}.sample
outputvcf=${out}.vcf

shapeit \
-convert \
--input-haps ${outputhaps} ${outputsample} \
--output-vcf ${outputvcf}

bgzip ${outputvcf} && \
tabix -p vcf ${outputvcf}.gz
11 changes: 6 additions & 5 deletions 30_phasing/phasing_shapeit2.sh
Original file line number Diff line number Diff line change
Expand Up @@ -13,10 +13,11 @@ plink2 \

geneticmap=~/tools/shapeit2/shapeit.v2.904.3.10.0-693.11.6.el7.x86_64/reference/ALL.integrated_phase1_SHAPEIT_16-06-14.nomono/genetic_map_chr22_combined_b37.txt

outputhaps=./1KG.EAS.chr22.phased.haps
outputsample=./1KG.EAS.chr22.phased.sample
outputlog=./1KG.EAS.chr22.phased.log
outputlogcheck=./1KG.EAS.chr22.check
out=./1KG.JPT.chr22.phased.shapeit2.reference_based
outputhaps=${out}.haps
outputsample=${out}.sample
outputlog=${out}
outputlogcheck=${out}.check


inputrefhap=~/tools/shapeit2/shapeit.v2.904.3.10.0-693.11.6.el7.x86_64/reference/ALL.integrated_phase1_SHAPEIT_16-06-14.nomono/ALL.chr22.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.nomono.haplotypes.gz
Expand Down Expand Up @@ -45,7 +46,7 @@ shapeit --input-bed ${inputbedchr22} \
--states 200 \
--window 2

outputvcf=./1KG.EAS.chr22.phased.vcf
outputvcf=${out}.vcf
shapeit \
-convert \
--input-haps ${outputhaps} ${outputsample} \
Expand Down

0 comments on commit 16744dd

Please sign in to comment.