diff --git a/genotype_array_qc/README.md b/genotype_array_qc/README.md
index de2414e..ce90d29 100644
--- a/genotype_array_qc/README.md
+++ b/genotype_array_qc/README.md
@@ -15,32 +15,15 @@ The input and output formats are fully described in the appendix of this documen
The steps in this workflow are as follows:
-1. Add sex to fam file
+1. Split the X chromosome into PAR and non-PAR
Sample command:
-``` shell
-# Create sex mapping file from phenotype file
-perl -lne '
- BEGIN {
- $header = 1;
- $fidCol = -1;
- $iidCol = -1;
- $sexCol = -1;
- }
- $delimiter = lc("'[PHENO_DELIMITER]'");
- $delimiter = ($delimiter eq "comma") ? "," : (($delimiter eq "tab") ? "\t" : (($delimiter eq "space") ? " " : ""));
- @F = split($delimiter);
- if ($header) {
- foreach $col (@F) {
-
- }
- }
-'
+```
plink \
--bfile [INPUT_BED_BIM_FAM_PREFIX] \
- --update-sex [SEX_FILE] \
+ --split-x b37 no-fail \
--make-bed \
- --out /shared/data/studies/vidus/observed/processing/ea/vidus.ea.chr23.snp_miss.with_cidr_sexes
+ --out [OUTPUT_BED_BIM_FAM_PREFIX]
```
Input Files:
@@ -67,62 +50,40 @@ Parameters:
| PARAMETER | DESCRIPTION |
| --- | --- |
| `--bfile [INPUT_BED_BIM_FAM_PREFIX]` | Prefix for input genotypes in PLINK bed/bim/fam format |
-| `--chr [CHR]` | Chromosome to extract (1-26, X, Y, XY, MT) |
+| `--split-x b37 no-fail` | Option telling PLINK to split the pseudo-autosomal region (PAR) of the X chromosome off as XY using build 37 (b37) boundaries, and not to fail if no variants would be affected |
| `--make-bed` | Flag indicating to generate genotypes in PLINK bed/bim/fam format |
| `--out [OUTPUT_BED_BIM_FAM_PREFIX]` | Prefix for output genotypes in PLINK bed/bim/fam format |
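
One way to confirm that the split took effect is to tally variants per chromosome code in the output bim file. The sketch below is not part of the workflow itself. It assumes the bim file uses PLINK's default numeric chromosome codes (23 for X, 25 for XY), and `count_by_chr` is a hypothetical helper name introduced only for illustration.

```shell
# Hypothetical helper: tally variants per chromosome code in a PLINK bim file.
# Column 1 of a bim file holds the chromosome code.
count_by_chr() {
    awk '{ n[$1]++ } END { for (c in n) print c "\t" n[c] }' "$1" | sort -k1,1n
}

# Demo on a made-up three-variant bim file (not real data).
demo_bim=$(mktemp)
printf '23\trs1\t0\t5000000\tA\tG\n25\trs2\t0\t100000\tC\tT\n23\trs3\t0\t6000000\tG\tA\n' > "$demo_bim"
count_by_chr "$demo_bim"
```

After a successful split, a row for code 25 would be expected whenever the input contained variants in the PAR.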
-
-2. Sex check
+2. Remove phenotype info in FAM file
Sample command:
-``` shell
-# Run sex check
-plink \
- --bfile [INPUT_BED_BIM_FAM_PREFIX] \
- --check-sex \
- --out [OUTPUT_PREFIX]
-
-# Rename output file
-perl -lane 'print join("\t",@F);' [OUTPUT_PREFIX].sexcheck > [OUTPUT_PREFIX].sexcheck.all.tsv
-
-# Extract subjects not passing sex check
-head -n 1 [OUTPUT_PREFIX].sexcheck.all.tsv > [OUTPUT_PREFIX].sexcheck.problems.tsv
-grep PROBLEM [OUTPUT_PREFIX].sexcheck.all.tsv >> [OUTPUT_PREFIX].sexcheck.problems.tsv
-
-# Create remove list
-tail -n +2 [OUTPUT_PREFIX].sexcheck.problems.tsv |
- perl -lane 'print join("\t", $F[0], $F[1]);' > [OUTPUT_PREFIX].sexcheck.remove.tsv
+```
+perl -lane 'print join("\t", @F[0 .. 3], "0\t0");' [INPUT_FAM_FILE] > [OUTPUT_FAM_FILE]
```
Input Files:
| FILE | DESCRIPTION |
| --- | --- |
-| `[INPUT_BED_BIM_FAM_PREFIX].bed` | PLINK format bed file for input genotypes |
-| `[INPUT_BED_BIM_FAM_PREFIX].bim` | PLINK format bim file for input genotypes |
-| `[INPUT_BED_BIM_FAM_PREFIX].fam` | PLINK format fam file for input genotypes |
+| `[INPUT_FAM_FILE]` | Input FAM file to remove phenotype info from |
Output Files:
| FILE | DESCRIPTION |
| --- | --- |
-| `[OUTPUT_PREFIX].sexcheck.all.tsv` | PLINK sex check output for all subjects |
-| `[OUTPUT_PREFIX].sexcheck.problems.tsv` | PLINK sex check output for subjects not passing sex check |
-| `[OUTPUT_PREFIX].sexcheck.remove.tsv` | List of subjects not passing sex check that can be fed into PLINK to remove the subjects |
+| `[OUTPUT_FAM_FILE]` | Output FAM file with phenotype info removed |
Parameters:
| PARAMETER | DESCRIPTION |
| --- | --- |
-| `--bfile [INPUT_BED_BIM_FAM_PREFIX]` | Prefix for input genotypes in PLINK bed/bim/fam format |
-| `--sex-check` | Flag indicating that PLINK shoud perform a sex check |
-| `--out [OUTPUT_BED_BIM_FAM_PREFIX]` | Prefix for output genotypes in PLINK bed/bim/fam format |
-
+| `--in_fam [INPUT_FAM_FILE]` | Input FAM file to remove phenotype info from |
+| `--out_fam [OUTPUT_FAM_FILE]` | Output FAM file with phenotype info removed |
@@ -165,7 +126,6 @@ Parameters:
| `--chr [CHR]` | Chromosome to extract (1-26, X, Y, XY, MT) |
| `--make-bed` | Flag indicating to generate genotypes in PLINK bed/bim/fam format |
| `--out [OUTPUT_BED_BIM_FAM_PREFIX]` | Prefix for output genotypes in PLINK bed/bim/fam format |
-
@@ -346,101 +306,60 @@ Parameters:
-6. Flag individuals missing chrX or other chromosome
-
-
-
-
-
-7. Remove phenotype info in FAM file
-
-Sample command:
-```
-perl -pe 's/\S+$/0/;' [INPUT_FAM_FILE]
-```
-
-Input Files:
-
-| FILE | DESCRIPTION |
-| --- | --- |
-| `[INPUT_FAM_FILE]` | Input FAM file to remove phenotype info from |
-
-
-Output Files:
-
-| FILE | DESCRIPTION |
-| --- | --- |
-| `[OUTPUT_FAM_FILE]` | Output FAM file phenotype info removed |
-
-
-Parameters:
-
-| PARAMETER | DESCRIPTION |
-| --- | --- |
-| `--in_fam [INPUT_FAM_FILE]` | Input FAM file to remove phenotype info from |
-| `--out_fam [OUTPUT_FAM_FILE]` | Output FAM file phenotype info removed |
-
-
-
-
-8. Format phenotype data to standard format
+6. Remove subjects with >99% missingness
Sample command:
``` shell
+plink \
+ --bfile [INPUT_BED_BIM_FAM_PREFIX] \
+ --mind 0.99 \
+ --make-bed \
+ --out [OUTPUT_BED_BIM_FAM_PREFIX]
```
Input Files:
| FILE | DESCRIPTION |
| --- | --- |
+| `[INPUT_BED_BIM_FAM_PREFIX].bed` | PLINK format bed file for input genotypes |
+| `[INPUT_BED_BIM_FAM_PREFIX].bim` | PLINK format bim file for input genotypes |
+| `[INPUT_BED_BIM_FAM_PREFIX].fam` | PLINK format fam file for input genotypes |
Output Files:
| FILE | DESCRIPTION |
| --- | --- |
+| `[OUTPUT_BED_BIM_FAM_PREFIX].bed` | PLINK format bed file for output genotypes |
+| `[OUTPUT_BED_BIM_FAM_PREFIX].bim` | PLINK format bim file for output genotypes |
+| `[OUTPUT_BED_BIM_FAM_PREFIX].fam` | PLINK format fam file for output genotypes |
+| `[OUTPUT_BED_BIM_FAM_PREFIX].log` | PLINK log file |
Parameters:
| PARAMETER | DESCRIPTION |
| --- | --- |
+| `--bfile [INPUT_BED_BIM_FAM_PREFIX]` | Prefix for input genotypes in PLINK bed/bim/fam format |
+| `--mind 0.99` | Option indicating that individuals with >99% missingness should be excluded |
+| `--make-bed` | Flag indicating to generate genotypes in PLINK bed/bim/fam format |
+| `--out [OUTPUT_BED_BIM_FAM_PREFIX]` | Prefix for output genotypes in PLINK bed/bim/fam format |
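
To see how many subjects this filter dropped, one option is to count the entries in the `.irem` file that PLINK 1.x writes alongside the output when `--mind` removes samples. The sketch below is an assumption-laden convenience, not part of the workflow: `count_removed` is a hypothetical helper, and treating a missing `.irem` file as "nothing removed" is an assumption.

```shell
# Hypothetical helper: count samples listed in <prefix>.irem.
# Prints 0 when the file is absent (assumed here to mean nothing was removed).
count_removed() {
    irem="$1.irem"
    if [ -f "$irem" ]; then
        wc -l < "$irem" | tr -d ' '
    else
        echo 0
    fi
}

# Demo with a made-up prefix and a fake two-line .irem file.
demo_dir=$(mktemp -d)
printf 'F1\tS1\nF7\tS7\n' > "$demo_dir/demo.irem"
count_removed "$demo_dir/demo"
```

In practice the argument would be `[OUTPUT_BED_BIM_FAM_PREFIX]`; the PLINK log file reports the same count.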
-9. Structure workflow (separate supporting workflow)
-
-Sample command:
-``` shell
-```
-
-Input Files:
-
-| FILE | DESCRIPTION |
-| --- | --- |
-
+7. Structure workflow (separate supporting workflow)
-Output Files:
-
-| FILE | DESCRIPTION |
-| --- | --- |
-
-
-Parameters:
-
-| PARAMETER | DESCRIPTION |
-| --- | --- |
-10. Partition data by ancestry
+8. Partition data by ancestry
Sample command:
``` shell
plink \
- --bfile [INPUT_BED_BIM_FAM_PREFIX] \ \
+ --bfile [INPUT_BED_BIM_FAM_PREFIX] \
--keep [KEEP_LIST] \
--make-bed \
--out [OUTPUT_BED_BIM_FAM_PREFIX]
@@ -478,12 +397,12 @@ Parameters:
-11. Call rate filter
+9. Call rate filter
Sample command:
``` shell
plink \
- --bfile [INPUT_BED_BIM_FAM_PREFIX] \ \
+ --bfile [INPUT_BED_BIM_FAM_PREFIX] \
--geno [CALL_RATE_THRESHOLD] \
--make-bed \
--out [OUTPUT_BED_BIM_FAM_PREFIX]
@@ -520,7 +439,7 @@ Parameters:
-12. HWE filter
+10. HWE filter
Sample command:
``` shell
@@ -596,7 +515,7 @@ Parameters:
-13. Set het haploids to missing
+11. Set het haploids to missing
Sample command:
``` shell
@@ -638,7 +557,7 @@ Parameters:
-14. Subject call rate filter (based on autosomes)
+12. Subject call rate filter (based on autosomes)
Sample command:
``` shell
@@ -692,7 +611,7 @@ Parameters:
-15. Relatedness workflow (separate supporting workflow)
+13. Excessive homozygosity filtering
Sample command:
``` shell
@@ -718,33 +637,62 @@ Parameters:
-16. Excessive homozygosity filtering
+14. Relatedness workflow (separate supporting workflow)
+
+
+
+
+
+15. Sex check workflow (separate supporting workflow)
+
+
+
+
+
+16. Remove samples based on relatedness (optional)
Sample command:
``` shell
+plink \
+ --bfile [INPUT_BED_BIM_FAM_PREFIX] \
+ --remove [REMOVE_LIST] \
+ --make-bed \
+ --out [OUTPUT_BED_BIM_FAM_PREFIX]
```
Input Files:
| FILE | DESCRIPTION |
| --- | --- |
+| `[INPUT_BED_BIM_FAM_PREFIX].bed` | PLINK format bed file for input genotypes |
+| `[INPUT_BED_BIM_FAM_PREFIX].bim` | PLINK format bim file for input genotypes |
+| `[INPUT_BED_BIM_FAM_PREFIX].fam` | PLINK format fam file for input genotypes |
+| `[REMOVE_LIST]` | List of subjects to remove |
Output Files:
| FILE | DESCRIPTION |
| --- | --- |
+| `[OUTPUT_BED_BIM_FAM_PREFIX].bed` | PLINK format bed file for output genotypes |
+| `[OUTPUT_BED_BIM_FAM_PREFIX].bim` | PLINK format bim file for output genotypes |
+| `[OUTPUT_BED_BIM_FAM_PREFIX].fam` | PLINK format fam file for output genotypes |
+| `[OUTPUT_BED_BIM_FAM_PREFIX].log` | PLINK log file |
Parameters:
| PARAMETER | DESCRIPTION |
| --- | --- |
+| `--bfile [INPUT_BED_BIM_FAM_PREFIX]` | Prefix for input genotypes in PLINK bed/bim/fam format |
+| `--remove [REMOVE_LIST]` | List of subjects to remove |
+| `--make-bed` | Flag indicating to generate genotypes in PLINK bed/bim/fam format |
+| `--out [OUTPUT_BED_BIM_FAM_PREFIX]` | Prefix for output genotypes in PLINK bed/bim/fam format |
-17. Remove samples based on relatedness (optional)
+17. Remove samples based on discrepant sex (optional)
Sample command:
``` shell
@@ -787,13 +735,13 @@ Parameters:
-18. Remove samples based on discrepant sex (optional)
+18. Merge the PAR and non-PAR regions of the X chromosome
Sample command:
-``` shell
+```
plink \
--bfile [INPUT_BED_BIM_FAM_PREFIX] \
- --remove [REMOVE_LIST] \
+ --merge-x no-fail \
--make-bed \
--out [OUTPUT_BED_BIM_FAM_PREFIX]
```
@@ -805,7 +753,6 @@ Input Files:
| `[INPUT_BED_BIM_FAM_PREFIX].bed` | PLINK format bed file for input genotypes |
| `[INPUT_BED_BIM_FAM_PREFIX].bim` | PLINK format bim file for input genotypes |
| `[INPUT_BED_BIM_FAM_PREFIX].fam` | PLINK format fam file for input genotypes |
-| `[REMOVE_LIST]` | List of subjects to remove |
Output Files:
@@ -823,10 +770,16 @@ Parameters:
| PARAMETER | DESCRIPTION |
| --- | --- |
| `--bfile [INPUT_BED_BIM_FAM_PREFIX]` | Prefix for input genotypes in PLINK bed/bim/fam format |
-| `--remove [REMOVE_LIST]` | List of subjects to remove |
+| `--merge-x no-fail` | Option telling PLINK to merge the PAR and non-PAR regions of the X chromosome and not fail if already merged |
| `--make-bed` | Flag indicating to generate genotypes in PLINK bed/bim/fam format |
| `--out [OUTPUT_BED_BIM_FAM_PREFIX]` | Prefix for output genotypes in PLINK bed/bim/fam format |
+
+19. Flag individuals missing chrX or another chromosome
+
+
+
+