Update README.md
mishaploid authored Sep 24, 2020
1 parent 60719d9 commit 68d7678
Showing 1 changed file with 20 additions and 15 deletions.

Before running Snakemake, download the **Arabidopsis Regional Mapping (RegMap) d…
`wget https://github.com/Gregor-Mendel-Institute/atpolydb/raw/master/250k_snp_data/call_method_75.tar.gz`
`tar -xvf call_method_75.tar.gz`

### Phenotype and covariate data

- **data/raw/aa360_raw_ratios.csv**
Raw measurements (nmol/mg seed) of 65 free amino acid traits measured in 313 accessions of _Arabidopsis thaliana_ as reported by [Angelovici et al. 2013](http://www.plantcell.org/content/25/12/4827#sec-12)

- **data/processed/aa360_BLUEs.txt**
Environment-adjusted best linear unbiased estimates (BLUEs) for 65 free amino acid traits, calculated using the `HAPPI-GWAS` pipeline from [Slaten et al. 2020](https://doi.org/10.1093/bioinformatics/btaa589). See `notebooks/01-calculate_BLUEs.Rmd` for details.

- **data/processed/aa360_covars.txt**
Principal components (PCs) calculated from the genotype data, used to model population structure.
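To illustrate what an environment-adjusted BLUE is, here is a minimal sketch that fits a purely fixed-effects additive model (trait = accession effect + environment effect + error) by ordinary least squares with NumPy. It assumes one row per accession-environment observation; the function name `accession_blues` and this input layout are hypothetical, and the model is a simplification, not the mixed model fitted by `HAPPI-GWAS`.

```python
import numpy as np


def accession_blues(accession, environment, y):
    """Return {accession: BLUE} from a fixed-effects model y = accession + environment + e.

    Hypothetical helper: a simplified OLS sketch, not the HAPPI-GWAS model.
    Environments use sum-to-zero coding, so each accession BLUE is its
    trait value adjusted to an average environment.
    """
    acc_levels = sorted(set(accession))
    env_levels = sorted(set(environment))
    n = len(y)
    # One indicator column per accession (no separate intercept).
    X_acc = np.zeros((n, len(acc_levels)))
    for row, a in enumerate(accession):
        X_acc[row, acc_levels.index(a)] = 1.0
    # Sum-to-zero coding: the last environment gets -1 in every environment column.
    X_env = np.zeros((n, len(env_levels) - 1))
    for row, e in enumerate(environment):
        k = env_levels.index(e)
        if k < len(env_levels) - 1:
            X_env[row, k] = 1.0
        else:
            X_env[row, :] = -1.0
    X = np.hstack([X_acc, X_env])
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return dict(zip(acc_levels, beta[: len(acc_levels)]))


# Unbalanced toy data: accession A is only measured in env1.
blues = accession_blues(["A", "B", "B"], ["env1", "env1", "env2"], [11.0, 21.0, 19.0])
```

In this toy example accession A is only observed in the environment with a positive effect (+1), so its raw mean of 11 is adjusted down to a BLUE of 10, while B's BLUE is 20.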

Snakemake
------------
rules/common.smk - specifies location of config.yaml file

…

- filter and convert genotype data to PED format
- export TAIR10 Ensembl gene IDs for SNP data
- calculate SNP weightings (these are not actually used, so this step could be skipped)
- run PCA and export covariate file (two PCs are included here; adjust the number of PCs to suit your data)
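The PCA step can be sketched as a singular value decomposition of the column-centered genotype matrix, keeping the first two PC scores per accession as covariates. This sketch assumes an accessions-by-SNPs matrix coded 0/1/2 with no missing calls, and `genotype_pcs` is a hypothetical name; the tool the rule actually calls may standardize SNPs or handle missing data differently.

```python
import numpy as np


def genotype_pcs(genotypes, n_pcs=2):
    """Top principal component scores of an accessions x SNPs genotype matrix.

    Assumes 0/1/2 (or 0/2 for inbred lines) coding and no missing values.
    Each SNP column is centered before the SVD; monomorphic SNPs contribute nothing.
    """
    G = np.asarray(genotypes, dtype=float)
    centered = G - G.mean(axis=0)
    # Left singular vectors scaled by singular values give the PC scores per accession.
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]


rng = np.random.default_rng(0)
toy_genotypes = rng.integers(0, 3, size=(20, 50))
pcs = genotype_pcs(toy_genotypes, n_pcs=2)  # one row per accession, one column per PC
```

Changing `n_pcs` is where the recommendation to adjust the number of PCs to the data would apply.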

#### rules/cross_validation.smk
- create training and testing sets for cross validation
- Note: this step may need to be run separately on the command line, because repeating the command via loops/Snakemake sometimes does not generate unique folds
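One way to guarantee non-overlapping folds is to draw every fold from a single seeded permutation instead of sampling each fold in a separate call. The sketch below assumes plain k-fold cross validation over accession indices; `make_folds` and `train_test_splits` are hypothetical helpers, not part of the pipeline's rules.

```python
import numpy as np


def make_folds(n_samples, n_folds=5, seed=1):
    """Split sample indices into n_folds disjoint, near-equal folds.

    All folds come from one permutation, so together they cover every sample
    exactly once; a fixed seed makes the split reproducible.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    return [np.sort(part) for part in np.array_split(order, n_folds)]


def train_test_splits(n_samples, n_folds=5, seed=1):
    """Yield (train_idx, test_idx) pairs, holding out one fold at a time."""
    folds = make_folds(n_samples, n_folds, seed)
    for k, test_idx in enumerate(folds):
        train_idx = np.sort(np.concatenate([f for j, f in enumerate(folds) if j != k]))
        yield train_idx, test_idx


folds = make_folds(313, n_folds=5, seed=42)  # e.g. 313 accessions
```

Rerunning with the same `seed` reproduces the same split, and each training set is simply the union of the other folds.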

#### rules/gblup.smk
- export kinship matrix for all SNPs

…

- summarize MultiBLUP output (`reports/multiblup.RData`)
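A common way to build a kinship (genomic relationship) matrix like the one exported by `rules/gblup.smk` is VanRaden's first method: center each SNP by twice its allele frequency and divide the cross-product by 2 * sum(p * (1 - p)). The sketch assumes 0/1/2 genotype coding without missing values and at least one polymorphic SNP; the pipeline's own software may weight or scale SNPs differently, so treat `vanraden_kinship` as a hypothetical illustration.

```python
import numpy as np


def vanraden_kinship(genotypes):
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 genotypes.

    Rows are accessions, columns are SNPs; no missing values assumed.
    """
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0              # frequency of the counted allele per SNP
    Z = M - 2.0 * p                       # center each SNP by twice its frequency
    denom = 2.0 * np.sum(p * (1.0 - p))   # scales K to be comparable to pedigree relationships
    return Z @ Z.T / denom


rng = np.random.default_rng(0)
K = vanraden_kinship(rng.integers(0, 3, size=(10, 40)))
```

The result is a symmetric, positive semi-definite accessions-by-accessions matrix, which is the form GBLUP-style models expect.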

#### rules/null_distribution.smk
- first option: generate 5000 random gene groups with a uniform distribution of SNPs. This is useful for examining how partition size influences heritability explained or model fit, or for comparing many partitions of varying size against an empirical distribution (output: `reports/lr_null_results.csv`)
- second option: generate 1000 random gene groups for each pathway (excludes pathway SNPs and samples a similar number of SNPs/genes)
- calculate kinship matrices for each random group
- estimate variances and heritability for each random SNP set
- summarize results across all 1000 gene groups (`reports/null_dist_results_pathways/{pathway}_null.csv`)
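The two sampling schemes can be sketched at the SNP level as follows. This simplifies the gene-level grouping to SNP indices, interprets "uniform distribution of SNPs" as group sizes drawn uniformly between a minimum and maximum (an assumption), and uses hypothetical function names.

```python
import numpy as np


def uniform_size_groups(n_snps_total, n_groups, min_size, max_size, seed=1):
    """First option (sketch): random SNP groups with sizes uniform in [min_size, max_size]."""
    rng = np.random.default_rng(seed)
    sizes = rng.integers(min_size, max_size + 1, size=n_groups)  # upper bound is exclusive
    return [np.sort(rng.choice(n_snps_total, size=s, replace=False)) for s in sizes]


def pathway_matched_groups(n_snps_total, pathway_snps, n_groups=1000, seed=1):
    """Second option (sketch): groups the same size as a pathway, drawn only from non-pathway SNPs."""
    rng = np.random.default_rng(seed)
    pathway = np.unique(pathway_snps)
    background = np.setdiff1d(np.arange(n_snps_total), pathway)
    return [np.sort(rng.choice(background, size=pathway.size, replace=False))
            for _ in range(n_groups)]


groups = uniform_size_groups(1000, n_groups=50, min_size=10, max_size=100, seed=3)
matched = pathway_matched_groups(100, [1, 2, 3, 4, 5], n_groups=20, seed=3)
```

Each group would then get its own kinship matrix and variance estimates, as in the steps above.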

Notebooks
------------
R notebooks to summarize results.

#### 01-calculate_BLUEs.Rmd
Documents the calculation of the environment-adjusted BLUEs in `data/processed/aa360_BLUEs.txt`.

#### 02-gblup_results.Rmd
Checks the quality of model output and summarizes prediction results for the GBLUP and MultiBLUP models (e.g., proportion of heritability explained, likelihood ratio, prediction accuracy, reliability, bias, MSE)
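A few of the summaries listed for `02-gblup_results.Rmd` can be sketched as below. The definitions are assumptions based on common usage (accuracy as the Pearson correlation of observed and predicted values, bias as the slope from regressing observed on predicted values, MSE as the mean squared prediction error); reliability is left out because its definition varies between studies. `prediction_metrics` is a hypothetical helper.

```python
import numpy as np


def prediction_metrics(observed, predicted):
    """Cross-validation summaries under assumed definitions.

    accuracy: Pearson correlation between observed and predicted values
    bias:     slope of observed regressed on predicted (1 means no scale bias)
    mse:      mean squared error of the predictions
    """
    y = np.asarray(observed, dtype=float)
    yhat = np.asarray(predicted, dtype=float)
    accuracy = np.corrcoef(y, yhat)[0, 1]
    bias = np.cov(y, yhat)[0, 1] / np.var(yhat, ddof=1)  # both use n - 1
    mse = np.mean((y - yhat) ** 2)
    return {"accuracy": accuracy, "bias": bias, "mse": mse}


m = prediction_metrics([1, 2, 3, 4], [2, 3, 4, 5])
```

A slope above 1 indicates predictions that are too compressed, below 1 predictions that are too spread out.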

#### 03-process_null.Rmd
Examines properties of the random gene groups (e.g., the distribution of the likelihood ratio) and performs quantile regression to establish 95th percentiles for the proportion of heritability explained and the likelihood ratio (see the discussion of this approach in [Edwards et al. 2016](https://gsejournal.biomedcentral.com/articles/10.1186/s12711-015-0132-6))
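For a small number of random groups, linear quantile regression at tau = 0.95 can be solved exactly: the check (pinball) loss is convex and piecewise linear, so some optimal line passes through two data points, and enumerating every pair finds it. The sketch assumes the predictor is group size (number of SNPs), which is not stated above; `quantile_line` is hypothetical, and for thousands of groups a linear-programming solver such as `rq()` in the R package `quantreg` is the practical choice.

```python
import numpy as np


def quantile_line(x, y, tau=0.95):
    """Exact linear quantile regression y ~ a + b*x by enumerating point pairs.

    Requires at least two distinct x values. Cost is O(n^3) time and
    O(n^3) memory, so this is only practical for a few hundred points.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(x.size, k=1)
    keep = x[i] != x[j]                      # skip pairs that would define a vertical line
    i, j = i[keep], j[keep]
    slopes = (y[j] - y[i]) / (x[j] - x[i])
    intercepts = y[i] - slopes * x[i]
    resid = y[None, :] - (intercepts[:, None] + slopes[:, None] * x[None, :])
    # Check loss: tau * r for points above the line, (tau - 1) * r for points below.
    loss = np.where(resid >= 0, tau * resid, (tau - 1.0) * resid).sum(axis=1)
    best = np.argmin(loss)
    return intercepts[best], slopes[best]


rng = np.random.default_rng(7)
group_size = rng.integers(10, 500, size=120).astype(float)
null_lr = 0.01 * group_size + rng.standard_normal(120)
a, b = quantile_line(group_size, null_lr, tau=0.95)
```

A pathway with `n` SNPs would then be compared against the threshold `a + b * n`, and the same fit could be repeated for the proportion of heritability explained.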

#### 04-multiblup_results_by_pathway.Rmd
Identifies pathways that pass significance criteria based on comparison to random gene groups with the same number of SNPs (proportion of h<sup>2</sup>, likelihood ratio, and increase in prediction accuracy).
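The pass/fail rule can be sketched with a plain empirical 95th percentile of the matched random groups, used here in place of the quantile-regression threshold for simplicity. The argument names, the strict inequalities, and the default minimum accuracy gain of 0 are assumptions; `passes_criteria` is a hypothetical helper.

```python
import numpy as np


def passes_criteria(obs_h2_prop, obs_lr, obs_acc_gain, null_h2_prop, null_lr,
                    percentile=95.0, min_acc_gain=0.0):
    """Hypothetical significance rule for one pathway against its matched null groups.

    Passes when the pathway's proportion of h2 and likelihood ratio both exceed
    the chosen percentile of the null, and prediction accuracy increases.
    """
    h2_cut = np.percentile(null_h2_prop, percentile)
    lr_cut = np.percentile(null_lr, percentile)
    return bool(obs_h2_prop > h2_cut and obs_lr > lr_cut and obs_acc_gain > min_acc_gain)


null_h2 = np.linspace(0.0, 0.4, 100)
null_lr = np.arange(100, dtype=float)
significant = passes_criteria(0.5, 95.0, 0.01, null_h2, null_lr)
```

Setting `min_acc_gain=0.01` would express a stricter requirement of at least a one-point increase in accuracy.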

#### 05-figures.Rmd
Code to create figures used in the manuscript.

