2. Running GSUB
Yuchang Wu edited this page Sep 26, 2024
·
4 revisions
Run the following Rscript command from inside the GSUB directory.
Rscript GSUB.R \
--sumstat_files <path to sumstat files for Y1 and Y2> \
--output_path <path to output folder> \
--correction <correction to apply> \
--N <GWAS sample size> \
--info.filter <imputation quality filter> \
--maf.filter <minor allele frequency filter> \
--sample.prev <sample prevalence> \
--population.prev <population prevalence> \
--se.logit <SEs are on logistic scale> \
--OLS <phenotype continuous outcome> \
--linprob <GWAS was linear regression> \
--keep.indel <keep insertions deletions> \
--hm3 <path to hm3 file> \
--ld <path to ld folder> \
--wld <path to wld folder> \
--ref <path to file with reference SNPs>
where the inputs are
- sumstat_files (required): Paths to the 2 sumstat files, ordered trait 1, trait 2. $\gamma_2$ is the parameter of interest: the SNP effect on the latent factor for trait 2 after regressing out the factor for trait 1. Sumstat files must be formatted according to the GenomicSEM specifications (see Sumstats settings for GenomicSEM).
- output_path (required): Path to the folder that will contain the output and logs from GSUB.
- correction: Correction to apply to GSUB's analytical solution, representing the level of genomic control. The default is "standard", which adjusts the univariate GWAS standard errors by multiplying them by the square root of the univariate LDSC intercept. Other options are "conserv", which corrects the standard errors using the univariate LDSC intercept, and "none", which does not correct the standard errors.
- N: GWAS sample size. To use the sample size column in the sumstats file, specify N as NA. If the GWAS is a meta-analysis of case-control studies, N should be the sum of the effective sample sizes from each cohort. Default is NA, NA.
- info.filter: Numeric value used as a lower bound for imputation quality. To skip filtering on INFO, set info.filter to 0. Default is 0.9.
- maf.filter: Numeric value used as a lower bound for minor allele frequency. To skip filtering on MAF, set maf.filter to 0. Default is 0.01.
- sample.prev: Sample prevalence. Set to NA for linear regression results. If N is the effective sample size of a case-control GWAS, set the sample prevalence to 0.5. Default is NA, NA.
- population.prev: Population prevalence. Set to NA for linear regression results. Default is NA, NA.
- se.logit: Whether the SEs are on a logistic scale (i.e., from a logistic regression). Default is TRUE, TRUE.
- OLS: Whether the phenotype was a continuous outcome analyzed using an ordinary least squares (OLS; i.e., linear) estimator. Default is FALSE, FALSE.
- linprob: Whether the GWAS was a linear regression on a binary outcome. Default is FALSE, FALSE.
- keep.indel: Whether insertions and deletions (indels) should be included in the output. Default is FALSE.
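As a worked illustration of the "standard" correction described above, the sketch below multiplies a standard error by the square root of an LDSC intercept. The SE and intercept values are hypothetical, and the snippet only reproduces the arithmetic stated in the text; it is not taken from the GSUB source code.

```shell
# Hypothetical inputs, chosen only to illustrate the arithmetic
SE=0.010         # univariate GWAS standard error for one SNP
INTERCEPT=1.05   # univariate LDSC intercept for that trait

# "standard" correction as described above: SE * sqrt(LDSC intercept)
SE_STANDARD=$(awk -v se="$SE" -v a="$INTERCEPT" 'BEGIN { printf "%.6f", se * sqrt(a) }')

echo "uncorrected SE: $SE"
echo "standard-corrected SE: $SE_STANDARD"
```

With an intercept above 1, the corrected SE is slightly larger than the original; with correction set to "none", the SE would be left unchanged.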
NOTE: We recommend that you don't pass the rest of the inputs unless you have a reason to pass your own files.
- hm3: A file of SNPs with A1, A2 and rsID used to align alleles across traits.
- ld: A folder (or folders) of partitioned LD scores.
- wld: A folder of non-partitioned LD scores used as regression weights.
- ref: A reference file of SNPs to keep in your GWAS.
See a real example of how to run GSUB.
- The inputs sumstat_files, N, sample.prev, population.prev, se.logit, OLS, and linprob must be comma-separated character strings with a length of 2 (ex: --sample.prev 0.5,0.5 --se.logit FALSE,FALSE). A value must be specified for each sumstats file. If you omit the optional inputs among these, GSUB will still run and will use their default values.
- The inputs sumstat_files and output_path are required. You can run GSUB with only these inputs, and the other inputs will take their default values. However, the default behavior assumes that the GWAS sumstat files come from a logistic regression on a 0/1 binary outcome.
- Settings for common GWAS:
  - Linear regression on a continuous trait: se.logit = FALSE, OLS = TRUE, linprob = FALSE
  - Linear regression on 0/1 binary outcome: se.logit = FALSE, OLS = FALSE, linprob = TRUE
  - Logistic regression on 0/1 binary outcome: se.logit = TRUE, OLS = FALSE, linprob = FALSE
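The settings above can be combined into command lines such as the following sketch. The file names, output folder, and the population prevalence of 0.05 are hypothetical placeholders; only the flag names and the comma-separated pair format come from this page. The commands are printed rather than executed so that they can be checked first.

```shell
# Hypothetical paths: replace with your own sumstat files and output folder
SUMSTATS="trait1.sumstats.txt,trait2.sumstats.txt"
OUT="gsub_output"

# Case 1: both GWAS are linear regressions on continuous traits
CMD_CONT="Rscript GSUB.R --sumstat_files $SUMSTATS --output_path $OUT --se.logit FALSE,FALSE --OLS TRUE,TRUE --linprob FALSE,FALSE"

# Case 2: trait 1 is a logistic regression on a 0/1 outcome (N assumed to be
# the effective sample size, so sample.prev is 0.5; population.prev of 0.05
# is a made-up value), trait 2 is a linear regression on a continuous trait
CMD_MIXED="Rscript GSUB.R --sumstat_files $SUMSTATS --output_path $OUT --sample.prev 0.5,NA --population.prev 0.05,NA --se.logit TRUE,FALSE --OLS FALSE,TRUE --linprob FALSE,FALSE"

# Print the commands for inspection; run one from inside the GSUB directory
echo "$CMD_CONT"
echo "$CMD_MIXED"
```

In each pair, the first value applies to the trait 1 file and the second to the trait 2 file, matching the order given in sumstat_files.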