
3. Real example for GSUB

Yuchang Wu edited this page Sep 26, 2024 · 9 revisions

Here we provide a concrete example of running GSUB. We use Alzheimer's disease (AD) GWAS summary statistics from Kunkle et al. 2019 and AD proxy GWAS (GWAX) summary statistics from the UK Biobank as inputs. The main goal is to estimate genetic associations with the non-disease factor ($\text{F}_\text{non}$) underlying parental disease history.

GSUB workflow

Step 1: Download the example data

Make sure you have installed R and downloaded the reference SNP file as described in "1. Preparation for GSUB".

Make a folder for the example data:

mkdir example
cd example

Get Kunkle et al. 2019 AD GWAS data:

wget ftp://ftp.biostat.wisc.edu/pub/lu_group/Projects/GSUB/ref/Kunkle_etal_Stage1_results.txt.gz

Get AD proxy GWAS (GWAX) data from the UK Biobank:

wget ftp://ftp.biostat.wisc.edu/pub/lu_group/Projects/GSUB/ref/ParentalAD.annotated.regenie.logistic.gz
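
Large downloads are occasionally truncated, so it can be worth checking both files before moving on. The following is a minimal sketch and not part of GSUB: `check_gz` is a hypothetical helper name, and it assumes `gzip` and `head` are available on your system. It tests the archive's integrity and prints the first three lines so you can inspect the column headers.

```shell
# Hypothetical helper (not part of GSUB): verify that a file is a valid,
# non-empty gzip archive, then print its first three lines.
check_gz() {
  f="$1"
  if [ ! -s "$f" ]; then
    echo "missing or empty: $f" >&2
    return 1
  fi
  if ! gzip -t "$f" 2>/dev/null; then
    echo "corrupt gzip: $f" >&2
    return 1
  fi
  # Stream-decompress so the .gz file on disk is left untouched.
  gzip -dc "$f" | head -n 3
}

# Usage, from inside the example folder:
#   check_gz Kunkle_etal_Stage1_results.txt.gz
#   check_gz ParentalAD.annotated.regenie.logistic.gz
```

If either call reports a corrupt or missing file, re-running the corresponding `wget` command is the simplest fix.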

Step 2: Run GSUB

Go back into the GSUB directory:

cd ..

Make a folder for the output:

mkdir output

Run GSUB:
NOTE: Make sure the files passed to --sumstat_files are in the correct order. In this example the order is AD first, then AD family history, because we are regressing AD out of AD family history.

Rscript GSUB.R \
--sumstat_files ./example/Kunkle_etal_Stage1_results.txt.gz,./example/ParentalAD.annotated.regenie.logistic.gz \
--output_path ./output \
--correction standard \
--N NA,NA \
--info.filter 0.9 \
--maf.filter 0.01 \
--sample.prev 0.5,0.5 \
--population.prev 0.05,0.05 \
--se.logit TRUE,TRUE \
--OLS FALSE,FALSE \
--linprob FALSE,FALSE \
--keep.indel FALSE

Your final output file will be written to ./output/gsub_analytical_all.txt.gz. You can decompress this file by running:

gzip -d ./output/gsub_analytical_all.txt.gz
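
Note that `gzip -d` replaces the `.gz` file with the decompressed one. If you would rather keep the results compressed, a helper along these lines can report the table's dimensions by streaming it instead. This is only a sketch: `summarize_gz` is a hypothetical name, and it assumes the file is whitespace-delimited with a single header row.

```shell
# Hypothetical helper (not part of GSUB): report the number of columns and
# data rows in a gzip-compressed, whitespace-delimited table with one header
# line, without writing a decompressed copy to disk.
summarize_gz() {
  gzip -dc "$1" | awk '
    NR == 1 { ncol = NF }                      # column count from the header
    END {
      rows = (NR > 0) ? NR - 1 : 0             # exclude the header line
      printf "columns=%d rows=%d\n", ncol, rows
    }'
}

# Usage, before running gzip -d:
#   summarize_gz ./output/gsub_analytical_all.txt.gz
```

For this example, the header shown further down has 23 columns, so a different column count would suggest the run did not finish cleanly.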

The first lines of the results look like this:

head ./output/gsub_analytical_all.txt
SNP	CHR	BP	MAF	A1	A2	beta.1	se.1	beta.2	se.2	lambda11	lambda12	lambda22	gamma1	se.gamma1	z.gamma1	p.gamma1	gamma2	se.gamma2	z.gamma2	p.gamma2	n.gamma1	n.gamma2
rs2073813	1	753541	0.12326	G	A	0.0273988230917198	0.0167269369043677	-0.00814744556061092	0.0121313760917577	0.272401142013132	0.247074016559996	0.0923095237748176	0.100582629313643	0.0614055314920866	1.63800600482718	0.101420441152563	-0.357479904909877	0.209771076607855	-1.70414296713626	0.0883543810972295	2259	143
rs3131969	1	754182	0.128231	A	G	-0.021114844271081	0.0158293416307772	0.00885892881217973	0.0120786084939729	0.272401142013132	0.247074016559996	0.0923095237748176	-0.0775137876259823	0.0581104084725682	-1.33390539945307	0.182234861794909	0.303441839089534	0.202605128584048	1.49770068117331	0.134211034799065	2259	143
rs3131968	1	754192	0.128231	A	G	-0.0210597196031535	0.0158293457450882	0.00876749885596263	0.012083623743108	0.272401142013132	0.247074016559996	0.0923095237748176	-0.0773114218520357	0.0581104235764369	-1.33042261773126	0.183379066581056	0.301909718946163	0.202639981332277	1.48988228759807	0.136255189577119	2259	143
rs3131967	1	754334	0.128231	T	C	-0.0244222415420914	0.0164465212864346	0.00892911112528705	0.0120798600659029	0.272401142013132	0.247074016559996	0.0923095237748176	-0.0896554300822796	0.0603761098976662	-1.48494876921087	0.13755739564583	0.336700234939449	0.207280687365238	1.62436857586336	0.10429716953725	2259	143
rs3115858	1	755890	0.127237	A	T	-0.0233749267615418	0.0157169064689343	0.00987063083974022	0.0120057795272785	0.272401142013132	0.247074016559996	0.0923095237748176	-0.0858106782842158	0.0576976526338374	-1.48724730326189	0.13694950679366	0.33660903573674	0.201256068348843	1.67254104931279	0.0944176803599004	2259	143
rs3131962	1	756604	0.130219	A	G	-0.0181931714710897	0.0149875404900784	0.0088865044922004	0.0119693456761731	0.272401142013132	0.247074016559996	0.0923095237748176	-0.0667881615203091	0.0550201088707472	-1.21388639337678	0.224791109487728	0.275032551133222	0.195581670782613	1.40622866157493	0.159656235843049	2259	143
rs6699990	1	756912	0.0238569	A	G	0.0766242375784983	0.0651640729026957	0.0088865044922004	0.0119693456761731	0.272401142013132	0.247074016559996	0.0923095237748176	0.281291910203535	0.239220997463933	1.17586630431949	0.239648305686455	-0.656632329026881	0.65246566761244	-1.0063860240029	0.314229913967701	NA	NA
rs3115853	1	757640	0.129225	G	A	-0.0238710104132891	0.016446563317669	0.00999843486818235	0.0119661269475782	0.272401142013132	0.247074016559996	0.0923095237748176	-0.0876318294294756	0.060376264196705	-1.45142848096683	0.146660583797552	0.342868012416911	0.206511631861101	1.66028426257134	0.0968572801449506	2259	143
rs4951929	1	757734	0.127237	C	T	-0.0205636021088313	0.0157171220859619	0.00912605042453765	0.0119808471082676	0.272401142013132	0.247074016559996	0.0923095237748176	-0.0754901464687692	0.0576984441761415	-1.30835670782237	0.190752349230356	0.300919157497124	0.20108459284654	1.49648042765153	0.134528504562711	2259	143
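
In the rows above, z.gamma2 equals gamma2 divided by se.gamma2 (for rs2073813, -0.3575 / 0.2098 ≈ -1.704). As a quick sanity check on your own output, the sketch below recomputes that ratio for every SNP and flags disagreements. It is not part of GSUB: `check_z` is a hypothetical helper, the 1e-6 tolerance is an arbitrary choice, and the assumption that the relationship holds for every row is based only on the lines shown here.

```shell
# Hypothetical helper (not part of GSUB): confirm that z.gamma2 matches
# gamma2 / se.gamma2 in a decompressed, whitespace-delimited results file.
check_z() {
  awk '
    NR == 1 {
      # Map column names to positions so the check does not depend on order.
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("gamma2" in col) || !("se.gamma2" in col) || !("z.gamma2" in col)) {
        print "required columns not found" > "/dev/stderr"
        err = 1
        exit 2
      }
      next
    }
    {
      g = $col["gamma2"]; s = $col["se.gamma2"]; z = $col["z.gamma2"]
      # Skip rows with missing values or a zero standard error.
      if (g == "NA" || s == "NA" || z == "NA" || s + 0 == 0) next
      n++
      d = g / s - z
      if (d < 0) d = -d
      if (d > 1e-6) { bad++; print "mismatch: " $col["SNP"] }
    }
    END {
      if (err) exit 2
      printf "checked=%d mismatches=%d\n", n, bad
      exit (bad > 0)
    }' "$1"
}

# Usage, after decompressing the results:
#   check_z ./output/gsub_analytical_all.txt
```

The function returns a non-zero status when any row disagrees, so it can be used directly in a shell conditional.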

The output columns:

  • beta.1, se.1: the effect size and standard error from the 1st input summary statistics.
  • beta.2, se.2: the effect size and standard error from the 2nd input summary statistics.
  • gamma1, se.gamma1, z.gamma1, p.gamma1: the summary statistics for $\text{F}_\text{AD}$. These are essentially the summary statistics of the 1st input.
  • gamma2, se.gamma2, z.gamma2, p.gamma2: the summary statistics for $\text{F}_\text{non}$. These are the results of interest.
  • n.gamma1, n.gamma2: the effective sample sizes. They were computed according to GenomicSEM recommendations.
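
Since the $\text{F}_\text{non}$ results live in the gamma2 columns, a common next step is to pull out the SNPs whose p.gamma2 falls below some threshold. The sketch below is one way to do that with `awk`. It is not part of GSUB: `filter_fnon` is a hypothetical helper, and the default of 5e-8 is simply the conventional genome-wide significance threshold, which may not be the right choice for your analysis.

```shell
# Hypothetical helper (not part of GSUB): print the header plus every row
# whose p.gamma2 is below a threshold.
#   $1 = decompressed, whitespace-delimited results file
#   $2 = p-value threshold (optional, defaults to 5e-8)
filter_fnon() {
  awk -v thr="${2:-5e-8}" '
    NR == 1 {
      # Locate p.gamma2 by name rather than by fixed position.
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("p.gamma2" in col)) {
        print "column p.gamma2 not found" > "/dev/stderr"
        exit 2
      }
      print
      next
    }
    # Keep rows with a non-missing p-value below the threshold.
    $col["p.gamma2"] != "NA" && $col["p.gamma2"] + 0 < thr + 0
  ' "$1"
}

# Usage, after decompressing the results:
#   filter_fnon ./output/gsub_analytical_all.txt > ./output/fnon_hits.txt
#   filter_fnon ./output/gsub_analytical_all.txt 1e-5
```

Looking the column up by name keeps the filter working even if the column order differs between GSUB versions.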
