LD Pruning of SNPs #66

Open
kose-y opened this issue Jul 30, 2020 · 4 comments
Comments

@kose-y
Member

kose-y commented Jul 30, 2020

@Hua-Zhou at #58

We may open another issue regarding LD pruning of SNPs so that remaining
SNPs are not very highly related. A literature review of how Plink does it
is helpful.

@kose-y
Member Author

kose-y commented Jul 30, 2020

The source code is here. https://github.com/chrchang/plink-ng/blob/master/2.0/plink2_ld.cc ... but can we find a text description?

@kose-y
Member Author

kose-y commented Jul 30, 2020

@kose-y
Member Author

kose-y commented Jul 30, 2020

https://zzz.bwh.harvard.edu/plink/summary.shtml

Linkage disequilibrium based SNP pruning
Sometimes it is useful to generate a pruned subset of SNPs that are in approximate linkage equilibrium with each other. This can be achieved via two commands: --indep, which prunes based on the variance inflation factor (VIF) by recursively removing SNPs within a sliding window; and --indep-pairwise, which is similar except that it is based only on pairwise genotypic correlation.
Hint: The output of either of these commands is two lists of SNPs: those that are pruned out and those that are not. A separate command using the --extract or --exclude option is necessary to actually perform the pruning.
The VIF pruning routine is performed:
plink --file data --indep 50 5 2
will create files
plink.prune.in
plink.prune.out
Each is a simple list of SNP IDs; either file can subsequently be specified as the argument for a --extract or --exclude command.
The parameters for --indep are: the window size in SNPs (e.g. 50), the number of SNPs to shift the window at each step (e.g. 5), and the VIF threshold (e.g. 2). The VIF is 1/(1-R^2), where R^2 is the multiple correlation coefficient for a SNP being regressed on all other SNPs simultaneously. That is, this considers the correlations between SNPs but also between linear combinations of SNPs. A VIF of 10 is often taken to represent near collinearity problems in standard multiple regression analyses (i.e. implies R^2 of 0.9). A VIF of 1 would imply that the SNP is completely independent of all other SNPs. Practically, values between 1.5 and 2 should probably be used; particularly in small samples, if this threshold is too low and/or the window size is too large, too many SNPs may be removed.
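To make the VIF definition concrete, here is a minimal pure-Python sketch. It relies on the standard identity that the VIF of variable i equals the i-th diagonal entry of the inverse correlation matrix. The function names (`vif_from_r2`, `invert`, `vifs`) are hypothetical, and this is not a claim about how PLINK computes the quantity internally.

```python
def vif_from_r2(r2):
    """VIF = 1 / (1 - R^2), with R^2 the multiple correlation coefficient."""
    return 1.0 / (1.0 - r2)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (perfectly collinear SNPs)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(corr):
    """VIF of each SNP, given the SNP-by-SNP correlation matrix of a window."""
    inv = invert(corr)
    return [inv[i][i] for i in range(len(corr))]


# Two SNPs with correlation r = 0.6: R^2 = 0.36, so both VIFs are 1 / 0.64.
example_vifs = vifs([[1.0, 0.6], [0.6, 1.0]])
```

With only two SNPs in the window the multiple R^2 reduces to the pairwise r^2, which is why the two-SNP example agrees with `vif_from_r2(0.36)`.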
The second procedure is performed:
plink --file data --indep-pairwise 50 5 0.5
This generates the same output files as the first version; the only difference is that a simple pairwise threshold is used. The first two parameters (50 and 5) are the same as above (window size and step); the third parameter represents the r^2 threshold. Note: this represents the pairwise SNP-SNP metric now, not the multiple correlation coefficient; also note, this is based on the genotypic correlation, i.e. it does not involve phasing.
To give a concrete example: the command above that specifies 50 5 0.5 would a) consider a window of 50 SNPs, b) calculate LD between each pair of SNPs in the window, c) remove one of a pair of SNPs if the LD is greater than 0.5, d) shift the window 5 SNPs forward and repeat the procedure.
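The four steps above can be sketched in pure Python as follows. This is a rough illustration under stated assumptions, not PLINK's implementation: genotypes are assumed to be complete 0/1/2 dosage vectors with no missing data, the names `r_squared` and `prune_pairwise` are hypothetical, and the choice of which SNP in a correlated pair gets dropped (here, the later one) is an assumption, since the description only says "one of a pair".

```python
from itertools import combinations
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as 0 here
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def prune_pairwise(genotypes, window=50, step=5, threshold=0.5):
    """Return (kept, removed) SNP indices after sliding-window pruning.

    genotypes: list of per-SNP dosage lists (0/1/2), in map order.
    """
    n = len(genotypes)
    removed = set()
    start = 0
    while start < n:
        # SNPs in the current window that have not been pruned yet.
        active = [i for i in range(start, min(start + window, n))
                  if i not in removed]
        for i, j in combinations(active, 2):
            if i in removed or j in removed:
                continue
            if r_squared(genotypes[i], genotypes[j]) > threshold:
                removed.add(j)  # assumption: drop the later SNP of the pair
        start += step  # shift the window forward and repeat
    kept = [i for i in range(n) if i not in removed]
    return kept, sorted(removed)


# Toy data: SNP 1 is an exact copy of SNP 0; the others are weakly correlated.
snps = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [2, 0, 1, 1, 0, 2],
    [0, 0, 1, 2, 2, 1],
]
kept, removed = prune_pairwise(snps, window=3, step=1, threshold=0.5)
```

In this toy example `kept` and `removed` play the roles of plink.prune.in and plink.prune.out respectively.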
To make a new, pruned file, then use something like (in this example, we also convert the standard PED fileset to a binary one):
plink --file data --extract plink.prune.in --make-bed --out pruneddata

@kose-y
Member Author

kose-y commented Aug 9, 2020

Source code for PLINK 1.07 is much more readable:

https://github.com/poulson/plink/blob/master/genome.cpp#L1172


2 participants