Return SNP info along with LD matrix #10

Open
xiangzhu opened this issue Aug 3, 2018 · 2 comments

Comments

Collaborator

xiangzhu commented Aug 3, 2018

It seems the main function only returns an estimated LD matrix at this point?
https://github.com/stephenslab/LDshrink/blob/32b4ad3942f7cb429f23c529b86ab72cfbb1b257/R/LDshrink.R#L6

Ideally, the function would also return some basic SNP info (e.g. position, alleles), which is essential when combining LD with GWAS summary statistics.

I think the emeraLD package gives us a good example: https://github.com/statgen/emeraLD

> source('emeraLD2R.r');
Loading required package: data.table
data.table 1.11.4  Latest news: http://r-datatable.com
emeraLD v0.1 (c) 2018 corbin quick (corbinq@gmail.com)

reading from m3vcf file...

processed genotype data for 5008 haplotypes...

calculating LD for 60 SNPs...

done!! thanks for using emeraLD

> names(ld_data)
[1] "Sigma" "info"

> head(ld_data$info)
   chr   pos          id ref alt
1:  20 83061 rs549711487   C   T
2:  20 83196  rs62190472   A   T
3:  20 83252   rs6137896   G   C
4:  20 83570   rs6048967   T   G
5:  20 83611 rs114000219   C   A
6:  20 83792 rs529518485   A   G

> head(ld_data$Sigma[, 1:5], 5)
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,]  1.00000 -0.00602  0.03989 -0.00824 -0.00331
[2,] -0.00602  1.00000 -0.14013 -0.03102 -0.01245
[3,]  0.03989 -0.14013  1.00000 -0.05714 -0.04400
[4,] -0.00824 -0.03102 -0.05714  1.00000 -0.01704
[5,] -0.00331 -0.01245 -0.04400 -0.01704  1.00000
Collaborator

CreRecombinase commented Aug 13, 2018

I agree that returning some kind of info about SNPs would be useful, but I don't think it's useful or necessary to enforce it. One quick and easy option would be to add colnames and rownames to the LD matrix that match the colnames of the input SNP matrix. That way the user has the option of getting back SNP information, but doesn't need to make up fake SNP information if they don't have any (which comes up pretty often).
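As a rough illustration of the dimnames idea, here is a minimal sketch in base R. The wrapper `ld_with_names` and its `estimate_ld` argument are hypothetical names, and `stats::cor` is only a placeholder for whatever estimator returns the bare matrix; none of this is taken from the LDshrink code.

```r
# Hypothetical wrapper: `estimate_ld` stands in for any estimator that
# returns a bare p x p matrix. stats::cor is a placeholder, not LDshrink.
ld_with_names <- function(geno,
                          estimate_ld = function(x) unname(stats::cor(x))) {
  R <- estimate_ld(geno)
  snp_ids <- colnames(geno)
  # Only attach names when the input actually carries them,
  # so users without SNP info never have to invent any.
  if (!is.null(snp_ids)) {
    dimnames(R) <- list(snp_ids, snp_ids)
  }
  R
}

# Toy genotype matrix (5 samples x 4 SNPs); values and IDs are made up.
geno <- matrix(c(0, 1, 2, 1, 0,
                 2, 1, 0, 1, 2,
                 0, 0, 1, 2, 2,
                 1, 2, 1, 0, 0),
               nrow = 5,
               dimnames = list(NULL, c("snp1", "snp2", "snp3", "snp4")))

R_named <- ld_with_names(geno)          # rows/columns labelled by SNP ID
R_bare  <- ld_with_names(unname(geno))  # no names in, no names out
```

With named input, a sub-block can then be pulled out by ID, e.g. `R_named[c("snp1", "snp3"), c("snp1", "snp3")]`.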

@xiangzhu
Collaborator Author

@CreRecombinase yes, colnames and rownames seem to be sufficient in most cases.

Totally agree with the following:

that way the user has the option of getting back SNP information, but doesn't need to make up fake SNP information if they don't have any (which comes up pretty often)

LDshrink doesn't have to return snp_info when users don't have any.

There is one use case where having snp_info seems necessary. Suppose an analyst needs to analyze GWAS summary data of two traits together with LD estimates. For many SNPs, the ALT and REF alleles are different between the two traits. To properly flip the sign of betahat and/or LD estimates, we need the ALT and REF info.

However, this won't be necessary if the analyst has already unified the ALT and REF of all GWAS summary data files before using LDshrink.
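To make the sign-flip use case concrete, here is a minimal sketch in base R. It assumes two data frames with columns `id`, `ref`, `alt` (plus `betahat` for the GWAS table); the helper `allele_signs` and all values are hypothetical, and strand flips are deliberately not handled.

```r
# Hypothetical helper: for each SNP in the LD panel, return +1 if the GWAS
# uses the same REF/ALT coding, -1 if REF and ALT are swapped, and NA if
# the SNP is missing or the alleles do not match either way.
allele_signs <- function(gwas, ld_info) {
  idx <- match(ld_info$id, gwas$id)
  same    <- gwas$ref[idx] == ld_info$ref & gwas$alt[idx] == ld_info$alt
  swapped <- gwas$ref[idx] == ld_info$alt & gwas$alt[idx] == ld_info$ref
  ifelse(same, 1, ifelse(swapped, -1, NA_real_))
}

# Toy inputs; IDs, alleles and effect sizes are made up for illustration.
ld_info <- data.frame(id  = c("snp1", "snp2", "snp3"),
                      ref = c("C", "A", "G"),
                      alt = c("T", "T", "C"),
                      stringsAsFactors = FALSE)
gwas <- data.frame(id      = c("snp2", "snp1", "snp3"),
                   ref     = c("T", "C", "G"),
                   alt     = c("A", "T", "A"),
                   betahat = c(0.2, 0.1, -0.3),
                   stringsAsFactors = FALSE)

s <- allele_signs(gwas, ld_info)

# Option 1: flip betahat so it follows the LD panel's allele coding.
betahat_aligned <- gwas$betahat[match(ld_info$id, gwas$id)] * s

# Option 2: flip the LD matrix instead, so it follows the GWAS coding.
# Entry (i, j) is multiplied by s[i] * s[j]; unmatched SNPs are dropped.
R <- matrix(c(1, -0.5, -0.5, 1), nrow = 2,
            dimnames = list(c("snp1", "snp2"), c("snp1", "snp2")))
keep <- !is.na(s[1:2])
R_aligned <- R[keep, keep] * outer(s[1:2][keep], s[1:2][keep])
```

In this toy case `snp2` has swapped alleles, so its betahat changes sign, while `snp3` has mismatched alleles and comes back as `NA` rather than being silently kept.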

Finally, I think emeraLD can easily pull out snp_info because it takes VCF as input, and VCF already contains snp_info.
