Handling missing values in SNP matrix #49
Comments
Yes, most of the functions in the package don't handle missing values. If you want to use {bigsnpr}, you should therefore either filter out the SNPs that have missing values or impute them.
I've just added |
Does this answer your questions?
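To make the two options above concrete (filter by missingness, or impute), here is a minimal numpy sketch. It is not bigsnpr code: the function name `filter_and_impute`, the `max_missing_rate` threshold, and the toy matrix are assumptions made for illustration, and the imputation shown is a plain per-SNP mean, which may differ from what the package does.

```python
import numpy as np

def filter_and_impute(genotypes, max_missing_rate=0.1):
    """Drop SNP columns whose missing rate exceeds the threshold,
    then replace the remaining NaNs with the per-SNP mean.

    Hypothetical helper for illustration only (not part of bigsnpr).
    Rows are samples, columns are SNPs, NaN marks a missing call.
    """
    G = np.asarray(genotypes, dtype=float)
    missing_rate = np.isnan(G).mean(axis=0)   # fraction of NaN per SNP
    keep = missing_rate <= max_missing_rate   # boolean mask of SNPs kept
    kept = G[:, keep].copy()
    col_means = np.nanmean(kept, axis=0)      # mean over observed calls
    rows, cols = np.where(np.isnan(kept))
    kept[rows, cols] = col_means[cols]        # fill each NaN with its SNP mean
    return kept, keep

# Toy matrix: 4 samples x 3 SNPs, coded 0/1/2 with NaN for missing.
G = np.array([
    [0, 1, np.nan],
    [1, np.nan, np.nan],
    [2, 1, 0],
    [0, 2, np.nan],
])
imputed, keep = filter_and_impute(G, max_missing_rate=0.25)
print(keep)      # which SNPs survived the missingness filter
print(imputed)   # filtered matrix with no NaN left
```

The threshold plays the same role as the per-SNP missingness cutoff described in the question; a column that is entirely missing should be filtered out before imputing, since it has no mean to impute with.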
Thanks for clarifying. Is imputation mathematically equivalent to excluding genotypes missing a given SNP when building the model for that SNP? Should it produce the same test statistics? I think most of the other GWAS methods we use (particularly GEMMA) exclude those genotypes instead of imputing.
No, I don't think that this is equivalent.
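A small simulation can illustrate why the two approaches need not agree. The sketch below assumes a plain single-SNP linear regression (not a mixed model as in GEMMA, and not bigsnpr's implementation), simulated data, and mean imputation; all names are hypothetical. In this simple setting the slope estimate is the same either way, because imputed samples sit exactly at the SNP mean, but the residual variance and degrees of freedom change, so the t-statistic differs.

```python
import numpy as np

def slope_and_t(x, y):
    """OLS slope and its t-statistic for the model y ~ 1 + x."""
    n = len(x)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    slope = np.sum(xc * y) / sxx
    intercept = y.mean() - slope * x.mean()
    rss = np.sum((y - intercept - slope * x) ** 2)
    se = np.sqrt(rss / (n - 2) / sxx)          # standard error of the slope
    return slope, slope / se

# Simulated data (assumption): one SNP coded 0/1/2, about 20% of calls missing.
rng = np.random.default_rng(0)
n = 200
x = rng.binomial(2, 0.3, size=n).astype(float)
y = 0.3 * x + rng.normal(size=n)
missing = rng.random(n) < 0.2

# Option 1: exclude the samples with a missing call for this SNP.
slope_cc, t_cc = slope_and_t(x[~missing], y[~missing])

# Option 2: impute missing calls with the mean of the observed calls.
x_imp = np.where(missing, x[~missing].mean(), x)
slope_imp, t_imp = slope_and_t(x_imp, y)

print("exclude:", slope_cc, t_cc)
print("impute: ", slope_imp, t_imp)
```

With other imputation rules (for example a rounded mean or a fixed value) the slope itself would generally change as well.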
Hi Florian,
Yes, I guess it would be easy.
I just think this is more conservative.
Should now be implemented in the latest version, using method = "zero".
Thanks so much Florian!
@wongck-kevin I just remembered why I didn't implement this before. In fact, you can do |
Thanks for the update Florian.
Note that I will soon deprecate method = "zero" in favor of using "$copy()".
Thanks for making this tool available. I wasn't sure whether to mark this as a question or issue because it's unclear to me if there is existing support for PCA and GWAS with SNP matrices that contain NAs. I am accustomed to setting thresholds in PLINK and GEMMA for which all SNPs missing in more than X genotypes are excluded, but haven't found a similar feature for bigsnpr in documentation or the demo.
I'm trying to run my data, which I've formatted the same way as the demo data, but there are NAs in my SNP set (from QC filtering, indels, etc.) and this leads to an error.
Do I need to format the NAs a particular way, or do anything else to build models that include SNPs not found in all genotypes?