How to deal "missing value error" on imputed genotypes data #511

Open
panqinglzmc opened this issue Jul 30, 2024 · 7 comments
@panqinglzmc

I am using the demonstration data "penncath" from the tutorial of the R package [bigsnpr]. I have verified that there are no missing values in the imputed obj$geno_imputed data, both by SNP and by sample. However, when I use the big_SVD function for principal component analysis, I still get the error "You can't have missing values in 'X'", as shown below. I would be very grateful if someone could help me identify where I might have made a mistake.

image
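
For anyone who wants to repeat the missing-value check described above, one option is to tabulate the genotype values with big_counts(). This is only a sketch under assumptions: it uses the small example dataset bundled with bigsnpr (snp_attachExtdata()) as a stand-in for penncath, and it assumes the NA counts sit in the last row of the result, as in the package tutorial output.

```r
library(bigsnpr)

# Stand-in data (assumption): small example dataset shipped with bigsnpr,
# not the penncath data discussed in this issue.
obj <- snp_attachExtdata()

# Impute with the same method as in the thread; returns a new FBM.code256.
G <- snp_fastImputeSimple(obj$genotypes, method = "mode")

# big_counts() counts each decoded value (0, 1, 2, NA) per SNP (column).
counts <- big_counts(G)

# Assumed layout: the NA counts are in the last row.
n_missing <- counts[nrow(counts), ]
sum(n_missing)  # total number of missing genotypes after imputation
```

With penncath, the same call would be applied to obj$geno_imputed instead of G.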

@privefl
Owner

privefl commented Jul 30, 2024

  • using the demonstration data "penncath" from the tutorial of the R package [bigsnpr]

    Not sure which data you're talking about??

  • Could you show snp_stats[, 1:5] please? (including the rownames)

  • Not related to this issue, but I would recommend using snp_autoSVD() instead of big_SVD() for genotype data.
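
A minimal sketch of the snp_autoSVD() suggestion in the last point, for readers following along. It again uses the example dataset bundled with bigsnpr as a stand-in (an assumption, not the penncath data from this issue) and takes the chromosome and position vectors from the map of the bigSNP object.

```r
library(bigsnpr)

# Stand-in data (assumption): bundled example dataset without missing values.
obj <- snp_attachExtdata()
G <- obj$genotypes

# snp_autoSVD() needs chromosome and position information for its
# clumping and long-range LD removal steps.
obj.svd <- snp_autoSVD(G,
                       infos.chr = obj$map$chromosome,
                       infos.pos = obj$map$physical.pos,
                       ncores = 1)

# PC scores: one row per individual, one column per component.
PCs <- predict(obj.svd)
dim(PCs)
```

For imputed data, the imputed FBM (e.g. obj$geno_imputed) would be passed in place of G.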

@panqinglzmc
Author

Thanks Florian for the quick turnaround! And I must apologize for not explaining clearly. I am using the imputed genotype data from the GWAS tutorial: Imputation available at this link. The data was generated using the following code:

image

The snp_stats[, 1:5] output is as follows, identical to what is shown in the tutorial.

image

However, when passing this obj$geno_imputed data to the big_SVD function in the subsequent code in the GWAS tutorial: Population structure (available here), it results in the error mentioned above.

But I am very happy to say that following your suggestion to use snp_autoSVD() instead of big_SVD() for genotype data, the problem has been solved, and the code now runs smoothly, as shown below.

image

Thank you so much, Florian. Your suggestion has been incredibly helpful.

@privefl
Owner

privefl commented Jul 30, 2024

It's good that you found a workaround.
But the initial issue is not really fixed.
I have some idea of what's going on.
Could you please confirm your packageVersion("bigstatsr")?

@panqinglzmc
Author

panqinglzmc commented Jul 31, 2024

My ‘bigstatsr’ version is 1.5.12.

PS: My 'bigsnpr' version is 1.12.2. I loaded the bigsnpr package, and the bigstatsr package was loaded automatically along with it. I then used the following two functions: snp_fastImputeSimple {bigsnpr} and snp_autoSVD {bigsnpr}.

@privefl
Owner

privefl commented Jul 31, 2024

What do you get if you run this reproducible code?

zip <- runonce::download_file(
  "https://d1ypx1ckp5bo16.cloudfront.net/penncath/penncath.zip",
  dir = "tmp-data")
unzip(zip, exdir = "tmp-data", overwrite = FALSE)

library(bigsnpr)
snp_readBed("tmp-data/data/penncath.bed")
penncath <- snp_attach("tmp-data/data/penncath.rds")
penncath$geno_imputed <- snp_fastImputeSimple(Gna = penncath$genotypes,
                                              method = "mode",
                                              ncores = nb_cores())

big_SVD(penncath$geno_imputed, big_scale(), k = 10)

For me, it runs forever because there are some variables with no variation that prevent convergence (which now errors with v1.5.14).
But I don't get the error about missing values (with both v1.5.12 and v1.5.15).
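
To illustrate the point about variables with no variation, here is a small base R sketch (not bigstatsr internals, and the toy matrix is made up for illustration). A constant column has a standard deviation of 0, so centering and scaling it gives 0/0; dropping such columns first avoids this. Whether anything similar is behind the error reported in this issue is not established in the thread.

```r
# Toy genotype matrix (hypothetical): 6 individuals x 4 SNPs,
# where the third SNP is monomorphic (all zeros).
X <- cbind(c(0, 1, 2, 0, 1, 2),
           c(2, 2, 1, 0, 0, 1),
           c(0, 0, 0, 0, 0, 0),
           c(1, 0, 1, 2, 1, 0))

# Columns with zero standard deviation carry no information for PCA.
col_sd <- apply(X, 2, sd)
keep <- which(col_sd > 0)
keep                       # 1 2 4

# Scaling everything divides the constant column by 0 and yields NaN.
scaled_all <- scale(X)
anyNA(scaled_all)          # TRUE

# Scaling only the variable columns is well defined.
scaled_kept <- scale(X[, keep])
anyNA(scaled_kept)         # FALSE

# SVD on the filtered, scaled matrix.
pcs <- svd(scaled_kept)
length(pcs$d)              # 3
```

With an FBM, the analogous step would be to pass only the indices of non-constant SNPs through the ind.col argument of big_SVD().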

@panqinglzmc
Author

panqinglzmc commented Aug 1, 2024

I ran the code and got the same error.

image

@privefl
Owner

privefl commented Aug 1, 2024

I cannot reproduce the issue, and I have no idea what's going on :/
Is this the only function where you have this issue? (e.g. if you also try running big_univLinReg(penncath$geno_imputed, rnorm(1401), ind.col = 1:100))

PS: You should try not to change the working directory; use RStudio projects and stick with the working directory of the project.
