How to deal "missing value error" on imputed genotypes data #511

panqinglzmc · 2024-07-30T08:57:06Z

I am using the demonstration data "penncath" from the tutorial of the R package [bigsnpr]. I have verified that there are no missing values in the imputed obj$geno_imputed data, both by SNP and by sample. However, when I use the big_SVD function for principal component analysis, I still get the error "You can't have missing values in 'X'". I would be very grateful if someone could help me identify where I might have made a mistake.
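For readers who want to repeat the missing-value check described above, here is a minimal sketch. It assumes bigsnpr's bundled "example-missing.bed" file as a stand-in for the penncath data, and the variable names are illustrative only. It counts missing genotypes with big_counts() before and after imputation.

```r
library(bigsnpr)

# Assumption: the small "example-missing.bed" file shipped with bigsnpr
# stands in for the penncath data discussed in this thread.
bigsnp <- snp_attachExtdata("example-missing.bed")
G <- bigsnp$genotypes

# big_counts() returns one column per variant, with rows for 0, 1, 2 and NA.
counts_before <- big_counts(G)
n_missing_before <- sum(counts_before[4, ])

# Impute each variant with its most frequent genotype.
G_imputed <- snp_fastImputeSimple(G, method = "mode")

# Count again on the imputed matrix; the NA row should now be all zeros.
counts_after <- big_counts(G_imputed)
n_missing_after <- sum(counts_after[4, ])
```

If n_missing_after is 0, the imputed matrix decodes without any missing genotypes, which is the situation reported in this post.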

privefl · 2024-07-30T09:19:53Z

> using the demonstration data "penncath" from the tutorial of the R package [bigsnpr]

Not sure which data you're talking about??
Could you show snp_stats[, 1:5] please? (including the rownames)
Not related to this issue, but I would recommend using snp_autoSVD() instead of big_SVD() for genotype data.
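As an illustration of the snp_autoSVD() suggestion, the sketch below uses bigsnpr's bundled example data rather than penncath (an assumption made only so that it runs anywhere). The column names chromosome and physical.position come from the map of that example object.

```r
library(bigsnpr)

# Assumption: bigsnpr's bundled example data replaces the penncath data.
ex <- snp_attachExtdata()
G <- ex$genotypes

# snp_autoSVD() prunes variants in linkage disequilibrium and removes
# long-range LD regions before computing a truncated SVD.
obj.svd <- snp_autoSVD(G,
                       infos.chr = ex$map$chromosome,
                       infos.pos = ex$map$physical.position,
                       k = 10,
                       ncores = 1)

# PC scores, one row per individual and one column per component.
pcs <- predict(obj.svd)

# Indices of the variants kept after pruning.
ind.keep <- attr(obj.svd, "subset")
```

The "subset" attribute shows which variants actually entered the final decomposition, which can help when comparing the result with a plain big_SVD() call on all columns.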

panqinglzmc · 2024-07-30T16:45:33Z

Thanks Florian for the quick turnaround! And I must apologize for not explaining clearly. I am using the imputed genotype data from the GWAS tutorial: Imputation available at this link. The data was generated using the following code:

The snp_stats[, 1:5] output is as follows, identical to what is shown in the tutorial.

However, when passing this obj$geno_imputed data to the big_SVD function in the subsequent code in the GWAS tutorial: Population structure (available here), it results in the error mentioned above.

But I am very happy to say that following your suggestion to use snp_autoSVD() instead of big_SVD() for genotype data, the problem has been solved, and the code now runs smoothly.

Thank you so much, Florian. Your suggestion has been incredibly helpful.

privefl · 2024-07-30T20:12:41Z

It's good that you found a workaround.
But the initial issue is not really fixed.
I have some idea of what's going on.
Could you please confirm your packageVersion("bigstatsr")?

panqinglzmc · 2024-07-31T01:28:04Z

My bigstatsr version is 1.5.12.

PS: My bigsnpr version is 1.12.2. I loaded the bigsnpr package, and the bigstatsr package was automatically loaded along with it. Subsequently, I used the following two functions: snp_fastImputeSimple {bigsnpr} and snp_autoSVD {bigsnpr}.

privefl · 2024-07-31T07:57:40Z

What do you get if you run this reproducible code?

```r
zip <- runonce::download_file(
  "https://d1ypx1ckp5bo16.cloudfront.net/penncath/penncath.zip",
  dir = "tmp-data")
unzip(zip, exdir = "tmp-data", overwrite = FALSE)

library(bigsnpr)
snp_readBed("tmp-data/data/penncath.bed")
penncath <- snp_attach("tmp-data/data/penncath.rds")
penncath$geno_imputed <- snp_fastImputeSimple(Gna = penncath$genotypes,
                                              method = "mode",
                                              ncores = nb_cores())

big_SVD(penncath$geno_imputed, big_scale(), k = 10)
```

For me, it runs forever because there are some variables with no variation that prevent convergence (which now errors with v1.5.14).
But I don't get the error about missing values (with both v1.5.12 and v1.5.15).
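One possible way to sidestep the variables with no variation mentioned above is to exclude them through ind.col before calling big_SVD(). This is only a sketch on bigsnpr's bundled example data (an assumption, not the penncath data), and the zero-MAF filter is an illustrative choice rather than something prescribed in this thread.

```r
library(bigsnpr)

# Assumption: bigsnpr's bundled example data replaces the penncath data.
ex <- snp_attachExtdata()
G <- ex$genotypes

# Minor allele frequency per variant; a MAF of 0 means no variation.
maf <- snp_MAF(G)
ind.keep <- which(maf > 0)

# Restrict the SVD to variants that actually vary, so that scaling by the
# standard deviation never divides by zero.
obj.svd <- big_SVD(G,
                   fun.scaling = big_scale(),
                   ind.col = ind.keep,
                   k = 10)
```

The same ind.keep vector could be passed to other bigstatsr functions that accept ind.col, so that every downstream step sees the same set of variants.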

panqinglzmc · 2024-08-01T05:25:01Z

I ran the code and got the same error.

privefl · 2024-08-01T06:38:11Z

I cannot reproduce the issue, and I have no idea what's going on :/
Is this the only function where you have this issue? (e.g. if you also try running big_univLinReg(penncath$geno_imputed, rnorm(1401), ind.col = 1:100))

PS: You should try not to change the working directory; use RStudio projects and stick with the working directory of the project.
