Inference without modeling dropout? #9

Open
mochar opened this issue May 25, 2019 · 2 comments


mochar commented May 25, 2019

Hi Kieran, thanks for the cool package. I am interested in learning more about Bayesian stuff, so your other work seems interesting as well!

Recently there has been talk about how UMI count data in scRNA-seq is not zero-inflated. Instead, it is recommended to model UMI counts using a negative binomial (or even a Poisson) distribution. (I can share some papers if you'd like.)

For this reason I was wondering if there is a way to omit the explicit modeling of zero counts. I would also like to hear your thoughts on using the aforementioned distributions to model the gene counts directly instead of the log-transformed CPM data.

Thanks!

@kieranrcampbell
Owner

Hi @mochar

This is an excellent question. Ouija dates from the dark days of single cell analysis when I / we would log the data and model the log counts with Gaussians, rather than directly modelling the raw counts with e.g. negative binomials.

From a modelling perspective, if you log the data and use a Gaussian, you probably do want to model a zero-inflated component, since the Gaussian is mis-specified: as a continuous density it puts essentially no mass exactly at zero, whereas the negative binomial actually does have probability mass there. However, if we didn't log the data and used a negative binomial then, as Valentine Svensson and others have recently pointed out, you probably wouldn't want to include inflation at zero.
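
To make the point about mass at zero concrete, here is a minimal Python sketch. It is not part of Ouija; the function names and the values of `mu` and `phi` are assumptions chosen only for illustration. It evaluates the probability of observing a zero count under a negative binomial and under a Poisson with the same mean, with no zero-inflation component in either.

```python
import math

def nb_log_pmf(k, mu, phi):
    """Log pmf of a negative binomial with mean mu and inverse dispersion phi.

    Under this (assumed) parametrization the variance is mu + mu**2 / phi.
    """
    return (math.lgamma(k + phi) - math.lgamma(phi) - math.lgamma(k + 1)
            + phi * math.log(phi / (phi + mu))
            + k * math.log(mu / (phi + mu)))

def poisson_log_pmf(k, mu):
    """Log pmf of a Poisson with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

# Illustrative values only: a lowly expressed, overdispersed gene.
mu, phi = 2.0, 0.5

p0_nb = math.exp(nb_log_pmf(0, mu, phi))    # (phi / (phi + mu)) ** phi, about 0.447
p0_pois = math.exp(poisson_log_pmf(0, mu))  # exp(-mu), about 0.135

print(f"P(count = 0) under NB:      {p0_nb:.3f}")
print(f"P(count = 0) under Poisson: {p0_pois:.3f}")
```

With these illustrative values the negative binomial already assigns a large probability to zero through overdispersion alone, which is the intuition behind leaving out an explicit zero-inflation term when modelling raw counts.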

In terms of results, I suspect for Ouija it would make little difference, but if it's something you would find useful we could create a negative binomial variant. I think the modification would be fairly trivial (remove the zero inflation, change the likelihood to NB with the mean exponentiated). The mean-variance parametrization might be a little tricky however.
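
As a rough illustration of what such a modification could look like, here is a Python sketch under stated assumptions. It is not Ouija's code: `nb_log_lik`, `phi_from_mean_variance`, `eta` and `phi` are hypothetical names. The latent predictor `eta` is treated as a log-mean and exponentiated, there is no zero-inflation term, and the helper shows one way a mean-variance relationship could be mapped to a negative binomial dispersion, including where that mapping breaks down.

```python
import math

def nb_log_lik(counts, eta, phi):
    """Negative binomial log-likelihood of raw counts, no zero inflation.

    counts: observed integer counts
    eta:    latent predictor on the log scale (assumed), so the mean is exp(eta)
    phi:    inverse dispersion; variance is mu + mu**2 / phi
    """
    total = 0.0
    for y, e in zip(counts, eta):
        mu = math.exp(e)  # "mean exponentiated"
        total += (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
                  + phi * math.log(phi / (phi + mu))
                  + y * math.log(mu / (phi + mu)))
    return total

def phi_from_mean_variance(mu, var):
    """Solve var = mu + mu**2 / phi for phi.

    Only defined when var > mu: the negative binomial cannot represent
    variance at or below the mean, which is one reason a mean-variance
    parametrization needs care.
    """
    if var <= mu:
        raise ValueError("negative binomial requires variance > mean")
    return mu * mu / (var - mu)

# Illustrative usage with made-up numbers.
phi = phi_from_mean_variance(mu=2.0, var=10.0)
ll = nb_log_lik(counts=[0, 3, 1], eta=[math.log(2.0)] * 3, phi=phi)
print(phi, ll)
```

The constraint `var > mu` is only one possible reading of why the parametrization might be tricky; the sketch does not claim to reflect how Ouija ties variance to the mean.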

Thanks

Kieran

@mochar
Author

mochar commented Jun 26, 2019

Thank you for the quick and helpful response @kieranrcampbell (and apologies for my slow response!).

I share your suspicion that in the end it would make little difference. I applied slalom to logcounts using a Hurdle noise model, and to scTransform-corrected counts with a Poisson model. Both options lead to the same set of factors with similar cell loadings. I also used the Gaussian noise model on scTransform's Pearson residuals (which are not really normally distributed) and the results are again mostly the same.

Nonetheless, for the sake of using the same noise model when applying both tools, I would actually appreciate an NB and/or Poisson implementation. If you have the time for this, I would be glad to test it out and share the results.

Thanks again!
