Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extension from BP3 to BP4 #1

Open
mfilip8 opened this issue Feb 22, 2019 · 2 comments
Open

Extension from BP3 to BP4 #1

mfilip8 opened this issue Feb 22, 2019 · 2 comments

Comments

@mfilip8
Copy link

mfilip8 commented Feb 22, 2019

Hi All,

first of all great work, great paper where I really appreciated all the effort you put in confidence intervals and sensitivity testing using parameter variation with simulated data.
I wanted to ask you if it would be easy to extend in the txburstML.py the Beta-Poisson from being a 3 param function to a 4 parameter function as in https://academic.oup.com/bioinformatics/article/32/14/2128/2288270#95426676
and in case if you can help me in doing so.
My question arises from the fact that while trying out your code on our single cell RNA seq data I faced the following issue: since our cells don't get a very similar number of reads per cell I wanted to normalise our UMI counts by total number of reads (or UMI) per cell.
What I exactly did was to normalise its
UMI count gene_x cell_i / (sum UMIs cell_i/ median UMIs all cells)
So that my actual normalisation factor ranges from roughly from 0.6 to 1.4.
In doing so I nevertheless noticed that the fit becomes very dependent from the particular normalisation factor (for example adding a *10, *100), giving me results that are more similar to what you have in our paper using an additional *10.
BP4 would fit this normalisation factor and allow for a better fit in general.
Sorry for all the writing but I wanted to know your take on the this and in particular to the UMI count normalisation.

Thanks a lot in advance
Marco

@AntonJMLarsson
Copy link
Collaborator

AntonJMLarsson commented Feb 25, 2019

I don't think it would be hard to include a scaling factor for the ML estimation, but it's not clear to me whether that you help you. Is it correct that your normalization factor is different for each cell? The extension in that paper is a normalization per gene.

Which protocol did you use to obtain your data? How many reads do you have per cell on average?

Best,

Anton

@mfilip8
Copy link
Author

mfilip8 commented Feb 25, 2019

Is it correct that your normalization factor is different for each cell?

Yes, the logic is that if cell_i and cell_j have respectively 5 and 10 counts for gene_x, but also a total number of reads of 50000 and 100000 you cannot directly compare the counts.

The extension in that paper is a normalization per gene.

Correct, this factor sort of account for the range of the expression of that gene.

Which protocol did you use to obtain your data? How many reads do you have per cell on average?

We have 10x and the drop out is not negligible...the number of UMIs per cell ranges from 8000 to 16000

If I apply your scripts in the same way you did...among 14000 expressed genes I can get the kinetic parameters predicted for less than 700 genes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants