-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
preseqR.rSAC does not give the same result as either the ds or ztnb algorithms #25
Comments
@johan-gson I follow your logic. Did you try making the change? |
I tested to create my own function by copying the original one and change it. Disappointingly, it performs worse, even worse than ztnb or ds on their own for my application, so it may be that selection on CV is not ideal in my case. It still looks like a bug compared to what is written in the paper, but I'm not sure what is best to do. I'm using preseq as part of an interesting method for single-cell RNA-Seq; I would like to discuss this with you over Skype if you have the time, I think the prediction could potentially be optimized for this purpose? |
Please contact me directly rather than through GitHub. |
@johan-gson I need a test case that will allow me to see if the size vs. shape are being used properly. Depending on the parameterization, this might easily be correct despite the apparent disagreement in names of variables. The "size" of the ZTNB seems to be the "shape" for the corresponding mixing gamma distribution. Everything points to this parameterization being assumed: the comment about the c_v, the parameterization of dnbinom in the em (at the top of the function), and the "shape" is only used for assessing the c_v. Bugs are easy when different people assume different parameterizations for these distributions, so I'll try to check it. |
I can put together a small example for you with some code and data, hopefully later today. I also sent you an email about my other questions. |
I sent you a test case on email |
As I understand it, this function should pick one of ds or ztnb, whichever fits best. The problem is that if it selects ztnb, it doesn't return the same as ztnb. There is an obvious bug in the code that gives this effect, see below. Perhaps this is meant to be that way?
The functions differ because of a suspected bad variable assignment:
preseqR.rSAC <- function(n, r=1, mt=20, size=SIZE.INIT, mu=MU.INIT)
{
para <- preseqR.ztnb.em(n) ##this call is also different, all params are not sent in
shape <- para$size ##### HERE IT STARTS, THIS IS NOT ASSIGNED TO size
mu <- para$mu
the population is heterogeneous
because the coefficient of variation is large$1 / sqrt(shape)$
if (shape <= 1) {
f.rSAC <- ds.rSAC(n=n, r=r, mt=mt)
} else {
## the population is close to be homogeneous
## the ZTNB approach is applied
}
return(f.rSAC)
}
The text was updated successfully, but these errors were encountered: