preseqR.rSAC does not give the same result as either the ds or ztnb algorithms #25

johan-gson · 2020-03-19T14:13:47Z

As I understand it, this function should pick one of ds or ztnb, whichever fits best. The problem is that if it selects ztnb, it doesn't return the same as ztnb. There is an obvious bug in the code that gives this effect, see below. Perhaps this is meant to be that way?

The functions differ because of a suspected bad variable assignment:

preseqR.rSAC <- function(n, r=1, mt=20, size=SIZE.INIT, mu=MU.INIT)
{
para <- preseqR.ztnb.em(n) ##this call is also different, all params are not sent in
shape <- para$size ##### HERE IT STARTS, THIS IS NOT ASSIGNED TO size
mu <- para$mu

the population is heterogeneous

because the coefficient of variation is large $1 / sqrt(shape)$

if (shape <= 1) {
f.rSAC <- ds.rSAC(n=n, r=r, mt=mt)
} else {
## the population is close to be homogeneous
## the ZTNB approach is applied

## the probability of a species observed in the initial sample
p <- 1 - dnbinom(0, size = size, mu = mu) ########HERE size is used
## L is the estimated number of species in total
L <- sum(as.numeric(n[, 2])) / p
## ZTNB estimator
f.rSAC <- function(t) {
  L * pnbinom(r - 1, size=size, mu=mu*t, lower.tail=FALSE) ########HERE size is used
}

}
return(f.rSAC)
}

The text was updated successfully, but these errors were encountered:

andrewdavidsmith · 2020-03-19T19:23:26Z

@johan-gson I follow your logic. Did you try making the change?

johan-gson · 2020-03-19T21:33:03Z

I tested to create my own function by copying the original one and change it. Disappointingly, it performs worse, even worse than ztnb or ds on their own for my application, so it may be that selection on CV is not ideal in my case. It still looks like a bug compared to what is written in the paper, but I'm not sure what is best to do. I'm using preseq as part of an interesting method for single-cell RNA-Seq; I would like to discuss this with you over Skype if you have the time, I think the prediction could potentially be optimized for this purpose?

andrewdavidsmith · 2020-03-20T01:48:10Z

Please contact me directly rather than through GitHub.

andrewdavidsmith · 2020-03-20T02:27:35Z

@johan-gson I need a test case that will allow me to see if the size vs. shape are being used properly. Depending on the parameterization, this might easily be correct despite the apparent disagreement in names of variables. The "size" of the ZTNB seems to be the "shape" for the corresponding mixing gamma distribution. Everything points to this parameterization being assumed: the comment about the c_v, the parameterization of dnbinom in the em (at the top of the function), and the "shape" is only used for assessing the c_v. Bugs are easy when different people assume different parameterizations for these distributions, so I'll try to check it.

johan-gson · 2020-03-20T08:13:25Z

I can put together a small example for you with some code and data, hopefully later today. I also sent you an email about my other questions.

johan-gson · 2020-03-20T11:12:02Z

I sent you a test case on email

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

preseqR.rSAC does not give the same result as either the ds or ztnb algorithms #25

preseqR.rSAC does not give the same result as either the ds or ztnb algorithms #25

johan-gson commented Mar 19, 2020

andrewdavidsmith commented Mar 19, 2020

johan-gson commented Mar 19, 2020

andrewdavidsmith commented Mar 20, 2020

andrewdavidsmith commented Mar 20, 2020

johan-gson commented Mar 20, 2020

johan-gson commented Mar 20, 2020

preseqR.rSAC does not give the same result as either the ds or ztnb algorithms #25

preseqR.rSAC does not give the same result as either the ds or ztnb algorithms #25

Comments

johan-gson commented Mar 19, 2020

the population is heterogeneous

because the coefficient of variation is large $1 / sqrt(shape)$

andrewdavidsmith commented Mar 19, 2020

johan-gson commented Mar 19, 2020

andrewdavidsmith commented Mar 20, 2020

andrewdavidsmith commented Mar 20, 2020

johan-gson commented Mar 20, 2020

johan-gson commented Mar 20, 2020