I'm working on a Stan program that includes a generated quantities block that generates log_lik for use with loo and yrep for posterior predictive checks. This looks something like (with some parts abbreviated as ...):
generated quantities {
  vector[n] log_lik;
  array[n] int yrep;
  ...
  for (i in 1:n) {
    ...
    log_lik[i] = neg_binomial_2_lpmf(y[i] | mu[i], phi[i]);
  }
  yrep = neg_binomial_2_rng(mu, phi);
}
I'm using cmdstanr to fit the model and run loo with moment-matching:
mod <- cmdstanr::cmdstan_model(...)
fit <- mod$sample(...)
fit_loo <- fit$loo(moment_match = TRUE, cores = 4)
While loo_moment_match() is running, I sometimes get a few error/exception messages that appear to stem from overflow in the *_rng function in the generated quantities block. They all look like:
Error : Exception: neg_binomial_2_rng: Random number that came from gamma distribution is 1.47285e+09, but must be less than 1073741824.000000 (in '/var/folders/2k/c0vy7xwj4kb9x7hbgtpq5m640000gn/T//RtmpE8xV8L/model-6c3130f99d6e.stan', line 83, column 4 to column 39)
Further, these messages are sometimes (but not usually) followed by an error that causes loo_moment_match() to fail:
Error in mm_list[[ii]]$i : $ operator is invalid for atomic vectors
In addition: Warning message:
In parallel::mclapply(X = I, mc.cores = cores, FUN = function(i) loo_moment_match_i_fun(i)) :
scheduled cores 4, 1, 3 encountered errors in user code, all values of the jobs will be affected
To the best of my understanding, this appears to happen because loo_moment_match_i_fun() is failing for one or more cases. Perhaps mm_list[[ii]] is NA?
loo/R/loo_moment_matching.R, lines 130 to 131 and lines 142 to 143 in 6e7001e:
mm_list <- parallel::mclapply(X = I, mc.cores = cores,
                              FUN = function(i) loo_moment_match_i_fun(i))
for (ii in seq_along(I)) {
  i <- mm_list[[ii]]$i
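To illustrate my guess about the failure, here is a minimal sketch that does not use loo at all. As far as I understand from the parallel documentation, mclapply() evaluates each job inside try(..., silent = TRUE), so a failed job comes back as a "try-error" object (a character vector) rather than NA. The function toy_fun() is a hypothetical stand-in for loo_moment_match_i_fun(), and lapply() plus try() is used to mimic mclapply() without forking; I have not verified that this is exactly what happens inside loo_moment_match().

```r
# Hypothetical stand-in for loo_moment_match_i_fun(): fails for i == 2,
# similar to an exception thrown from the generated quantities block.
toy_fun <- function(i) {
  if (i == 2) stop("simulated failure in generated quantities")
  list(i = i, value = i^2)
}

# Mimic how mclapply() stores failures: each job is wrapped in
# try(..., silent = TRUE), so errors are kept as "try-error" objects.
mm_list <- lapply(1:3, function(i) try(toy_fun(i), silent = TRUE))

# Flag the elements that failed.
failed <- vapply(mm_list, inherits, logical(1), what = "try-error")
print(failed)

# A failed element is an atomic character vector, so `$` raises an error.
res <- tryCatch(mm_list[[2]]$i, error = function(e) conditionMessage(e))
print(res)
```

If this is what is going on, checking inherits(mm_list[[ii]], "try-error") before accessing $i would at least allow a more informative error message.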
I get a small number (~1-3) of the error/exception messages pretty consistently, but the error that causes loo_moment_match() to fail is less common. One place where I've been able to reproduce this error consistently is within a targets pipeline, which suggests to me that it can be influenced by the RNG state. When I did get this error, it was preceded by ~10 of those error/exception messages. I can confirm that the error can also be produced without targets or callr, just less consistently. I'm using cores = 4 here, but the error can still occur with cores = 1. Commenting out the code for yrep and *_rng in the Stan file eliminates the issue entirely, but it is (ever so slightly) inconvenient to have to make this change depending on whether I want to use loo_moment_match() with the fitted model. I haven't encountered this problem when the *_rng function is less likely to overflow than the negative binomial.
I wanted to report this issue here since it seems to have something to do with loo_moment_match(). It feels like it could be something related to or not entirely covered by #262. If this is expected behavior, I would appreciate any tips on how to better deal with having both log_lik and yrep in the generated quantities block when it comes to using loo_moment_match(). I'm sorry if any of this is off base, as I do not have a good understanding of the inner workings of the moment-matching code.
Some system info:
> packageVersion("loo")
[1] ‘2.8.0.9000’
> packageVersion("cmdstanr")
[1] ‘0.8.1’
> cmdstanr::cmdstan_version()
[1] "2.35.0"
> R.version
               _
platform       aarch64-apple-darwin20
arch           aarch64
os             darwin20
system         aarch64, darwin20
status
major          4
minor          4.1
year           2024
month          06
day            14
svn rev        86737
language       R
version.string R version 4.4.1 (2024-06-14)
nickname       Race for Your Life