When to use a Jacobian transformation #5
I love a good change of variables discussion. :) I had a bunch of back and forth about this with Anders Nielsen during the RTMB course (prompted by his tmbstan example not having a Jacobian adjustment). Here is some demonstration code I worked up, followed by a version where Anders added a Metropolis–Hastings example at the end, which shows this happens with non-Hamiltonian MCMC algorithms too. This approach can also be helpful if you want to make sure you've got your Jacobian adjustment entered correctly; for one, an easy mistake is to mess up the addition/subtraction. This code illustrates that without the Jacobian adjustment the prior is not what it appears to be: it's a prior, but it's not the desired prior. The example is a prior predictive distribution---no data---for simplicity.
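For reference, the standard univariate change-of-variables result the code demonstrates: if the sampler works on $x = \log(\sigma)$ but the prior $p_\sigma$ is meant for $\sigma = \exp(x)$, the implied density of $x$ is

$$p_x(x) = p_\sigma(\exp(x)) \left| \frac{d}{dx}\exp(x) \right| = p_\sigma(\exp(x)) \exp(x),$$

so the log target gains $+x$ and the negative log likelihood gains $-x$, which is exactly the `nll <- nll - logsigma` line in the adjusted version below.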
Also see:

```r
library(ggplot2)
library(RTMB)
library(tmbstan)

set.seed(1)
dat <- list(x = NULL) # not used
par <- list(logsigma = 0)
ITER <- 3000
ctrl <- list(adapt_delta = 0.98, max_treedepth = 12)
# Skipping Jacobian adjustment
nll <- function(par) {
getAll(par, dat)
sigma <- exp(logsigma)
-sum(dnorm(sigma, 0, 1, TRUE))
}
obj <- MakeADFun(nll, par, silent = TRUE)
m <- tmbstan(obj, chains = 1, iter = ITER, control = ctrl)
e <- extract(m)

# With Jacobian adjustment:
nll2 <- function(par) {
getAll(par, dat)
sigma <- exp(logsigma)
nll <- 0
# https://mc-stan.org/docs/stan-users-guide/changes-of-variables.html
# "log absolute determinant of the Jacobian of the transform"
# log absolute derivative of the transform if univariate
# for `exp(x)` is `log(abs(D(exp(x))))`, i.e., `x`
nll <- nll - logsigma
nll - sum(dnorm(sigma, 0, 1, TRUE))
}
obj2 <- MakeADFun(nll2, par, silent = TRUE)
m2 <- tmbstan(obj2, chains = 1, iter = ITER, control = ctrl)
e2 <- extract(m2)

# Standard version in Stan where the parameter enters the model untransformed:
nll3 <- function(par) {
getAll(par, dat)
-sum(dnorm(sigma, 0, 1, TRUE))
}
par <- list(sigma = 0)
obj3 <- MakeADFun(nll3, par, silent = TRUE)
m3 <- tmbstan(obj3,
chains = 1, iter = ITER,
lower = c(0), upper = c(Inf), control = ctrl
)
e3 <- extract(m3)

# half-normal(0, 1) distribution for reference:
true_prior <- rnorm(1e6, 0, 1)
true_prior <- true_prior[true_prior > 0]
g <- dplyr::bind_rows(
data.frame(sd = exp(e$logsigma), type = "No Jacobian adjustment"),
data.frame(sd = exp(e2$logsigma), type = "With Jacobian adjustment"),
data.frame(sd = e3$sigma, type = "Untransformed with lower bound"),
data.frame(sd = true_prior, type = "Desired 'true' half-normal(0, 1) prior")
) |>
ggplot(aes(sd, colour = type)) +
geom_density() +
coord_cartesian(xlim = c(0, 4)) +
scale_colour_brewer(palette = "Dark2") +
theme_light() +
ggtitle("Prior predictive distribution")
print(g)
```

Code from Anders:

```r
library(RTMB)
library(tmbstan)

set.seed(1)
dat <- list(x = rnorm(8, 0, 2))
par <- list(mu = 0, logsigma = 0)
ITER <- 2e4
# Skipping Jacobian adjustment
nll <- function(par) {
getAll(par, dat)
sigma <- exp(logsigma)
nll <- 0
nll - sum(dnorm(x, mu, sigma, TRUE))
}
obj <- RTMB::MakeADFun(nll, par, silent = TRUE)
m <- tmbstan(obj, chains = 6, iter = ITER)
e <- extract(m)

# With Jacobian adjustment:
nll2 <- function(par) {
getAll(par, dat)
sigma <- exp(logsigma)
nll <- 0
# https://mc-stan.org/docs/stan-users-guide/changes-of-variables.html
# "absolute determinant of the Jacobian of the transform"
# i.e., absolute derivative of the transform if univariate
# for `exp(logsigma)` is `logsigma`
nll <- nll - logsigma
nll - sum(dnorm(x, mu, sigma, TRUE))
}
obj2 <- RTMB::MakeADFun(nll2, par, silent = TRUE)
m2 <- tmbstan(obj2, chains = 6, iter = ITER)
e2 <- extract(m2)

# Standard version in Stan where the parameter enters the model untransformed:
nll3 <- function(par) {
getAll(par, dat)
nll <- 0
nll - sum(dnorm(x, mu, sigma, TRUE))
}
par <- list(mu = 0, sigma = 0)
obj3 <- RTMB::MakeADFun(nll3, par, silent = TRUE)
m3 <- tmbstan(obj3,
chains = 6, iter = ITER,
lower = c(-Inf, 0), upper = c(Inf, Inf)
)
e3 <- extract(m3)
## primitive MCMC like old ADMB
opt <- nlminb(obj$par, obj$fn, obj$gr)
rep <- sdreport(obj)
sdr <- summary(rep)
est <- opt$par
npar <- length(est)
cov <- rep$cov.fixed # or simply diag(npar)
nosim <- 50000
mcmc <- matrix(NA, nrow = nosim, ncol = npar)
mcmc[1, ] <- est
colnames(mcmc) <- names(est)
steps <- MASS::mvrnorm(nosim, mu = rep(0, npar), Sigma = cov)
U <- runif(nosim)
print(system.time({
currentValue <- obj$fn(mcmc[1, ])
for (i in 2:nosim) {
proposal <- mcmc[i - 1, ] + steps[i, ]
proposalValue <- obj$fn(proposal)
if (U[i] < exp(currentValue - proposalValue)) {
mcmc[i, ] <- proposal
currentValue <- proposalValue
} else {
mcmc[i, ] <- mcmc[i - 1, ]
}
}
}))
#> user  system elapsed
#> 0.368   0.002   0.374

plot(density(e$logsigma), main = "Densities")
lines(density(mcmc[, 2]), col = "red")
lines(density(e2$logsigma), col = "green")
lines(density(log(e3$sigma)), col = "blue")
abline(v = log(sd(dat$x)), lwd = 3)
legend("topright",
col = c("black", "red", "green", "blue"),
legend = c(
"no adjust", "no adjust & no stan", "adjust",
"flat prior on sigma (not log Sigma)"
), bty = "n", lwd = 1
)
text(log(sd(dat$x)), 0, " log(sd(dat$x))", adj = 0)
```
Thank you guys for looking into this and sharing your thoughts (and providing code)! I had been wondering about this Jacobian business as well since Cole's paper. So, for practical consideration, put into simple terms for luddites such as myself: in the simple case shown, for estimating a parameter on the log scale, we should subtract that log-scale parameter from the negative log likelihood?
Hey Nick! I think that the key is that we want the declared parameter (in the par list) to be in the same units as the thing that goes into the sampling statement. If not, then you need the Jacobian adjustment if you are doing some sort of MCMC sampling. Maybe all of that was already implicit in your comment, but just adding for completeness. Catarina
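To put that rule side by side in code (a minimal sketch following the pattern of the first example above; `nll_direct` and `nll_log` are hypothetical names, and the prior is the half-normal(0, 1) from that example):

```r
# Declared parameter is sigma itself, same units as the sampling
# statement: no adjustment needed (use a lower bound of 0 when sampling).
nll_direct <- function(par) {
  getAll(par, dat)
  -sum(dnorm(sigma, 0, 1, TRUE))
}

# Declared parameter is logsigma but the prior is on sigma:
# subtract log|d(sigma)/d(logsigma)| = logsigma from the nll.
nll_log <- function(par) {
  getAll(par, dat)
  sigma <- exp(logsigma)
  -sum(dnorm(sigma, 0, 1, TRUE)) - logsigma
}
```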
@catarinawor and I have been in discussion with Cole M at NOAA about when to apply the Jacobian correction after reading about it in his new publication.
I've done a bit of a deeper dive on this than I meant to, for my own reasons, but here is the when/why.
If you're Bayesian and you do a transformation, then you need a Jacobian for the transformation of variables. This is because when you're Bayesian, your parameter is a random variable. Two common situations:

a. $\sigma \sim \gamma(1,1)$, but we do a random walk proposal on the log scale (in MCMC sampling; otherwise we need additional techniques like reflection to not propose negative values). In this case, we need to adjust the acceptance ratio for the transformation to respect the original prior distribution (a code sketch follows this list).

b. We want to find the posterior mode for $\sigma$. In this case, we will do the optimization on $\log(\sigma)$, and thus need to adjust the posterior distribution by the Jacobian as well.
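A minimal sketch of case (a), in base R (all object names hypothetical): with a $\gamma(1,1)$ prior on $\sigma$ and a symmetric random walk proposal on $\log(\sigma)$, the change-of-variables term $+\log(\sigma)$ must be included in the log target (equivalently, in the acceptance ratio) for the draws to respect the prior:

```r
set.seed(1)
nsim <- 5e4
log_sigma <- numeric(nsim) # chain starts at log(sigma) = 0
# Log density of log(sigma) when sigma ~ gamma(1, 1):
# dgamma(exp(ls), ...) plus the Jacobian term + ls.
log_target <- function(ls) dgamma(exp(ls), 1, 1, log = TRUE) + ls
for (i in 2:nsim) {
  prop <- log_sigma[i - 1] + rnorm(1, 0, 0.5) # random walk on log scale
  log_alpha <- log_target(prop) - log_target(log_sigma[i - 1])
  log_sigma[i] <- if (log(runif(1)) < log_alpha) prop else log_sigma[i - 1]
}
# exp(log_sigma) should approximately recover the gamma(1, 1) prior;
# dropping the "+ ls" term would not.
```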
If you're a frequentist doing maximum likelihood estimation, you usually don't need a Jacobian. That is because parameters are fixed values, not random variables; you can transform them as you please and not worry. When is this not true?
a. When you have random effects that are not normally distributed (e.g. gamma distributed) and we do the Laplace marginalization on the log scale. As a result, we need to transform the likelihood contribution for the distribution of the random effect onto the log scale (a code sketch follows this list).
b. See example R code in RTMB https://github.com/pbs-assess/renewassess/blob/main/code/RTMB/TestingJacobianLaplace.r
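This is not the linked file, just a minimal RTMB-style sketch of the idea in (a), assuming a hypothetical Poisson model with a multiplicative gamma(shape, shape) random effect that is Laplace-marginalized on the log scale; the `- sum(logu)` line is the Jacobian term:

```r
library(RTMB)
set.seed(1)
dat <- list(y = rpois(20, lambda = 2 * rgamma(20, 2, 2)))
par <- list(loglambda = 0, logshape = 0, logu = rep(0, 20))
jnll <- function(par) {
  getAll(par, dat)
  shape <- exp(logshape)
  u <- exp(logu) # gamma random effect, declared on the log scale
  jnll <- 0
  # gamma density of u plus the change-of-variables term for logu:
  jnll <- jnll - sum(dgamma(u, shape, shape, log = TRUE)) - sum(logu)
  jnll <- jnll - sum(dpois(y, exp(loglambda) * u, log = TRUE))
  jnll
}
obj <- MakeADFun(jnll, par, random = "logu", silent = TRUE)
opt <- nlminb(obj$par, obj$fn, obj$gr)
```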
For an example, here is Nimble code for the log likelihood used for Laplace. Because Nimble has a built-in parameter transformation system, you don't have to worry about it like in RTMB. See how they define the log likelihood as `model$calculate(innerCalcNodes)` and then add the `reTrans$logDetJacobian(reTransform)` term. In this case, for normally distributed random effects, `reTrans$logDetJacobian(reTransform) = 0`. https://github.com/nimble-dev/nimble/blob/devel/packages/nimble/R/Laplace.R
Let's have a conversation if people have more to say about this, have strong opinions, or want to talk more about Jacobians! :-)