---
title: "Nonlinear Modeling: Quality of parameter estimates"
subtitle: Introduction to Statistical Modelling
author: Prof. Joris Vankerschaver
pdf-engine: lualatex
format:
  beamer:
    theme: Pittsburgh
    colortheme: default
    fonttheme: default
    header-includes: |
      \setbeamertemplate{frametitle}[default][left]
      \setbeamertemplate{footline}[frame number]
      \usepackage{emoji}
      \usepackage{luatexko}
---
```{r, include=FALSE}
library(ggplot2)
library(gridExtra)

theme_set(theme_bw() + theme(text = element_text(size = 14)))
```

## Learning outcomes

You should be able to

- Understand the interpretation of measurement noise
- Explain the role of the Fisher information matrix in quantifying parameter uncertainty
- Compute a confidence interval for a parameter
- Compute the correlation between two parameters

## Quality of estimation

- Apart from the parameter estimates themselves, we want a measure of uncertainty for these values.
- Main idea: use the curvature of the objective function $J(\theta)$ around the optimum to quantify uncertainty.
  - High curvature: low uncertainty (parameters well determined)
  - Low curvature: high uncertainty (parameters poorly determined)

```{r, echo=FALSE}
library(plot3D)

x <- outer(seq(-7, 7, length.out = 50), rep(1, 50))
y <- outer(rep(1, 50), seq(-7, 7, length.out = 50))
z1 <- x^2 + y^2
z2 <- 0.5*x^2 + 0.1*y^2

par(mfrow=c(1, 2))
surf3D(x, y, z1, colkey = FALSE)
surf3D(x, y, z2, colkey = FALSE)
```

## Synthetic data

Model: logistic curve
$$
  y = \frac{A}{1 + \exp(k(x_\text{mid} - x))} + \epsilon,
$$
where $\epsilon \sim N(0, \sigma^2)$.

- Parameters: $A = 5.6$, $k = 1.4$, $x_\text{mid} = 2.5$.
- Measurement noise: $\sigma = 0.2$ (standard deviation), so $\sigma^2 = 0.04$.

We take $n = 20$ data points from this model:

```{r, echo=FALSE, out.height='33%', fig.align='center'}
set.seed(1234)

thetas <- c(5.6, 1.4, 2.5)
measurement_sd <- 0.2

model <- function(xdata, thetas) {
  thetas[[1]] / (1 + exp(thetas[[2]] * (thetas[[3]] - xdata)))
}

n <- 20
xdata <- seq(-3, 9, length.out = n) + rnorm(n, sd = 0.5)
ydata <- model(xdata, thetas) + rnorm(n, sd = measurement_sd)

par(cex = 2)
plot(xdata, ydata, pch = 19, xlab = "", ylab = "")
```

## Model fit

- From now on, we "forget" the true parameters and work with the data only.
- Nonlinear least squares: $A = 5.359$, $k = 1.597$, $x_\text{mid} = 2.500$.

\vspace*{0.5cm}

```{r, echo=FALSE, out.height="50%", fig.align='center'}
J <- function(thetas, xdata, ydata) {
  resids <- ydata - model(xdata, thetas)
  return(sum(resids^2))
}

fit <- optim(c(5, 1, 2), J, method = "L-BFGS", xdata = xdata, ydata = ydata)
thetas_optim <- fit$par

xplot <- seq(min(xdata), max(xdata), length.out = 50)
yplot_optim <- model(xplot, thetas_optim)

par(cex = 1.5)
plot(xdata, ydata, pch = 19, xlab = "X", ylab = "Y")
lines(xplot, yplot_optim, lty = 1)
```

## Measurement variance

- Typically the measurement variance is not known.
- If the model is well fitted, estimate it from the residuals:
$$
  \sigma^2 \approx \frac{J(\theta_\text{best})}{N - p}.
$$
- Here $\sigma^2 \approx 0.523/17 = 0.031$ (true value $\sigma^2 = 0.2^2 = 0.04$).

\vspace*{0.5cm}

```{r, echo=FALSE, out.height='50%', fig.align='center'}
par(mfrow=c(1, 2))
par(cex = 1.5)

resids <- ydata - model(xdata, thetas_optim)
plot(ydata, resids, pch = 19, xlab = "Y", ylab = "Residuals")
abline(h = 0, lty = "dashed")
qqnorm(resids)
qqline(resids)
```

## The loss surface

- Surface obtained by plotting $J(\theta)$ for all $\theta$ in some range.
- Optimal parameters are minima on this surface.
- With more than 2 parameters: focus on a subset of parameters (see the sketch on the next slide).
- **For visualization only.** (Higher dimensions: calculus)

:::: {.columns}
::: {.column width="50%"}
![](./images/03a-parameter-estimation/loss-surface-3d.png)
:::
::: {.column width="50%"}
![](./images/03a-parameter-estimation/loss-surface-2d.png)
:::
::::
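## The loss surface: code sketch

A minimal sketch of how such a surface can be plotted for the logistic model, holding $x_\text{mid}$ at its optimum and evaluating $J$ on a grid over $(A, k)$. The grid ranges and the object names (`A_grid`, `k_grid`, `J_grid`) are chosen ad hoc for illustration:

```{r, out.height='55%', fig.align='center'}
# Evaluate J on a grid over (A, k), with x_mid fixed at its optimal value
A_grid <- seq(4.5, 6.5, length.out = 60)
k_grid <- seq(0.8, 2.5, length.out = 60)
J_grid <- outer(A_grid, k_grid, Vectorize(function(A, k) {
  J(c(A, k, thetas_optim[3]), xdata = xdata, ydata = ydata)
}))

# Contour plot of the loss surface; the dot marks the minimum
contour(A_grid, k_grid, J_grid, nlevels = 20, xlab = "A", ylab = "k")
points(thetas_optim[1], thetas_optim[2], pch = 19)
```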
## Exact confidence region

Confidence region: all $\theta$ such that
$$
  J(\theta) \le \left(1 + \frac{p}{N-p}F_{p,N-p,1-\alpha} \right) \times J(\theta_\text{best}),
$$
where $F_{p,N-p,1-\alpha}$ is a quantile of the $F$-distribution and $\alpha$ is the significance level.

- Reasonably exact for models that are not too nonlinear.
- Easy to calculate numerically.
- **Hard to describe or use explicitly.**

## Exact confidence region

![](./images/03a-parameter-estimation/level-sets-ci.png){fig-align='center'}

## Approximate confidence region

Taylor expansion to second order:
\begin{multline*}
J(\theta) \approx J(\theta_\text{best}) + \sum_{i = 1}^p \underbrace{\frac{\partial J}{\partial \theta_i}(\theta_\text{best})}_{= 0}(\theta - \theta_\text{best})_i + \\
\frac{1}{2} \sum_{i,j=1}^p \frac{\partial^2 J}{\partial \theta_i \partial \theta_j} (\theta - \theta_\text{best})_i(\theta - \theta_\text{best})_j.
\end{multline*}

The confidence region becomes
$$
  (\theta - \theta_\text{best})^T \mathcal{I} (\theta - \theta_\text{best}) \le p F_{p, N-p, 1-\alpha},
$$
with $\mathcal{I}$ the **Fisher Information Matrix (FIM)**:
$$
  \mathcal{I}_{ij} = \frac{1}{\sigma^2} \sum_{k = 1}^N \left( \frac{\partial y}{\partial \theta_i}(x_k, \theta_\text{best}) \frac{\partial y}{\partial \theta_j}(x_k, \theta_\text{best}) \right).
$$

## Interpretation of the FIM

- The FIM tells us how much information the data give us about the model parameters.
- Alternatively, the FIM combines two ingredients:
  - The **sensitivity functions**, given by
    $$
      s_i(x, \theta) = \frac{\partial y}{\partial \theta_i}.
    $$
    Variables that are sensitive to perturbations in a parameter contain a lot of information about that parameter, and contribute a lot to the FIM (and vice versa).
  - The **measurement noise** $\sigma^2$. Measurements with lots of noise contain less information about the parameters.

## Approximate confidence region

- Level sets of the quadratic approximation are ellipsoids.
- Good approximation to the exact confidence region close to the optimum.

![](./images/03a-parameter-estimation/level-sets-ci-quadratic.png){fig-align='center'}

## Variance/covariance of parameters

- Often, we want to know the variance of individual parameters and the covariance between parameters.
- These are encoded in the error covariance matrix:
$$
  C = \begin{bmatrix}
    \sigma^2_{\theta_1} & \text{cov}(\theta_1,\theta_2) & \cdots & \text{cov}(\theta_1,\theta_p) \\
    \text{cov}(\theta_2,\theta_1) & \sigma^2_{\theta_2} & \cdots & \text{cov}(\theta_2,\theta_p) \\
    \vdots & \vdots & \ddots & \vdots \\
    \text{cov}(\theta_p,\theta_1) & \text{cov}(\theta_p,\theta_2) & \cdots & \sigma^2_{\theta_p}
  \end{bmatrix}.
$$
- Diagonal: variances; off-diagonal: covariances.
- Can be used to compute correlations between parameters:
$$
  R_{ij} = \frac{\text{cov}(\theta_i, \theta_j)}{\sigma_{\theta_i} \sigma_{\theta_j}}.
$$

## Computing the error covariance matrix

- The inverse of the FIM $\mathcal{I}$ is a lower bound for $C$ (the Cramer-Rao bound):
$$
  C \ge \mathcal{I}^{-1}.
$$
- **This is not an obvious result.**
- In practice, we simply take $\mathcal{I}^{-1}$ as an estimate for $C$.
- Approximate confidence interval for parameter $\theta_i$ (illustrated in the sketch on the next slide):
$$
  (\theta_\text{best})_i \pm t_{N-p, 1-\alpha/2} \sqrt{C_{ii}}.
$$
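## Sketch: computing the FIM numerically

A minimal sketch of the whole recipe for the logistic model, using central finite differences for the sensitivity functions. The step size `h` and the helper objects (`sens`, `C_hat`, `half_width`) are ad-hoc choices, not part of the worked example that follows:

```{r}
# Estimate sigma^2 from the residuals (N = 20 points, p = 3 parameters)
sigma2_hat <- J(thetas_optim, xdata, ydata) / (n - 3)

# Sensitivity functions dy/dtheta_i at each x_k, by central differences
h <- 1e-6
sens <- sapply(1:3, function(i) {
  dtheta <- replace(numeric(3), i, h)
  (model(xdata, thetas_optim + dtheta) -
     model(xdata, thetas_optim - dtheta)) / (2 * h)
})

# FIM and the error covariance estimate C = I^{-1}
FIM <- t(sens) %*% sens / sigma2_hat
C_hat <- solve(FIM)

# Approximate 95% confidence intervals for (A, k, x_mid)
half_width <- qt(0.975, df = n - 3) * sqrt(diag(C_hat))
cbind(estimate = thetas_optim,
      low = thetas_optim - half_width,
      high = thetas_optim + half_width)
```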
## Worked-out example: logistic model

To compute the FIM:

- Measurement noise: $\sigma^2 \approx 0.031$ (see earlier).
- Sensitivity functions:
$$
  \frac{\partial y}{\partial A} = \frac{1}{1 + \exp(k(x_\text{mid} - x))}, \quad
  \frac{\partial y}{\partial k} = \ldots, \quad
  \frac{\partial y}{\partial x_\text{mid}} = \ldots
$$

Often these functions have to be computed **numerically** (see next chapter).

## Logistic model with synthetic data

- The inverse of the FIM is given by
$$
  \mathcal{I}^{-1} = \begin{bmatrix}
    0.0057 & -0.0034 & 0.0020 \\
    -0.0034 & 0.0169 & -0.0015 \\
    0.0020 & -0.0015 & 0.0040 \\
  \end{bmatrix}.
$$
- Parameter estimates: $A = 5.359$, $k = 1.597$, $x_\text{mid} = 2.500$.
- 95% confidence intervals (low, high):

| Parameter      | Estimate | Low  | High |
|----------------|----------|------|------|
| $A$            | 5.36     | 5.20 | 5.52 |
| $k$            | 1.60     | 1.32 | 1.87 |
| $x_\text{mid}$ | 2.45     | 2.32 | 2.58 |

- Correlation between $A$ and $k$: $R = -0.0034/\sqrt{0.0057 \times 0.0169} = -0.346$.

## Worked-out example: stock-recruitment model

```{r, include=FALSE}
load("datasets/03-parameter-estimation/M.merluccius.rda")
```

Optimal parameters:

- $\alpha = 5.75$
- $k = 33.16$

```{r}
#| out-width: 3in
#| out-height: 2in
#| fig-width: 4.5
#| fig-height: 3
#| fig-align: center
bh_fit <- nls(
  num.fish ~ alpha * spawn.biomass / (1 + spawn.biomass / k),
  data = M.merluccius,
  start = list(alpha = 6, k = 20))

ggplot(M.merluccius, aes(spawn.biomass, num.fish)) +
  geom_point() +
  geom_function(
    fun = \(S) predict(bh_fit, newdata = data.frame(spawn.biomass = S)),
    xlim = c(0, 80))
```

##

```{r}
J_sr <- function(theta) {
  x <- M.merluccius$spawn.biomass
  y <- M.merluccius$num.fish
  f <- function(S, alpha, k) {
    alpha * S / (1 + S / k)
  }
  resid <- y - f(x, theta[1], theta[2])
  sum(resid^2)
}

J_theta_sr <- J_sr(coef(bh_fit))
sigma2_sr <- J_theta_sr / 13

round2 <- function(x) {
  format(ceiling(x * 100) / 100, nsmall = 2)
}
```

1. Measurement noise:
$$
  \sigma^2 = \frac{J(\theta_\text{best})}{N - p} = \frac{`r round2(J_theta_sr)`}{13} = `r round2(sigma2_sr)`
$$

2. Sensitivity functions (for the Beverton-Holt model):
$$
  \frac{\partial f}{\partial \alpha} = \frac{S}{1 + S/k}, \quad
  \frac{\partial f}{\partial k} = - \frac{\alpha S^2}{(k + S)^2}.
$$
Again, typically you would compute these derivatives numerically.

##

```{r}
# Partial derivatives at the optimal parameters
S <- M.merluccius$spawn.biomass
alpha_best <- coef(bh_fit)["alpha"]
k_best <- coef(bh_fit)["k"]
df_dalpha <- S / (1 + S / k_best)
df_dk <- - alpha_best * S^2 / (k_best + S)^2
df <- matrix(c(df_dalpha, df_dk), byrow = TRUE, nrow = 2)

# Fisher matrix and its inverse
FIM_sr <- df %*% t(df) / sigma2_sr
FIM_inv_sr <- solve(FIM_sr)
```

3. Fisher information matrix:
$$
  \mathcal{I} = \begin{bmatrix}
    `r round2(FIM_sr[1,1])` & `r round2(FIM_sr[1,2])` \\
    `r round2(FIM_sr[2,1])` & `r round2(FIM_sr[2,2])` \\
  \end{bmatrix}
$$

4. Error-covariance matrix (cross-checked on the next slide):
$$
  C = \mathcal{I}^{-1} = \begin{bmatrix}
    `r round2(FIM_inv_sr[1,1])` & `r round2(FIM_inv_sr[1,2])` \\
    `r round2(FIM_inv_sr[2,1])` & `r round2(FIM_inv_sr[2,2])` \\
  \end{bmatrix}
$$
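## Cross-check with `nls`

As a sanity check (not part of the original walkthrough): `nls` computes its own covariance estimate for the fitted parameters, which should be close to our FIM-based $\mathcal{I}^{-1}$:

```{r}
# Covariance matrix of the estimates, as reported by nls;
# compare with FIM_inv_sr above
vcov(bh_fit)

# Standard errors for alpha and k
sqrt(diag(vcov(bh_fit)))
```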
##

```{r, include=FALSE}
sigma2_alpha <- FIM_inv_sr[1, 1]
sigma2_k <- FIM_inv_sr[2, 2]
cov_alpha_k <- FIM_inv_sr[1, 2]
R_alpha_k <- cov_alpha_k / sqrt(sigma2_alpha * sigma2_k)
```

From the error-covariance matrix $C$ computed above:
$$
  \sigma^2_\alpha = `r round2(sigma2_alpha)`, \quad
  \sigma^2_k = `r round2(sigma2_k)`, \quad
  \text{Cov}(\alpha, k) = `r round2(cov_alpha_k)`.
$$

5. 95% confidence intervals:
    - For $\alpha$:
      $$
        \alpha_\text{best} \pm 2.16 \times \sigma_\alpha = [`r round2(alpha_best - 2.16 * sqrt(sigma2_alpha))`, `r round2(alpha_best + 2.16 * sqrt(sigma2_alpha))`]
      $$
    - For $k$:
      $$
        k_\text{best} \pm 2.16 \times \sigma_k = [`r round2(k_best - 2.16 * sqrt(sigma2_k))`, `r round2(k_best + 2.16 * sqrt(sigma2_k))`]
      $$

6. Parameter correlation:
$$
  R = \frac{\text{Cov}(\alpha, k)}{\sigma_\alpha \times \sigma_k} = \frac{`r round2(cov_alpha_k)`}{`r round2(sqrt(sigma2_alpha))` \times `r round2(sqrt(sigma2_k))`} = `r round2(R_alpha_k)`.
$$

## Spaghetti plot

To get an idea of the variability within the confidence region, sample parameters from it and plot the resulting fitted curves.

```{r}
#| out-width: 4.5in
#| out-height: 2in
#| fig-width: 7
#| fig-height: 3
#| fig-align: center
library(mvtnorm)

# Sample parameter vectors from the approximate confidence region
samples <- rmvnorm(100, mean = c(alpha_best, k_best), sigma = FIM_inv_sr)
parameters <- data.frame(alpha = samples[, 1], k = samples[, 2])

beverton_holt <- function(S, theta) {
  theta[1] * S / (1 + S / theta[2])
}

p <- ggplot(parameters, aes(x = alpha, y = k)) +
  geom_point()
q <- ggplot(M.merluccius, aes(spawn.biomass, num.fish)) +
  geom_point()
for (i in seq_len(nrow(samples))) {
  q <- q + geom_function(fun = beverton_holt,
                         args = list(theta = samples[i, ]),
                         color = "red", alpha = 0.1)
}
q <- q + geom_function(fun = beverton_holt,
                       args = list(theta = c(alpha_best, k_best)),
                       color = "red", alpha = 1.0, linewidth = 1.0)
grid.arrange(p, q, ncol = 2)
```

## Key takeaways

- The quality of parameter estimates depends on the **model** and the **data**, as encoded by the FIM.
- The FIM provides a way of drawing elliptical confidence regions in parameter space (sketched on the next slide).
- The inverse of the FIM is a lower bound for the error-covariance matrix.
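## Extra: drawing the confidence ellipse

As a complement to the spaghetti plot (not in the original walkthrough), a sketch of how to draw the boundary of the elliptical 95% confidence region for $(\alpha, k)$ directly from $C = \mathcal{I}^{-1}$, using $(\theta - \theta_\text{best})^T C^{-1} (\theta - \theta_\text{best}) = p\,F_{p, N-p, 1-\alpha}$ with $p = 2$ and $N - p = 13$. The helper objects (`r2`, `circle`, `ellipse`) are mine:

```{r}
#| out-width: 3in
#| out-height: 2.5in
#| fig-align: center
# Squared "radius" of the 95% region: p * F_{p, N-p, 0.95}
r2 <- 2 * qf(0.95, 2, 13)

# Map a circle of radius sqrt(r2) through a square root of C
eig <- eigen(FIM_inv_sr)
angles <- seq(0, 2 * pi, length.out = 200)
circle <- sqrt(r2) * rbind(cos(angles), sin(angles))
ellipse <- t(eig$vectors %*% (sqrt(eig$values) * circle))
ellipse <- sweep(ellipse, 2, c(alpha_best, k_best), "+")

plot(ellipse, type = "l", xlab = "alpha", ylab = "k")
points(alpha_best, k_best, pch = 19)
```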