[population_ssm] New lecture: Bayesian state-space models (linear filtering to nonlinear population dynamics)

## Goal

Add a new lecture, **`population_ssm.md`** — *Bayesian State-Space Models: From Linear Filtering to Nonlinear Population Dynamics* — teaching Bayesian estimation of state-space models with NumPyro/NUTS, progressing from a linear-Gaussian null (exact Kalman benchmark) to a nonlinear logistic (Ricker) population model.

It builds on the existing [`kalman.md`](https://github.com/QuantEcon/lecture-python.myst/blob/main/lectures/kalman.md) (linear Gaussian theory) and the NumPyro lectures ([`ar1_bayes.md`](https://github.com/QuantEcon/lecture-python.myst/blob/main/lectures/ar1_bayes.md), [`bayes_nonconj.md`](https://github.com/QuantEcon/lecture-python.myst/blob/main/lectures/bayes_nonconj.md)), and shares machinery with the planned `unemployment_bayes.md` lecture (#910). Internally it is also a stepping stone toward latent-state fisheries/stock-assessment models, but that stays an implicit closing note — **not** an explicit theme of the finished lecture.

## Origin

Drafted from `population_ssm_lecture.tex` (faithful MyST conversion attached as `population_ssm_lecture.md`, with corrections flagged inline). The draft is strong; the model is stable and standard and the pedagogical arc is sound. A few technical errors and scope trims are settled below.

## Review findings (settled)

Verified by simulation where relevant.

1. **Steady-state Kalman gain formula is wrong.** The draft states the steady-state gain is $\sigma_p^2/(\sigma_p^2+\sigma_o^2)=q/(1+q)$. It is not. It solves the algebraic Riccati equation $K^2+qK-q=0$, giving $K=\tfrac12(\sqrt{q^2+4q}-q)$ (equivalently $q=K^2/(1-K)$). At $q=1$ this is the golden ratio $0.618$, not $0.5$. Confirmed numerically:
   ```
     q     K_numeric   q/(1+q)
    1.00   0.61803    0.50000
    4.00   0.82843    0.80000
    0.10   0.27016    0.09091
   ```
2. **Signal-to-noise identification mischaracterized.** The draft claims a constant-total-variance ridge. Differencing gives an MA(1): $\mathrm{Var}(\Delta y)=\sigma_p^2+2\sigma_o^2$, lag-1 autocov $=-\sigma_o^2$, so both variances are identified **asymptotically**. The real issue is **finite-sample** weak identification + the boundary $\sigma_p\to0$ (MA unit-root / pile-up) — which is exactly where the funnel appears.
3. **The `lax.scan` NumPyro code is broken.** `numpyro.sample(f"x_{t}", ...)` inside `jax.lax.scan` does not work (effect handlers don't see sample sites in a raw `lax.scan`; the name can't be an f-string of a traced index). Fix = non-centred parameterisation from the start (sample innovations in a `plate`, build the path with a *deterministic* `lax.scan`), which also pre-empts the funnel.
4. **Name the model.** The discrete log-space logistic **is the Ricker model** — name it (avoids confusion with the chaotic logistic *map*). Note it is deterministically stable for $r<2$, well outside the `HalfNormal(0.3)` prior (confirmed; prior-predictive: 100% of 3000 draws plausible).
5. **Notation clash:** $q$ = signal-to-noise ratio, then $q$ = catchability in the fisheries note. Rename one (use $\rho$ for SNR).
6. **Catch term is an approximation:** $-C/N$ is the small-catch linearisation of $\log(1-C/N)$ (undefined at $C\ge N$); exact Schaefer is multiplicative on the biomass scale.

## Agreed design

- **Lean linear section as the null + validation.** Use random-walk-**with-drift**, $\log N_t=\log N_{t-1}+\mu+\eta_t$ — the density-independent exponential-growth null (Dennis et al. 2006) — instead of the driftless local level. It is more plausible *and* the proper baseline, making linear→nonlinear a real nested "is there density dependence (a finite $K$)?" question. **Do not re-derive the Kalman filter** — link to `kalman.md` and use it purely as the exact benchmark NUTS must match. This section earns its place as the only case with an exact analytical answer to validate the ~40-dim latent-trajectory sampler against.
- **Nonlinear core = Ricker model** with corrected naming, stability note, and the corrected gain/identification framing.
- **Non-centred implementation from the start** (fixes the broken code + pre-empts the funnel); show the centred version only to motivate the divergences in the diagnostics section.
- **Keep:** observation-vs-process-error framing; SNR ($\rho$); recovering the Kalman **smoother** (not the filter) from NUTS; the $r$–$K$ correlation ridge as a callback to the $(\beta,\lambda)$ ridge in `unemployment_bayes`; simulation-study-before-real-data; funnel/non-centred; Student-$t$ robustness; lynx as the deliberate PPC-failure example.
- **Trim:** fisheries → brief implicit "where we're heading" note (no policy/CVaR layer); PMMH → one-line pointer; alternatives table → one sentence (keep wolves + lynx).

## Section plan

1. Introduction & motivation (population ecology; obs-vs-process error; trim fisheries refs)
2. State-space framework (transition/observation; joint posterior over $\theta$ and $x_{1:T}$; define $\rho$)
3. Linear **null**: density-independent growth (RW+drift) — concepts on tractable ground; corrected Kalman gain; corrected identification; **validate NUTS vs exact Kalman smoother**
4. Nonlinear **Ricker** model — nested density-dependence question; why Kalman fails; reparam; stability note
5. Bayesian formulation & **non-centred** NumPyro implementation
6. Diagnostics — $\hat R$/ESS/divergences; funnel ↔ $\sigma_p\to0$ boundary; PPC (lynx misspecification)
7. Posterior analysis, questions & exercises ($r$–$K$ ridge; prior predictive; Student-$t$)
8. Data — Yellowstone wolves (default) + lynx; simulation study
9. Brief outlook note (catch term → fisheries; implicit)

## Tasks

- [ ] Prototype the linear (RW+drift) model; confirm NUTS posterior matches the Kalman smoother on the same data
- [ ] Prototype the Ricker SSM with the **non-centred** parameterisation; confirm no divergences
- [ ] Reproduce the centred-vs-non-centred funnel/divergence contrast for the diagnostics section
- [ ] Tune priors via prior-predictive checks (record the workflow)
- [ ] Enter/locate the Yellowstone wolf series (CSV) + lynx series
- [ ] Simulation study (recover known parameters; coverage of 90% CIs)
- [ ] $r$–$K$ ridge plot; latent-trajectory credible bands; PPC (incl. lynx failure)
- [ ] Draft `lectures/population_ssm.md` in MyST (GPU admonition include; numpyro/jax/arviz imports), with corrected gain formula and Ricker naming
- [ ] Add to `lectures/_toc.yml` near the Bayesian/Kalman lectures
- [ ] Cross-link `kalman.md`, `ar1_bayes.md`, `bayes_nonconj.md`, `unemployment_bayes.md`
- [ ] Verify execution (jupytext → py) and check runtime budget

<details>
<summary>Original draft source — <code>population_ssm_lecture.tex</code></summary>

(Embedded for self-containment. John may also drag-drop the converted `population_ssm_lecture.md` onto this issue.)

```latex
\documentclass[12pt]{article}
\usepackage{amsmath, amssymb, amsthm}
\usepackage{geometry}
\usepackage{hyperref}
\usepackage{booktabs}
\usepackage{enumitem}
\usepackage{xcolor}
\usepackage{natbib}

\geometry{margin=1.2in}

\hypersetup{
    colorlinks=true,
    linkcolor=blue!60!black,
    citecolor=blue!60!black,
    urlcolor=blue!60!black
}

\newtheorem{remark}{Remark}
\newtheorem{example}{Example}

\title{\textbf{Bayesian State Space Models: From Linear Filtering to
Nonlinear Population Dynamics}\\
\large{Notes Toward a QuantEcon Lecture}}
\author{John Stachurski}
\date{\today}

\begin{document}

\maketitle
\tableofcontents
\bigskip

%=============================================================
\section{Introduction and Motivation}
%=============================================================

\subsection{Overview}

This lecture introduces Bayesian estimation of state space models,
progressing from a simple linear Gaussian specification — where exact
analytical results are available — to a nonlinear, non-Gaussian model
of animal population dynamics.  The nonlinear model is a logistic
growth state space system: essentially a single-species surplus
production model without harvesting.  It is a direct on-ramp to the
Bayesian fisheries stock assessment methods developed in the companion
lecture.

The lecture has three interlocking goals.  The first is conceptual:
to establish the distinction between \emph{observation error} and
\emph{process error}, and to show how a state space model separates
the two.  The second is analytical: to derive the Kalman filter for
the linear Gaussian case and understand what it computes.  The third
is computational: to implement the same model in NumPyro, verify that
NUTS recovers the Kalman solution, and then extend to the nonlinear
case where the Kalman filter is no longer exact.

\subsection{Why animal population dynamics?}

Population ecology provides an ideal setting for introducing state
space models to an economics audience.

\begin{itemize}[itemsep=4pt]

  \item \textbf{The latent state is genuinely unobserved.}  We never
        count every animal in a population.  Survey data (aerial
        counts, camera traps, mark-recapture experiments) are noisy
        proxies for the true population size.  The latent state
        interpretation is not a modelling convenience — it is the
        literal situation.

  \item \textbf{The dynamics are economically familiar.}  Logistic
        growth is the simplest model of a renewable resource with a
        carrying capacity.  It is the biological analogue of
        diminishing returns, and it underpins the entire literature on
        optimal harvesting and fisheries management.

  \item \textbf{The signal-to-noise problem is vivid.}  Wildlife
        surveys are expensive and infrequent; observation errors are
        large.  Students immediately grasp why it matters to
        distinguish a population that is genuinely declining from one
        that merely \emph{appears} to be declining because of noisy
        counts.

  \item \textbf{Real data are short and clean.}  Annual wolf, elk, or
        lynx count series from national parks typically have 30--60
        observations, which is ideal for a notebook: NUTS runs fast,
        the model is identified, and the posterior is interpretable.

\end{itemize}

\subsection{Position in the lecture sequence}

This lecture sits between the QuantEcon lectures on the Kalman filter
(linear Gaussian theory) and the fisheries surplus production lecture
(nonlinear state space with harvesting and a policy layer).  It
serves two bridging functions.  First, it shows students that the
Kalman filter is a special case of Bayesian filtering, recoverable
from NUTS as a limiting case.  Second, it introduces the logistic
growth process that reappears — with a catch term added — as the
Schaefer surplus production model in the fisheries lecture.  The
computational tools (NumPyro model, NUTS, ArviZ diagnostics) are
identical across lectures, so the marginal cost of learning the
fisheries model after this one is low.

%=============================================================
\section{State Space Models: The Basic Framework}
%=============================================================

\subsection{Two sources of uncertainty}

A central theme of this lecture is that time series data conflate two
distinct sources of randomness that have very different implications
for inference.

\textbf{Process error} (also called system noise or ecological
stochasticity in biology) reflects genuine randomness in the
underlying dynamics: random births and deaths, environmental
fluctuations that affect reproduction, demographic accidents in small
populations.  Process error means the true state of the system is
uncertain even to the system itself; the uncertainty accumulates over
time and cannot be reduced by better observation.

\textbf{Observation error} (also called measurement noise) reflects
imperfections in how we observe the state: sampling variance in
surveys, detection probability less than one, counting errors.
Observation error is in principle reducible by investing in better
data collection; it does not reflect genuine uncertainty about future
dynamics.

Conflating the two leads to badly calibrated forecasts and policy
advice.  A model that attributes observation noise to process
variability will overestimate how unpredictable the system is.  A
model that treats genuine process noise as a measurement error will
overstate how precisely we can predict the future.  The state space
framework keeps them separate by construction.

\subsection{General state space form}

A state space model has two equations.  The \emph{transition
equation} (or process equation) describes how the latent state
$x_t$ evolves:
\begin{equation}
  x_t = f(x_{t-1},\, \eta_t), \qquad \eta_t \sim p_\eta,
  \label{eq:transition}
\end{equation}
where $f$ is a (possibly nonlinear) function and $\eta_t$ is process
noise.  The \emph{observation equation} (or measurement equation)
describes how the observed data $y_t$ relate to the latent state:
\begin{equation}
  y_t = g(x_t,\, \epsilon_t), \qquad \epsilon_t \sim p_\epsilon,
  \label{eq:observation}
\end{equation}
where $g$ is a (possibly nonlinear) function and $\epsilon_t$ is
observation noise.  The latent states $x_{1:T} = (x_1,\ldots,x_T)$
are never directly observed.

\paragraph{What Bayesian inference does.}  Given observed data
$y_{1:T}$ and a prior over parameters $\theta$, we want the joint
posterior:
\[
  p(\theta,\, x_{1:T} \mid y_{1:T})
  \;\propto\;
  p(y_{1:T} \mid x_{1:T}, \theta)\;
  p(x_{1:T} \mid \theta)\;
  p(\theta).
\]
This is a joint distribution over both the parameters and the
entire latent state trajectory.  NUTS samples from this joint
posterior directly, treating $x_{1:T}$ as a high-dimensional
latent variable alongside $\theta$.

%=============================================================
\section{The Linear Gaussian Case}
%=============================================================

\subsection{Model specification}

The simplest state space model is the \emph{local level model}:
\begin{align}
  x_t   &= x_{t-1} + \eta_t, \qquad \eta_t \sim \mathcal{N}(0,\,\sigma_p^2)
           \label{eq:ll_transition} \\[4pt]
  y_t   &= x_t + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0,\,\sigma_o^2).
           \label{eq:ll_observation}
\end{align}
The latent state $x_t$ is a random walk; the observed series $y_t$
is a noisy measurement of $x_t$.  Parameters are
$\theta = (\sigma_p, \sigma_o, x_0)$ where $x_0$ is the initial
state (or its prior mean).

For the population dynamics application, take $x_t = \log N_t$ where
$N_t$ is population size, and $y_t = \log \hat N_t$ where $\hat N_t$
is the survey count.  Then the local level model says:
\begin{itemize}
  \item The log-population drifts randomly (geometric random walk);
  \item Survey counts are a log-normal perturbation of the true
        population.
\end{itemize}
This is the right starting point — it captures the log-normal
observation structure without the complications of density-dependent
dynamics.

\subsection{The signal-to-noise ratio}

Define the \emph{signal-to-noise ratio}
\begin{equation}
  q \;=\; \frac{\sigma_p^2}{\sigma_o^2}.
  \label{eq:snr}
\end{equation}
This single parameter determines the character of the filtered
estimate.

\begin{itemize}[itemsep=4pt]

  \item When $q \to 0$: process noise is negligible relative to
        observation noise.  The filter heavily smooths the data,
        placing most weight on the prior trajectory.  The filtered
        estimate is a very smooth trend.

  \item When $q \to \infty$: observation noise is negligible.  The
        filter trusts each new observation almost completely.  The
        filtered estimate tracks the data closely.

  \item For intermediate $q$: the filter strikes a balance, giving a
        smoothed estimate that neither ignores the data nor tracks it
        slavishly.

\end{itemize}

In population ecology, $q$ is typically small: surveys are noisy and
genuine year-to-year fluctuations in abundance are moderate.  One of
the most important lessons of the lecture is that $q$ is often
\emph{weakly identified}: data can be consistent with a wide range of
$(\sigma_p, \sigma_o)$ pairs having the same total variance
$\sigma_p^2 + \sigma_o^2$.  The posterior of $q$ will often be wide,
and the posterior of the latent trajectory will respond
correspondingly to the assumed $q$.

\subsection{The Kalman filter}

For the linear Gaussian model \eqref{eq:ll_transition}--\eqref{eq:ll_observation},
the posterior of $x_t$ given $y_{1:t}$ is Gaussian at every $t$.
The Kalman filter propagates the mean $m_t = \mathbb{E}[x_t \mid
y_{1:t}]$ and variance $P_t = \mathrm{Var}(x_t \mid y_{1:t})$
recursively.

\paragraph{Predict step.}
\begin{align}
  m_{t|t-1} &= m_{t-1}   \label{eq:kf_predict_mean} \\
  P_{t|t-1} &= P_{t-1} + \sigma_p^2.   \label{eq:kf_predict_var}
\end{align}

\paragraph{Update step.}
\begin{align}
  K_t   &= \frac{P_{t|t-1}}{P_{t|t-1} + \sigma_o^2}
            \label{eq:kf_gain} \\[4pt]
  m_t   &= m_{t|t-1} + K_t (y_t - m_{t|t-1})
            \label{eq:kf_update_mean} \\[4pt]
  P_t   &= (1 - K_t)\, P_{t|t-1}.
            \label{eq:kf_update_var}
\end{align}

The scalar $K_t \in (0,1)$ is the \emph{Kalman gain}.  It equals
$\sigma_p^2 / (\sigma_p^2 + \sigma_o^2)$ at steady state, which is
exactly the signal-to-noise ratio rescaled to $(0,1)$.  When $q$ is
large, $K_t \approx 1$ and the filter trusts the new observation.
When $q$ is small, $K_t \approx 0$ and the filter discounts it.

\paragraph{Marginal likelihood.}
The Kalman filter also produces the \emph{innovation}
$\nu_t = y_t - m_{t|t-1}$ and its variance $S_t = P_{t|t-1} +
\sigma_o^2$.  The log marginal likelihood (integrating out $x_{1:T}$)
is:
\begin{equation}
  \log p(y_{1:T} \mid \theta)
  \;=\;
  -\frac{1}{2}\sum_{t=1}^{T}
  \left[\log(2\pi S_t) + \frac{\nu_t^2}{S_t}\right].
  \label{eq:kf_loglik}
\end{equation}
This is differentiable in $\theta = (\sigma_p, \sigma_o)$, so maximum
likelihood estimation is straightforward.

\subsection{Recovering the Kalman solution with NUTS}

Before introducing the nonlinear model, it is important to verify
that NUTS gives the same answer as the Kalman filter on the linear
model.  This serves two purposes: it validates the NumPyro
implementation, and it builds student trust in the sampler before
we move to settings where analytical benchmarks are unavailable.

The NumPyro model for the local level specification treats $x_{1:T}$
as explicit latent variables:

\begin{verbatim}
import numpyro
import numpyro.distributions as dist
import jax.numpy as jnp

def local_level(y):
    T = len(y)
    sigma_p = numpyro.sample("sigma_p", dist.HalfNormal(1.0))
    sigma_o = numpyro.sample("sigma_o", dist.HalfNormal(1.0))
    x0      = numpyro.sample("x0",      dist.Normal(y[0], 2.0))

    x = [x0]
    for t in range(1, T):
        x_t = numpyro.sample(
            f"x_{t}",
            dist.Normal(x[t-1], sigma_p)
        )
        x.append(x_t)

    x_arr = jnp.stack(x)
    with numpyro.plate("obs", T):
        numpyro.sample("y", dist.Normal(x_arr, sigma_o), obs=y)
\end{verbatim}

\begin{remark}
  The loop-based formulation above is pedagogically clear but
  computationally inefficient for long series because JAX traces
  through each iteration.  For production code, use
  \texttt{numpyro.contrib.control\_flow.scan} to vectorise the
  recursion.  For a short dataset (say $T = 50$) the loop is fast
  enough for a lecture notebook.
\end{remark}

The posterior marginal for $x_t$ from NUTS should match the Kalman
smoother distribution (not just the filter) since NUTS conditions on
the full data $y_{1:T}$.  Plot both on the same axis as a
verification exercise.

%=============================================================
\section{The Nonlinear Model: Logistic Population Growth}
%=============================================================

\subsection{Model specification}

The local level model assumes the population follows a geometric
random walk, which allows unbounded growth and ignores density
dependence.  A more realistic description introduces a carrying
capacity $K$: as the population approaches $K$, growth slows and
eventually reverses.  The standard continuous-time model is the
logistic equation; its discrete-time analogue in log-space is:

\begin{align}
  \log N_t   &= \log N_{t-1}
               + r\!\left(1 - \frac{N_{t-1}}{K}\right)
               + \eta_t,
               \qquad \eta_t \sim \mathcal{N}(0,\,\sigma_p^2)
               \label{eq:pop_transition} \\[6pt]
  \log y_t   &= \log N_t + \epsilon_t,
               \qquad \epsilon_t \sim \mathcal{N}(0,\,\sigma_o^2).
               \label{eq:pop_observation}
\end{align}

Here $N_t > 0$ is the true (latent) population at time $t$; $y_t > 0$
is the observed survey count; $r > 0$ is the intrinsic growth rate;
$K > 0$ is the carrying capacity; $\sigma_p$ and $\sigma_o$ are the
process and observation noise standard deviations.

\paragraph{Interpretation of equation~\eqref{eq:pop_transition}.}
When $N_{t-1} \ll K$, the term $(1 - N_{t-1}/K) \approx 1$ and the
population grows at approximately rate $r$ per period.  When
$N_{t-1} = K$, growth is exactly zero on average.  When $N_{t-1} >
K$, the growth rate is negative, pulling the population back down.
The process noise $\eta_t$ captures environmental stochasticity —
random variation in birth rates, survival, and habitat quality.

\paragraph{Interpretation of equation~\eqref{eq:pop_observation}.}
Survey counts $y_t$ are log-normally distributed around the true
population $N_t$.  Equivalently, $y_t = N_t \cdot e^{\epsilon_t}$,
so the observation error is multiplicative — a common model for count
data where variance scales with mean.  The standard deviation
$\sigma_o$ on the log scale controls the coefficient of variation of
the survey counts.

\subsection{Why the Kalman filter fails here}

The Kalman filter requires both the transition and observation
equations to be \emph{linear} in the state.
Equation~\eqref{eq:pop_transition} is nonlinear: the term $r
N_{t-1}/K$ makes $\log N_t$ depend nonlinearly on $N_{t-1} =
\exp(\log N_{t-1})$.  Specifically, if we let $x_t = \log N_t$, the
transition becomes
\[
  x_t \;=\; x_{t-1} + r\!\left(1 - \frac{e^{x_{t-1}}}{K}\right) + \eta_t,
\]
which is not affine in $x_{t-1}$.  The Extended Kalman Filter (EKF)
linearises this around the current estimate, but the approximation
can be poor when the population is far from the carrying capacity or
when $\sigma_p$ is large.  The Unscented Kalman Filter (UKF) does
better but still imposes Gaussian approximations.  NUTS treats the
nonlinearity exactly, making it the right tool once we leave the
linear Gaussian world.

\subsection{Reparameterisation}

It is convenient to work with log-transformed parameters and
states throughout.  Define:
\[
  x_t = \log N_t, \qquad
  \kappa = \log K.
\]
The model becomes:
\begin{align}
  x_t   &= x_{t-1}
           + r\!\left(1 - e^{x_{t-1} - \kappa}\right)
           + \eta_t
           \label{eq:logpop_transition} \\[6pt]
  \log y_t &= x_t + \epsilon_t.
           \label{eq:logpop_observation}
\end{align}
This formulation has several computational advantages: $x_t$ ranges
over all of $\mathbb{R}$, the parameters $r$ and $e^\kappa = K$ are
unconstrained after appropriate transformation, and the observation
equation is exactly linear in $x_t$.  NUTS operates on the
unconstrained space, and NumPyro's \texttt{dist.TransformedDistribution}
handles the Jacobian automatically when sampling log-scale parameters.

\subsection{Parameters and their roles}

\begin{center}
\begin{tabular}{llll}
\toprule
Parameter & Symbol & Units & Typical range \\
\midrule
Intrinsic growth rate & $r$        & per period & $(0.05,\, 0.5)$ \\
Carrying capacity     & $K$        & individuals & application-specific \\
Process noise         & $\sigma_p$ & log-scale   & $(0.02,\, 0.3)$ \\
Observation noise     & $\sigma_o$ & log-scale   & $(0.05,\, 0.5)$ \\
Initial log-population & $x_0$     & log-scale   & $\approx \log K$ \\
\bottomrule
\end{tabular}
\end{center}

The signal-to-noise ratio $q = \sigma_p^2 / \sigma_o^2$ plays the
same central role as in the local level model, now compounded by the
additional identification challenge between $r$ and $K$: only the
equilibrium $K$ and the speed of adjustment $r$ together determine
the long-run dynamics, and the two interact in a way that makes their
joint posterior correlated.

%=============================================================
\section{Bayesian Formulation}
%=============================================================

\subsection{Likelihood}

Conditional on parameters $\theta = (r, K, \sigma_p, \sigma_o, x_0)$
and latent states $x_{1:T}$, the observations $\log y_t$ are
independent:
\[
  \log y_t \mid x_t,\,\theta
  \;\sim\;
  \mathcal{N}(x_t,\; \sigma_o^2).
\]
The latent states evolve according to:
\[
  x_t \mid x_{t-1},\,\theta
  \;\sim\;
  \mathcal{N}\!\left(
    x_{t-1} + r\!\left(1 - e^{x_{t-1}-\kappa}\right),\;
    \sigma_p^2
  \right).
\]
The joint log-likelihood of the latent path and observations is:

\begin{equation}
  \log p(y_{1:T}, x_{1:T} \mid \theta)
  \;=\;
  -\frac{T}{2}\log(2\pi\sigma_o^2)
  -\frac{1}{2\sigma_o^2}\sum_{t=1}^T (\log y_t - x_t)^2
  -\frac{T}{2}\log(2\pi\sigma_p^2)
  -\frac{1}{2\sigma_p^2}\sum_{t=1}^T \bigl[x_t - x_{t-1}
       - r(1-e^{x_{t-1}-\kappa})\bigr]^2.
\end{equation}

NUTS samples from $p(\theta, x_{1:T} \mid y_{1:T}) \propto
p(y_{1:T}, x_{1:T} \mid \theta) \cdot p(\theta)$ jointly.

\subsection{Prior distributions}

\begin{align}
  r          &\;\sim\; \mathrm{HalfNormal}(0.3)
               \tag{P1}\label{prior:r} \\[4pt]
  K          &\;\sim\; \mathrm{LogNormal}(\mu_K,\, 0.5^2)
               \tag{P2}\label{prior:K} \\[4pt]
  \sigma_p   &\;\sim\; \mathrm{HalfNormal}(0.2)
               \tag{P3}\label{prior:sigmap} \\[4pt]
  \sigma_o   &\;\sim\; \mathrm{HalfNormal}(0.3)
               \tag{P4}\label{prior:sigmao} \\[4pt]
  x_0        &\;\sim\; \mathcal{N}(\log \hat{N}_1,\; 1.0^2)
               \tag{P5}\label{prior:x0}
\end{align}

\paragraph{Notes on prior choices.}

\begin{itemize}[itemsep=4pt]

  \item \textbf{Growth rate \eqref{prior:r}.}  A $\mathrm{HalfNormal}(0.3)$
        prior places most mass on $r \in (0, 0.6)$.  For large
        mammals (wolves, elk), typical intrinsic growth rates are
        $r \in (0.1, 0.4)$ per year.  For fish stocks $r$ can be
        larger.  The prior should be adjusted to the application;
        life-history databases such as FishBase or the RAM Legacy
        Database provide useful reference points.

  \item \textbf{Carrying capacity \eqref{prior:K}.}  The
        log-Normal prior is natural for a positive quantity.  Set
        $\mu_K$ to the log of a rough prior estimate of maximum
        sustainable population, derived from survey history or
        habitat assessments.  The scale $0.5$ on the log scale
        corresponds to a factor-of-1.6 uncertainty, which is
        deliberately wide.

  \item \textbf{Process and observation noise
        \eqref{prior:sigmap}--\eqref{prior:sigmao}.}
        Both are on the log scale.  A $\mathrm{HalfNormal}(0.2)$ prior
        on $\sigma_p$ assigns most mass to year-to-year log-scale
        fluctuations below about $40\%$, which is appropriate for
        large mammals.  The key prior sensitivity exercise is to vary
        the ratio of these scales and show how the filtered trajectory
        responds.

  \item \textbf{Initial state \eqref{prior:x0}.}  Centred on the
        log of the first observation, with a standard deviation of
        1.0 on the log scale.  This is intentionally vague — the
        first survey count may be far from the true initial
        population.

\end{itemize}

\subsection{NumPyro implementation}

\begin{verbatim}
import numpyro
import numpyro.distributions as dist
import jax.numpy as jnp
from jax import lax

def logistic_ssm(log_y):
    T = len(log_y)

    # Priors (all unconstrained or half-constrained)
    r       = numpyro.sample("r",       dist.HalfNormal(0.3))
    log_K   = numpyro.sample("log_K",   dist.Normal(jnp.mean(log_y), 1.0))
    sigma_p = numpyro.sample("sigma_p", dist.HalfNormal(0.2))
    sigma_o = numpyro.sample("sigma_o", dist.HalfNormal(0.3))
    x0      = numpyro.sample("x0",      dist.Normal(log_y[0], 1.0))

    # Latent state trajectory via scan
    def transition(x_prev, t):
        drift = r * (1.0 - jnp.exp(x_prev - log_K))
        x_t   = numpyro.sample(
            f"x_{t}",
            dist.Normal(x_prev + drift, sigma_p)
        )
        return x_t, x_t

    _, x_vals = lax.scan(
        lambda carry, t: transition(carry, t),
        x0,
        jnp.arange(1, T)
    )
    x_all = jnp.concatenate([x0[None], x_vals])

    # Observation model
    with numpyro.plate("obs", T):
        numpyro.sample("y", dist.Normal(x_all, sigma_o), obs=log_y)
\end{verbatim}

\begin{remark}
  The use of \texttt{lax.scan} replaces the Python loop from the
  local level implementation.  This is important for performance:
  \texttt{lax.scan} compiles the recursion into a single XLA
  operation, making it fast even for $T = 200$.  The tradeoff is
  that NumPyro's named-sample approach inside \texttt{scan} requires
  care — consult the NumPyro documentation on scan and
  \texttt{numpyro.contrib.control\_flow} for production-quality
  implementations.
\end{remark}

%=============================================================
\section{Data}
%=============================================================

\subsection{Yellowstone wolves}

The canonical dataset for this lecture is the \textbf{Yellowstone
wolf population}, monitored continuously since the reintroduction of
17 wolves in 1995.  Annual population estimates are published by the
Yellowstone Wolf Project (National Park Service) and are freely
available.  The series runs from 1995 to the present, giving
approximately 30 observations, and shows textbook logistic growth:
rapid early expansion from 17 to a peak of 174 in 2003, followed by
stabilisation and modest decline.

\begin{itemize}
  \item \textbf{Source:}
        \url{https://www.nps.gov/yell/learn/nature/wolves.htm}
  \item \textbf{Format:}  Annual total population count.  Copy the
        table into a CSV; it is short enough to enter by hand.
  \item \textbf{Notes:}  The counts are intensive survey estimates,
        not raw observations.  The observation error $\sigma_o$ therefore
        reflects survey methodology rather than raw counting variance.
\end{itemize}

This dataset is ideal because: (a) the logistic growth story is
visually obvious in the data; (b) the carrying capacity $K$ is
meaningfully estimable — the population clearly levelled off; (c) the
series is short enough that NUTS runs in seconds; and (d) it is a
genuinely interesting ecological and policy story.

\subsection{Alternative datasets}

\begin{center}
\begin{tabular}{lllp{5.5cm}}
\toprule
Population & Source & $T$ & Notes \\
\midrule
Yellowstone elk    & NPS Yellowstone & $\sim$40 & Predator--prey context; longer series \\
Canadian lynx      & MacKenzie (1820--1934) & 114 & Classic ecological time series; shows cycles \\
Whooping crane     & USFWS & $\sim$80 & Critically endangered; strong conservation interest \\
Harbour seal (UK)  & SCOS reports & $\sim$40 & Direct fisheries relevance \\
Pacific salmon run & NOAA / PSA & $\sim$60 & Direct link to fisheries lecture \\
\bottomrule
\end{tabular}
\end{center}

The \textbf{Canadian lynx} series is particularly interesting for
a lecture on nonlinear dynamics because the population shows
predator--prey oscillations that a simple logistic model cannot
capture.  This makes it a useful example of \emph{model
misspecification} — the posterior predictive checks will fail
visibly, motivating extensions.

\subsection{Simulated data as a pedagogical tool}

Before fitting to real data, it is valuable to fit the model to
data simulated from known parameters.  This \emph{simulation study}
accomplishes three things:

\begin{enumerate}
  \item It verifies that the model is identifiable: do the posterior
        medians recover the true parameters?
  \item It reveals which parameters are well-identified and which are
        weakly identified (wide posteriors even with infinite data
        would signal structural non-identification).
  \item It gives students intuition for what the model produces
        before they see real data.
\end{enumerate}

A typical simulation exercise: fix $(r, K, \sigma_p, \sigma_o) =
(0.2, 150, 0.1, 0.2)$ and $N_0 = 20$.  Simulate $T = 40$ years of
data.  Estimate the model.  Report the $90\%$ credible intervals for
each parameter and check whether the true value is contained.

%=============================================================
\section{Interesting Questions and Exercises}
%=============================================================

\subsection{Substantive questions}

\begin{enumerate}[itemsep=6pt]

  \item \textbf{What is the carrying capacity?}  The posterior of $K$
        gives a full probability distribution over the maximum
        sustainable population.  For Yellowstone wolves, this has
        direct management implications: how many wolves can
        Yellowstone sustain?  Compare the posterior to published
        habitat-based estimates.

  \item \textbf{What is the intrinsic growth rate?}  The posterior
        of $r$ answers this.  For Yellowstone wolves, published
        estimates suggest $r \approx 0.25$ per year in the early
        reintroduction period.  Does the posterior recover this?

  \item \textbf{What is the true population trajectory?}  The
        posterior of $x_{1:T}$ (equivalently, $N_{1:T}$) gives a
        probability distribution over the latent population path.
        Plot the posterior median and $90\%$ credible band alongside
        the observed counts.  Years with few survey visits will have
        wide bands; years with intensive surveying will have narrow
        bands.

  \item \textbf{Is the population currently below or above carrying
        capacity?}  Compute $P(N_T < K \mid y_{1:T})$ from the
        posterior.  This is the kind of probabilistic statement
        that managers need and that a point estimate cannot provide.

  \item \textbf{How much of the variance is observation error?}  The
        ratio $\sigma_o^2 / (\sigma_p^2 + \sigma_o^2)$ is the
        fraction of total variance attributable to measurement.
        Plot its posterior distribution.  A value near 1 means most
        of what we see in the data is noise; a value near 0 means the
        observed fluctuations are genuine.

\end{enumerate}

\subsection{Teaching Bayesian foundations}

\begin{enumerate}[itemsep=6pt]

  \item \textbf{The signal-to-noise identification problem.}
        This is the central pedagogical point of the lecture.  Fix
        the total variance $\sigma_p^2 + \sigma_o^2$ and vary $q =
        \sigma_p^2/\sigma_o^2$.  Show how the inferred latent
        trajectory changes: high $q$ gives a wiggly trajectory that
        tracks observations closely; low $q$ gives a smooth trajectory
        that largely ignores observation-level fluctuations.  Then
        show that the posterior of $q$ is wide — the data often
        cannot sharply determine how much of the variance is process
        versus observation.  This is not a failure of the model; it
        is an honest representation of what the data can and cannot
        tell us.

  \item \textbf{Prior predictive simulation.}  Before looking at
        data, sample $(r, K, \sigma_p, \sigma_o)$ from the prior and
        simulate population trajectories.  Are the prior trajectories
        ecologically plausible?  Do they include populations that
        go extinct in the first year, or grow to a billion?  Prior
        predictive simulation is the right way to diagnose
        unreasonable priors before fitting.

  \item \textbf{Marginalising versus conditioning on latent states.}
        Advanced students can compare two approaches: (a) NUTS
        sampling the joint posterior over $(\theta, x_{1:T})$, and
        (b) using a particle Marginal Metropolis--Hastings (PMMH)
        algorithm that marginalises over $x_{1:T}$ using a particle
        filter, sampling only over $\theta$.  Approach (b) is more
        efficient in high dimensions but more complex to implement.
        For $T \leq 50$ and the model above, approach (a) is
        perfectly adequate.

  \item \textbf{Identifiability and the $r$--$K$ trade-off.}
        The equilibrium population is $N^* \approx K$ (ignoring
        stochasticity).  The speed of approach to equilibrium depends
        on $r$.  But if $T$ is short and the population never gets
        close to $K$, then $K$ and $r$ are only weakly jointly
        identified: a fast growth rate combined with a high carrying
        capacity can look similar to a slow growth rate with a
        moderate carrying capacity over a short time horizon.  The
        joint posterior of $(r, K)$ will show this as a correlation
        ridge, analogous to the $(\beta, \lambda)$ ridge in the
        unemployment lecture.

  \item \textbf{Non-Gaussian extensions.}  Replace the Gaussian
        observation noise with a Student-$t$ distribution:
        \[
          \log y_t \mid x_t \;\sim\; t_\nu(x_t,\, \sigma_o^2).
        \]
        This down-weights extreme observations (missed survey years,
        counting errors) and is a common robustification.  NumPyro's
        \texttt{dist.StudentT} makes this a one-line change.  Compare
        the inferred trajectory under Gaussian and Student-$t$
        observation models for a dataset with one clear outlier.

\end{enumerate}

%=============================================================
\section{Diagnostics}
%=============================================================

\subsection{Standard MCMC diagnostics}

The same checklist from the unemployment lecture applies here.

\begin{enumerate}[itemsep=4pt]

  \item \textbf{Trace plots and $\hat R$.}  For a $T = 40$ dataset
        the parameter vector is low-dimensional; convergence should
        be fast.  The latent states $x_{1:T}$ are also part of the
        posterior and their $\hat R$ values should be checked.

  \item \textbf{Effective sample size.}  ESS can be low for
        $\sigma_p$ and $\sigma_o$ individually (due to the
        identification problem) even when their ratio is well
        identified.

  \item \textbf{Divergences.}  NUTS reports the number of divergent
        transitions.  Divergences in this model often arise near
        $\sigma_p \approx 0$, where the geometry of the posterior
        becomes degenerate (a ``funnel'' in the $(\sigma_p, x_{1:T})$
        space).  The solution is a \emph{non-centred
        parameterisation}.

\end{enumerate}

\subsection{The funnel problem and non-centred parameterisation}

Hierarchical models — and state space models in particular — suffer
from a notorious sampling problem known as Neal's funnel
\citep{Neal2003}.  When $\sigma_p$ is small, the latent states
$x_{1:T}$ are tightly constrained around a smooth path; the posterior
geometry in the $(\sigma_p, x_{1:T})$ space looks like a funnel, with
a wide mouth at large $\sigma_p$ and a narrow neck near $\sigma_p =
0$.  NUTS struggles to explore both regions simultaneously.

The standard remedy is to \emph{non-centre} the latent states.
Instead of sampling $x_t$ directly, sample standardised innovations
$\tilde\eta_t \sim \mathcal{N}(0,1)$ and reconstruct:
\[
  x_t \;=\; x_{t-1} + r(1 - e^{x_{t-1}-\kappa}) + \sigma_p \tilde\eta_t.
\]
Now $\sigma_p$ and $\tilde\eta_{1:T}$ are a priori independent, and
the funnel geometry disappears.  The non-centred parameterisation is
the correct default for state space models in NumPyro.

\begin{verbatim}
# Non-centred version
eta = numpyro.sample("eta", dist.Normal(jnp.zeros(T-1),
                                         jnp.ones(T-1)))
# Then reconstruct x_t = f(x_{t-1}) + sigma_p * eta[t]
\end{verbatim}

This is an important lesson: model parameterisation affects sampler
performance, and the ``natural'' parameterisation is not always the
best one for MCMC.

\subsection{Posterior predictive checks}

Generate replicated datasets $\tilde y_{1:T}^{(s)}$ for $s = 1,
\ldots, S$ posterior draws.  Compare the following statistics between
observed and replicated data:

\begin{itemize}[itemsep=4pt]
  \item Maximum population count (is the observed peak within the
        posterior predictive range?)
  \item Time to first reach $K/2$ (does the model replicate the
        speed of early growth?)
  \item Lag-1 autocorrelation of log counts (does the model capture
        temporal dependence?)
  \item Fraction of years with population decline (captures
        asymmetry in dynamics)
\end{itemize}

For the Canadian lynx series, the posterior predictive distribution
from the logistic model will \emph{not} replicate the strong
oscillations in the data.  This is a clean illustration of the
purpose of predictive checks: the model passes naive goodness-of-fit
tests (it fits the data reasonably well) but fails the predictive
check that matters for the underlying dynamics.

%=============================================================
\section{Connection to the Fisheries Surplus Production Model}
%=============================================================

The logistic growth state space model \eqref{eq:pop_transition}--\eqref{eq:pop_observation}
becomes a Schaefer surplus production model for fisheries simply by
adding a catch term to the transition equation:

\begin{equation}
  \log N_t \;=\; \log N_{t-1}
              + r\!\left(1 - \frac{N_{t-1}}{K}\right)
              - \frac{C_{t-1}}{N_{t-1}}
              + \eta_t,
  \label{eq:spm}
\end{equation}

where $C_{t-1}$ is the observed catch (in the same units as $N_t$).
The observation equation remains the same, now interpreted as: CPUE
(catch per unit effort) is a noisy index of true biomass.

Every concept developed in this lecture transfers directly:
\begin{itemize}[itemsep=4pt]
  \item The latent state $N_t$ is fish biomass, never directly observed
  \item The signal-to-noise problem is more acute (surveys are sparse
        and expensive)
  \item The $r$--$K$ identification problem becomes the $r$--$K$--$q$
        problem, where $q$ is catchability (the proportionality constant
        between CPUE and biomass)
  \item Neal's funnel reappears and non-centred parameterisation is
        essential
  \item Posterior predictive checks now have direct management
        implications: does the model predict the stock will recover
        under a proposed catch limit?
\end{itemize}

The fisheries lecture adds the policy layer: given the posterior over
$(r, K, N_T)$, what harvest rule maximises expected sustainable yield
while keeping the probability of stock collapse below a target level?
This connects to the CVaR-based fisheries management research
discussed elsewhere in the QuantEcon sequence.

%=============================================================
\section{Suggested Lecture Structure}
%=============================================================

\begin{enumerate}[itemsep=6pt]

  \item \textbf{Motivation} (10 min).  Show a wolf population plot.
        Ask: what fraction of the variance is real dynamics versus
        counting error?  Explain why this matters for policy.
        Introduce the observation error vs process error distinction.

  \item \textbf{Linear Gaussian model and Kalman filter} (20 min).
        Write the local level model.  Derive the Kalman gain
        intuitively.  Show how the filtered estimate depends on
        $q = \sigma_p^2/\sigma_o^2$.  Plot filtered trajectories for
        high-$q$ and low-$q$ cases.

  \item \textbf{NUTS on the linear model} (15 min).  Run NumPyro on
        the local level model.  Show that posterior means of $x_{1:T}$
        match the Kalman smoother.  This validates the sampler.

  \item \textbf{Nonlinear extension} (20 min).  Introduce the
        logistic growth transition.  Explain why Kalman fails.
        Show that the NumPyro code changes by two lines (add $r$ and
        $K$, modify the drift).  Run NUTS.

  \item \textbf{Posterior analysis} (20 min).  Marginal posteriors
        of $(r, K, \sigma_p, \sigma_o)$.  Latent trajectory with
        credible bands.  $r$--$K$ pair plot showing correlation ridge.
        Posterior predictive checks.

  \item \textbf{The funnel and non-centred parameterisation} (10 min).
        Show divergences in the centred model.  Switch to
        non-centred.  Show they disappear.  Explain why this matters.

  \item \textbf{Outlook: fisheries} (5 min).  Add a catch term to
        the transition equation.  Point to the companion lecture.

\end{enumerate}

%=============================================================
\bibliographystyle{plainnat}
\begin{thebibliography}{99}

\bibitem[Dennis et al.(2006)]{Dennis2006}
Dennis, B., Ponciano, J.M., Lele, S.R., Taper, M.L., and Staples, D.F.\
  (2006).
\newblock Estimating density dependence, process noise, and observation error.
\newblock \textit{Ecological Monographs}, 76(3):323--341.

\bibitem[de Valpine and Hastings(2002)]{deValpine2002}
de Valpine, P.\ and Hastings, A.\ (2002).
\newblock Fitting population models incorporating process noise and observation
  error.
\newblock \textit{Ecological Monographs}, 72(1):57--76.

\bibitem[Knape and de Valpine(2012)]{Knape2012}
Knape, J.\ and de Valpine, P.\ (2012).
\newblock Are patterns of density dependence in the Global Population Dynamics
  Database driven by uncertainty about population abundance?
\newblock \textit{Ecology Letters}, 15(1):17--23.

\bibitem[Meyer and Millar(1999)]{MeyerMillar1999}
Meyer, R.\ and Millar, R.B.\ (1999).
\newblock BUGS in Bayesian stock assessments.
\newblock \textit{Canadian Journal of Fisheries and Aquatic Sciences},
  56(6):1078--1087.

\bibitem[Neal(2003)]{Neal2003}
Neal, R.M.\ (2003).
\newblock Slice sampling.
\newblock \textit{Annals of Statistics}, 31(3):705--767.
\newblock (The ``funnel'' example appears in the discussion of hierarchical
  models.)

\bibitem[Papaspiliopoulos et al.(2007)]{Papa2007}
Papaspiliopoulos, O., Roberts, G.O., and Sk{\"o}ld, M.\ (2007).
\newblock A general framework for the parametrization of hierarchical models.
\newblock \textit{Statistical Science}, 22(1):59--73.

\bibitem[Punt et al.(2003)]{Punt2003}
Punt, A.E., Smith, A.D.M., and Cui, G.\ (2003).
\newblock In pursuit of a robust harvest rule for Pacific sardine
  (\textit{Sardinops sagax}).
\newblock \textit{Fisheries Oceanography}, 12(4--5):379--392.

\bibitem[Schaefer(1954)]{Schaefer1954}
Schaefer, M.B.\ (1954).
\newblock Some aspects of the dynamics of populations important to the
  management of commercial marine fisheries.
\newblock \textit{Bulletin of the Inter-American Tropical Tuna Commission},
  1(2):27--56.

\bibitem[Winker et al.(2018)]{Winker2018}
Winker, H., Carvalho, F., and Kapur, M.\ (2018).
\newblock JABBA: Just Another Bayesian Biomass Assessment.
\newblock \textit{Fisheries Research}, 204:275--288.

\bibitem[Hoffman and Gelman(2014)]{HoffmanGelman2014}
Hoffman, M.D.\ and Gelman, A.\ (2014).
\newblock The No-U-Turn Sampler: Adaptively setting path lengths in
  Hamiltonian Monte Carlo.
\newblock \textit{Journal of Machine Learning Research}, 15:1593--1623.

\end{thebibliography}

\end{document}

```

</details>


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[population_ssm] New lecture: Bayesian state-space models (linear filtering to nonlinear population dynamics) #911

Goal

Origin

Review findings (settled)

Agreed design

Section plan

Tasks

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

[population_ssm] New lecture: Bayesian state-space models (linear filtering to nonlinear population dynamics) #911

Description

Goal

Origin

Review findings (settled)

Agreed design

Section plan

Tasks

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions