(Embedded for self-containment. John may also drag-drop the converted population_ssm_lecture.md onto this issue.)
\documentclass[12pt]{article}
\usepackage{amsmath, amssymb, amsthm}
\usepackage{geometry}
\usepackage{hyperref}
\usepackage{booktabs}
\usepackage{enumitem}
\usepackage{xcolor}
\usepackage{natbib}
\geometry{margin=1.2in}
\hypersetup{
colorlinks=true,
linkcolor=blue!60!black,
citecolor=blue!60!black,
urlcolor=blue!60!black
}
\newtheorem{remark}{Remark}
\newtheorem{example}{Example}
\title{\textbf{Bayesian State Space Models: From Linear Filtering to
Nonlinear Population Dynamics}\\
\large{Notes Toward a QuantEcon Lecture}}
\author{John Stachurski}
\date{\today}
\begin{document}
\maketitle
\tableofcontents
\bigskip
%=============================================================
\section{Introduction and Motivation}
%=============================================================
\subsection{Overview}
This lecture introduces Bayesian estimation of state space models,
progressing from a simple linear Gaussian specification — where exact
analytical results are available — to a nonlinear, non-Gaussian model
of animal population dynamics. The nonlinear model is a logistic
growth state space system: essentially a single-species surplus
production model without harvesting. It is a direct on-ramp to the
Bayesian fisheries stock assessment methods developed in the companion
lecture.
The lecture has three interlocking goals. The first is conceptual:
to establish the distinction between \emph{observation error} and
\emph{process error}, and to show how a state space model separates
the two. The second is analytical: to derive the Kalman filter for
the linear Gaussian case and understand what it computes. The third
is computational: to implement the same model in NumPyro, verify that
NUTS recovers the Kalman solution, and then extend to the nonlinear
case where the Kalman filter is no longer exact.
\subsection{Why animal population dynamics?}
Population ecology provides an ideal setting for introducing state
space models to an economics audience.
\begin{itemize}[itemsep=4pt]
\item \textbf{The latent state is genuinely unobserved.} We never
count every animal in a population. Survey data (aerial
counts, camera traps, mark-recapture experiments) are noisy
proxies for the true population size. The latent state
interpretation is not a modelling convenience — it is the
literal situation.
\item \textbf{The dynamics are economically familiar.} Logistic
growth is the simplest model of a renewable resource with a
carrying capacity. It is the biological analogue of
diminishing returns, and it underpins the entire literature on
optimal harvesting and fisheries management.
\item \textbf{The signal-to-noise problem is vivid.} Wildlife
surveys are expensive and infrequent; observation errors are
large. Students immediately grasp why it matters to
distinguish a population that is genuinely declining from one
that merely \emph{appears} to be declining because of noisy
counts.
\item \textbf{Real data are short and clean.} Annual wolf, elk, or
lynx count series from national parks typically have 30--60
observations, which is ideal for a notebook: NUTS runs fast,
the model is identified, and the posterior is interpretable.
\end{itemize}
\subsection{Position in the lecture sequence}
This lecture sits between the QuantEcon lectures on the Kalman filter
(linear Gaussian theory) and the fisheries surplus production lecture
(nonlinear state space with harvesting and a policy layer). It
serves two bridging functions. First, it shows students that the
Kalman filter is a special case of Bayesian filtering, recoverable
from NUTS as a limiting case. Second, it introduces the logistic
growth process that reappears — with a catch term added — as the
Schaefer surplus production model in the fisheries lecture. The
computational tools (NumPyro model, NUTS, ArviZ diagnostics) are
identical across lectures, so the marginal cost of learning the
fisheries model after this one is low.
%=============================================================
\section{State Space Models: The Basic Framework}
%=============================================================
\subsection{Two sources of uncertainty}
A central theme of this lecture is that time series data conflate two
distinct sources of randomness that have very different implications
for inference.
\textbf{Process error} (also called system noise or ecological
stochasticity in biology) reflects genuine randomness in the
underlying dynamics: random births and deaths, environmental
fluctuations that affect reproduction, demographic accidents in small
populations. Process error means the true state of the system is
uncertain even to the system itself; the uncertainty accumulates over
time and cannot be reduced by better observation.
\textbf{Observation error} (also called measurement noise) reflects
imperfections in how we observe the state: sampling variance in
surveys, detection probability less than one, counting errors.
Observation error is in principle reducible by investing in better
data collection; it does not reflect genuine uncertainty about future
dynamics.
Conflating the two leads to badly calibrated forecasts and policy
advice. A model that attributes observation noise to process
variability will overestimate how unpredictable the system is. A
model that treats genuine process noise as a measurement error will
overstate how precisely we can predict the future. The state space
framework keeps them separate by construction.
\subsection{General state space form}
A state space model has two equations. The \emph{transition
equation} (or process equation) describes how the latent state
$x_t$ evolves:
\begin{equation}
x_t = f(x_{t-1},\, \eta_t), \qquad \eta_t \sim p_\eta,
\label{eq:transition}
\end{equation}
where $f$ is a (possibly nonlinear) function and $\eta_t$ is process
noise. The \emph{observation equation} (or measurement equation)
describes how the observed data $y_t$ relate to the latent state:
\begin{equation}
y_t = g(x_t,\, \epsilon_t), \qquad \epsilon_t \sim p_\epsilon,
\label{eq:observation}
\end{equation}
where $g$ is a (possibly nonlinear) function and $\epsilon_t$ is
observation noise. The latent states $x_{1:T} = (x_1,\ldots,x_T)$
are never directly observed.
\paragraph{What Bayesian inference does.} Given observed data
$y_{1:T}$ and a prior over parameters $\theta$, we want the joint
posterior:
\[
p(\theta,\, x_{1:T} \mid y_{1:T})
\;\propto\;
p(y_{1:T} \mid x_{1:T}, \theta)\;
p(x_{1:T} \mid \theta)\;
p(\theta).
\]
This is a joint distribution over both the parameters and the
entire latent state trajectory. NUTS samples from this joint
posterior directly, treating $x_{1:T}$ as a high-dimensional
latent variable alongside $\theta$.
%=============================================================
\section{The Linear Gaussian Case}
%=============================================================
\subsection{Model specification}
The simplest state space model is the \emph{local level model}:
\begin{align}
x_t &= x_{t-1} + \eta_t, \qquad \eta_t \sim \mathcal{N}(0,\,\sigma_p^2)
\label{eq:ll_transition} \\[4pt]
y_t &= x_t + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0,\,\sigma_o^2).
\label{eq:ll_observation}
\end{align}
The latent state $x_t$ is a random walk; the observed series $y_t$
is a noisy measurement of $x_t$. Parameters are
$\theta = (\sigma_p, \sigma_o, x_0)$ where $x_0$ is the initial
state (or its prior mean).
For the population dynamics application, take $x_t = \log N_t$ where
$N_t$ is population size, and $y_t = \log \hat N_t$ where $\hat N_t$
is the survey count. Then the local level model says:
\begin{itemize}
\item The log-population drifts randomly (geometric random walk);
\item Survey counts are a log-normal perturbation of the true
population.
\end{itemize}
This is the right starting point — it captures the log-normal
observation structure without the complications of density-dependent
dynamics.
\subsection{The signal-to-noise ratio}
Define the \emph{signal-to-noise ratio}
\begin{equation}
q \;=\; \frac{\sigma_p^2}{\sigma_o^2}.
\label{eq:snr}
\end{equation}
This single parameter determines the character of the filtered
estimate.
\begin{itemize}[itemsep=4pt]
\item When $q \to 0$: process noise is negligible relative to
observation noise. The filter heavily smooths the data,
placing most weight on the prior trajectory. The filtered
estimate is a very smooth trend.
\item When $q \to \infty$: observation noise is negligible. The
filter trusts each new observation almost completely. The
filtered estimate tracks the data closely.
\item For intermediate $q$: the filter strikes a balance, giving a
smoothed estimate that neither ignores the data nor tracks it
slavishly.
\end{itemize}
In population ecology, $q$ is typically small: surveys are noisy and
genuine year-to-year fluctuations in abundance are moderate. One of
the most important lessons of the lecture is that $q$ is often
\emph{weakly identified}: data can be consistent with a wide range of
$(\sigma_p, \sigma_o)$ pairs having the same total variance
$\sigma_p^2 + \sigma_o^2$. The posterior of $q$ will often be wide,
and the posterior of the latent trajectory will respond
correspondingly to the assumed $q$.
\subsection{The Kalman filter}
For the linear Gaussian model \eqref{eq:ll_transition}--\eqref{eq:ll_observation},
the posterior of $x_t$ given $y_{1:t}$ is Gaussian at every $t$.
The Kalman filter propagates the mean $m_t = \mathbb{E}[x_t \mid
y_{1:t}]$ and variance $P_t = \mathrm{Var}(x_t \mid y_{1:t})$
recursively.
\paragraph{Predict step.}
\begin{align}
m_{t|t-1} &= m_{t-1} \label{eq:kf_predict_mean} \\
P_{t|t-1} &= P_{t-1} + \sigma_p^2. \label{eq:kf_predict_var}
\end{align}
\paragraph{Update step.}
\begin{align}
K_t &= \frac{P_{t|t-1}}{P_{t|t-1} + \sigma_o^2}
\label{eq:kf_gain} \\[4pt]
m_t &= m_{t|t-1} + K_t (y_t - m_{t|t-1})
\label{eq:kf_update_mean} \\[4pt]
P_t &= (1 - K_t)\, P_{t|t-1}.
\label{eq:kf_update_var}
\end{align}
The scalar $K_t \in (0,1)$ is the \emph{Kalman gain}. It equals
$\sigma_p^2 / (\sigma_p^2 + \sigma_o^2)$ at steady state, which is
exactly the signal-to-noise ratio rescaled to $(0,1)$. When $q$ is
large, $K_t \approx 1$ and the filter trusts the new observation.
When $q$ is small, $K_t \approx 0$ and the filter discounts it.
\paragraph{Marginal likelihood.}
The Kalman filter also produces the \emph{innovation}
$\nu_t = y_t - m_{t|t-1}$ and its variance $S_t = P_{t|t-1} +
\sigma_o^2$. The log marginal likelihood (integrating out $x_{1:T}$)
is:
\begin{equation}
\log p(y_{1:T} \mid \theta)
\;=\;
-\frac{1}{2}\sum_{t=1}^{T}
\left[\log(2\pi S_t) + \frac{\nu_t^2}{S_t}\right].
\label{eq:kf_loglik}
\end{equation}
This is differentiable in $\theta = (\sigma_p, \sigma_o)$, so maximum
likelihood estimation is straightforward.
\subsection{Recovering the Kalman solution with NUTS}
Before introducing the nonlinear model, it is important to verify
that NUTS gives the same answer as the Kalman filter on the linear
model. This serves two purposes: it validates the NumPyro
implementation, and it builds student trust in the sampler before
we move to settings where analytical benchmarks are unavailable.
The NumPyro model for the local level specification treats $x_{1:T}$
as explicit latent variables:
\begin{verbatim}
import numpyro
import numpyro.distributions as dist
import jax.numpy as jnp
def local_level(y):
T = len(y)
sigma_p = numpyro.sample("sigma_p", dist.HalfNormal(1.0))
sigma_o = numpyro.sample("sigma_o", dist.HalfNormal(1.0))
x0 = numpyro.sample("x0", dist.Normal(y[0], 2.0))
x = [x0]
for t in range(1, T):
x_t = numpyro.sample(
f"x_{t}",
dist.Normal(x[t-1], sigma_p)
)
x.append(x_t)
x_arr = jnp.stack(x)
with numpyro.plate("obs", T):
numpyro.sample("y", dist.Normal(x_arr, sigma_o), obs=y)
\end{verbatim}
\begin{remark}
The loop-based formulation above is pedagogically clear but
computationally inefficient for long series because JAX traces
through each iteration. For production code, use
\texttt{numpyro.contrib.control\_flow.scan} to vectorise the
recursion. For a short dataset (say $T = 50$) the loop is fast
enough for a lecture notebook.
\end{remark}
The posterior marginal for $x_t$ from NUTS should match the Kalman
smoother distribution (not just the filter) since NUTS conditions on
the full data $y_{1:T}$. Plot both on the same axis as a
verification exercise.
%=============================================================
\section{The Nonlinear Model: Logistic Population Growth}
%=============================================================
\subsection{Model specification}
The local level model assumes the population follows a geometric
random walk, which allows unbounded growth and ignores density
dependence. A more realistic description introduces a carrying
capacity $K$: as the population approaches $K$, growth slows and
eventually reverses. The standard continuous-time model is the
logistic equation; its discrete-time analogue in log-space is:
\begin{align}
\log N_t &= \log N_{t-1}
+ r\!\left(1 - \frac{N_{t-1}}{K}\right)
+ \eta_t,
\qquad \eta_t \sim \mathcal{N}(0,\,\sigma_p^2)
\label{eq:pop_transition} \\[6pt]
\log y_t &= \log N_t + \epsilon_t,
\qquad \epsilon_t \sim \mathcal{N}(0,\,\sigma_o^2).
\label{eq:pop_observation}
\end{align}
Here $N_t > 0$ is the true (latent) population at time $t$; $y_t > 0$
is the observed survey count; $r > 0$ is the intrinsic growth rate;
$K > 0$ is the carrying capacity; $\sigma_p$ and $\sigma_o$ are the
process and observation noise standard deviations.
\paragraph{Interpretation of equation~\eqref{eq:pop_transition}.}
When $N_{t-1} \ll K$, the term $(1 - N_{t-1}/K) \approx 1$ and the
population grows at approximately rate $r$ per period. When
$N_{t-1} = K$, growth is exactly zero on average. When $N_{t-1} >
K$, the growth rate is negative, pulling the population back down.
The process noise $\eta_t$ captures environmental stochasticity —
random variation in birth rates, survival, and habitat quality.
\paragraph{Interpretation of equation~\eqref{eq:pop_observation}.}
Survey counts $y_t$ are log-normally distributed around the true
population $N_t$. Equivalently, $y_t = N_t \cdot e^{\epsilon_t}$,
so the observation error is multiplicative — a common model for count
data where variance scales with mean. The standard deviation
$\sigma_o$ on the log scale controls the coefficient of variation of
the survey counts.
\subsection{Why the Kalman filter fails here}
The Kalman filter requires both the transition and observation
equations to be \emph{linear} in the state.
Equation~\eqref{eq:pop_transition} is nonlinear: the term $r
N_{t-1}/K$ makes $\log N_t$ depend nonlinearly on $N_{t-1} =
\exp(\log N_{t-1})$. Specifically, if we let $x_t = \log N_t$, the
transition becomes
\[
x_t \;=\; x_{t-1} + r\!\left(1 - \frac{e^{x_{t-1}}}{K}\right) + \eta_t,
\]
which is not affine in $x_{t-1}$. The Extended Kalman Filter (EKF)
linearises this around the current estimate, but the approximation
can be poor when the population is far from the carrying capacity or
when $\sigma_p$ is large. The Unscented Kalman Filter (UKF) does
better but still imposes Gaussian approximations. NUTS treats the
nonlinearity exactly, making it the right tool once we leave the
linear Gaussian world.
\subsection{Reparameterisation}
It is convenient to work with log-transformed parameters and
states throughout. Define:
\[
x_t = \log N_t, \qquad
\kappa = \log K.
\]
The model becomes:
\begin{align}
x_t &= x_{t-1}
+ r\!\left(1 - e^{x_{t-1} - \kappa}\right)
+ \eta_t
\label{eq:logpop_transition} \\[6pt]
\log y_t &= x_t + \epsilon_t.
\label{eq:logpop_observation}
\end{align}
This formulation has several computational advantages: $x_t$ ranges
over all of $\mathbb{R}$, the parameters $r$ and $e^\kappa = K$ are
unconstrained after appropriate transformation, and the observation
equation is exactly linear in $x_t$. NUTS operates on the
unconstrained space, and NumPyro's \texttt{dist.TransformedDistribution}
handles the Jacobian automatically when sampling log-scale parameters.
\subsection{Parameters and their roles}
\begin{center}
\begin{tabular}{llll}
\toprule
Parameter & Symbol & Units & Typical range \\
\midrule
Intrinsic growth rate & $r$ & per period & $(0.05,\, 0.5)$ \\
Carrying capacity & $K$ & individuals & application-specific \\
Process noise & $\sigma_p$ & log-scale & $(0.02,\, 0.3)$ \\
Observation noise & $\sigma_o$ & log-scale & $(0.05,\, 0.5)$ \\
Initial log-population & $x_0$ & log-scale & $\approx \log K$ \\
\bottomrule
\end{tabular}
\end{center}
The signal-to-noise ratio $q = \sigma_p^2 / \sigma_o^2$ plays the
same central role as in the local level model, now compounded by the
additional identification challenge between $r$ and $K$: only the
equilibrium $K$ and the speed of adjustment $r$ together determine
the long-run dynamics, and the two interact in a way that makes their
joint posterior correlated.
%=============================================================
\section{Bayesian Formulation}
%=============================================================
\subsection{Likelihood}
Conditional on parameters $\theta = (r, K, \sigma_p, \sigma_o, x_0)$
and latent states $x_{1:T}$, the observations $\log y_t$ are
independent:
\[
\log y_t \mid x_t,\,\theta
\;\sim\;
\mathcal{N}(x_t,\; \sigma_o^2).
\]
The latent states evolve according to:
\[
x_t \mid x_{t-1},\,\theta
\;\sim\;
\mathcal{N}\!\left(
x_{t-1} + r\!\left(1 - e^{x_{t-1}-\kappa}\right),\;
\sigma_p^2
\right).
\]
The joint log-likelihood of the latent path and observations is:
\begin{equation}
\log p(y_{1:T}, x_{1:T} \mid \theta)
\;=\;
-\frac{T}{2}\log(2\pi\sigma_o^2)
-\frac{1}{2\sigma_o^2}\sum_{t=1}^T (\log y_t - x_t)^2
-\frac{T}{2}\log(2\pi\sigma_p^2)
-\frac{1}{2\sigma_p^2}\sum_{t=1}^T \bigl[x_t - x_{t-1}
- r(1-e^{x_{t-1}-\kappa})\bigr]^2.
\end{equation}
NUTS samples from $p(\theta, x_{1:T} \mid y_{1:T}) \propto
p(y_{1:T}, x_{1:T} \mid \theta) \cdot p(\theta)$ jointly.
\subsection{Prior distributions}
\begin{align}
r &\;\sim\; \mathrm{HalfNormal}(0.3)
\tag{P1}\label{prior:r} \\[4pt]
K &\;\sim\; \mathrm{LogNormal}(\mu_K,\, 0.5^2)
\tag{P2}\label{prior:K} \\[4pt]
\sigma_p &\;\sim\; \mathrm{HalfNormal}(0.2)
\tag{P3}\label{prior:sigmap} \\[4pt]
\sigma_o &\;\sim\; \mathrm{HalfNormal}(0.3)
\tag{P4}\label{prior:sigmao} \\[4pt]
x_0 &\;\sim\; \mathcal{N}(\log \hat{N}_1,\; 1.0^2)
\tag{P5}\label{prior:x0}
\end{align}
\paragraph{Notes on prior choices.}
\begin{itemize}[itemsep=4pt]
\item \textbf{Growth rate \eqref{prior:r}.} A $\mathrm{HalfNormal}(0.3)$
prior places most mass on $r \in (0, 0.6)$. For large
mammals (wolves, elk), typical intrinsic growth rates are
$r \in (0.1, 0.4)$ per year. For fish stocks $r$ can be
larger. The prior should be adjusted to the application;
life-history databases such as FishBase or the RAM Legacy
Database provide useful reference points.
\item \textbf{Carrying capacity \eqref{prior:K}.} The
log-Normal prior is natural for a positive quantity. Set
$\mu_K$ to the log of a rough prior estimate of maximum
sustainable population, derived from survey history or
habitat assessments. The scale $0.5$ on the log scale
corresponds to a factor-of-1.6 uncertainty, which is
deliberately wide.
\item \textbf{Process and observation noise
\eqref{prior:sigmap}--\eqref{prior:sigmao}.}
Both are on the log scale. A $\mathrm{HalfNormal}(0.2)$ prior
on $\sigma_p$ assigns most mass to year-to-year log-scale
fluctuations below about $40\%$, which is appropriate for
large mammals. The key prior sensitivity exercise is to vary
the ratio of these scales and show how the filtered trajectory
responds.
\item \textbf{Initial state \eqref{prior:x0}.} Centred on the
log of the first observation, with a standard deviation of
1.0 on the log scale. This is intentionally vague — the
first survey count may be far from the true initial
population.
\end{itemize}
\subsection{NumPyro implementation}
\begin{verbatim}
import numpyro
import numpyro.distributions as dist
import jax.numpy as jnp
from jax import lax
def logistic_ssm(log_y):
T = len(log_y)
# Priors (all unconstrained or half-constrained)
r = numpyro.sample("r", dist.HalfNormal(0.3))
log_K = numpyro.sample("log_K", dist.Normal(jnp.mean(log_y), 1.0))
sigma_p = numpyro.sample("sigma_p", dist.HalfNormal(0.2))
sigma_o = numpyro.sample("sigma_o", dist.HalfNormal(0.3))
x0 = numpyro.sample("x0", dist.Normal(log_y[0], 1.0))
# Latent state trajectory via scan
def transition(x_prev, t):
drift = r * (1.0 - jnp.exp(x_prev - log_K))
x_t = numpyro.sample(
f"x_{t}",
dist.Normal(x_prev + drift, sigma_p)
)
return x_t, x_t
_, x_vals = lax.scan(
lambda carry, t: transition(carry, t),
x0,
jnp.arange(1, T)
)
x_all = jnp.concatenate([x0[None], x_vals])
# Observation model
with numpyro.plate("obs", T):
numpyro.sample("y", dist.Normal(x_all, sigma_o), obs=log_y)
\end{verbatim}
\begin{remark}
The use of \texttt{lax.scan} replaces the Python loop from the
local level implementation. This is important for performance:
\texttt{lax.scan} compiles the recursion into a single XLA
operation, making it fast even for $T = 200$. The tradeoff is
that NumPyro's named-sample approach inside \texttt{scan} requires
care — consult the NumPyro documentation on scan and
\texttt{numpyro.contrib.control\_flow} for production-quality
implementations.
\end{remark}
%=============================================================
\section{Data}
%=============================================================
\subsection{Yellowstone wolves}
The canonical dataset for this lecture is the \textbf{Yellowstone
wolf population}, monitored continuously since the reintroduction of
17 wolves in 1995. Annual population estimates are published by the
Yellowstone Wolf Project (National Park Service) and are freely
available. The series runs from 1995 to the present, giving
approximately 30 observations, and shows textbook logistic growth:
rapid early expansion from 17 to a peak of 174 in 2003, followed by
stabilisation and modest decline.
\begin{itemize}
\item \textbf{Source:}
\url{https://www.nps.gov/yell/learn/nature/wolves.htm}
\item \textbf{Format:} Annual total population count. Copy the
table into a CSV; it is short enough to enter by hand.
\item \textbf{Notes:} The counts are intensive survey estimates,
not raw observations. The observation error $\sigma_o$ therefore
reflects survey methodology rather than raw counting variance.
\end{itemize}
This dataset is ideal because: (a) the logistic growth story is
visually obvious in the data; (b) the carrying capacity $K$ is
meaningfully estimable — the population clearly levelled off; (c) the
series is short enough that NUTS runs in seconds; and (d) it is a
genuinely interesting ecological and policy story.
\subsection{Alternative datasets}
\begin{center}
\begin{tabular}{lllp{5.5cm}}
\toprule
Population & Source & $T$ & Notes \\
\midrule
Yellowstone elk & NPS Yellowstone & $\sim$40 & Predator--prey context; longer series \\
Canadian lynx & MacKenzie (1820--1934) & 114 & Classic ecological time series; shows cycles \\
Whooping crane & USFWS & $\sim$80 & Critically endangered; strong conservation interest \\
Harbour seal (UK) & SCOS reports & $\sim$40 & Direct fisheries relevance \\
Pacific salmon run & NOAA / PSA & $\sim$60 & Direct link to fisheries lecture \\
\bottomrule
\end{tabular}
\end{center}
The \textbf{Canadian lynx} series is particularly interesting for
a lecture on nonlinear dynamics because the population shows
predator--prey oscillations that a simple logistic model cannot
capture. This makes it a useful example of \emph{model
misspecification} — the posterior predictive checks will fail
visibly, motivating extensions.
\subsection{Simulated data as a pedagogical tool}
Before fitting to real data, it is valuable to fit the model to
data simulated from known parameters. This \emph{simulation study}
accomplishes three things:
\begin{enumerate}
\item It verifies that the model is identifiable: do the posterior
medians recover the true parameters?
\item It reveals which parameters are well-identified and which are
weakly identified (wide posteriors even with infinite data
would signal structural non-identification).
\item It gives students intuition for what the model produces
before they see real data.
\end{enumerate}
A typical simulation exercise: fix $(r, K, \sigma_p, \sigma_o) =
(0.2, 150, 0.1, 0.2)$ and $N_0 = 20$. Simulate $T = 40$ years of
data. Estimate the model. Report the $90\%$ credible intervals for
each parameter and check whether the true value is contained.
%=============================================================
\section{Interesting Questions and Exercises}
%=============================================================
\subsection{Substantive questions}
\begin{enumerate}[itemsep=6pt]
\item \textbf{What is the carrying capacity?} The posterior of $K$
gives a full probability distribution over the maximum
sustainable population. For Yellowstone wolves, this has
direct management implications: how many wolves can
Yellowstone sustain? Compare the posterior to published
habitat-based estimates.
\item \textbf{What is the intrinsic growth rate?} The posterior
of $r$ answers this. For Yellowstone wolves, published
estimates suggest $r \approx 0.25$ per year in the early
reintroduction period. Does the posterior recover this?
\item \textbf{What is the true population trajectory?} The
posterior of $x_{1:T}$ (equivalently, $N_{1:T}$) gives a
probability distribution over the latent population path.
Plot the posterior median and $90\%$ credible band alongside
the observed counts. Years with few survey visits will have
wide bands; years with intensive surveying will have narrow
bands.
\item \textbf{Is the population currently below or above carrying
capacity?} Compute $P(N_T < K \mid y_{1:T})$ from the
posterior. This is the kind of probabilistic statement
that managers need and that a point estimate cannot provide.
\item \textbf{How much of the variance is observation error?} The
ratio $\sigma_o^2 / (\sigma_p^2 + \sigma_o^2)$ is the
fraction of total variance attributable to measurement.
Plot its posterior distribution. A value near 1 means most
of what we see in the data is noise; a value near 0 means the
observed fluctuations are genuine.
\end{enumerate}
\subsection{Teaching Bayesian foundations}
\begin{enumerate}[itemsep=6pt]
\item \textbf{The signal-to-noise identification problem.}
This is the central pedagogical point of the lecture. Fix
the total variance $\sigma_p^2 + \sigma_o^2$ and vary $q =
\sigma_p^2/\sigma_o^2$. Show how the inferred latent
trajectory changes: high $q$ gives a wiggly trajectory that
tracks observations closely; low $q$ gives a smooth trajectory
that largely ignores observation-level fluctuations. Then
show that the posterior of $q$ is wide — the data often
cannot sharply determine how much of the variance is process
versus observation. This is not a failure of the model; it
is an honest representation of what the data can and cannot
tell us.
\item \textbf{Prior predictive simulation.} Before looking at
data, sample $(r, K, \sigma_p, \sigma_o)$ from the prior and
simulate population trajectories. Are the prior trajectories
ecologically plausible? Do they include populations that
go extinct in the first year, or grow to a billion? Prior
predictive simulation is the right way to diagnose
unreasonable priors before fitting.
\item \textbf{Marginalising versus conditioning on latent states.}
Advanced students can compare two approaches: (a) NUTS
sampling the joint posterior over $(\theta, x_{1:T})$, and
(b) using a particle Marginal Metropolis--Hastings (PMMH)
algorithm that marginalises over $x_{1:T}$ using a particle
filter, sampling only over $\theta$. Approach (b) is more
efficient in high dimensions but more complex to implement.
For $T \leq 50$ and the model above, approach (a) is
perfectly adequate.
\item \textbf{Identifiability and the $r$--$K$ trade-off.}
The equilibrium population is $N^* \approx K$ (ignoring
stochasticity). The speed of approach to equilibrium depends
on $r$. But if $T$ is short and the population never gets
close to $K$, then $K$ and $r$ are only weakly jointly
identified: a fast growth rate combined with a high carrying
capacity can look similar to a slow growth rate with a
moderate carrying capacity over a short time horizon. The
joint posterior of $(r, K)$ will show this as a correlation
ridge, analogous to the $(\beta, \lambda)$ ridge in the
unemployment lecture.
\item \textbf{Non-Gaussian extensions.} Replace the Gaussian
observation noise with a Student-$t$ distribution:
\[
\log y_t \mid x_t \;\sim\; t_\nu(x_t,\, \sigma_o^2).
\]
This down-weights extreme observations (missed survey years,
counting errors) and is a common robustification. NumPyro's
\texttt{dist.StudentT} makes this a one-line change. Compare
the inferred trajectory under Gaussian and Student-$t$
observation models for a dataset with one clear outlier.
\end{enumerate}
%=============================================================
\section{Diagnostics}
%=============================================================
\subsection{Standard MCMC diagnostics}
The same checklist from the unemployment lecture applies here.
\begin{enumerate}[itemsep=4pt]
\item \textbf{Trace plots and $\hat R$.} For a $T = 40$ dataset
the parameter vector is low-dimensional; convergence should
be fast. The latent states $x_{1:T}$ are also part of the
posterior and their $\hat R$ values should be checked.
\item \textbf{Effective sample size.} ESS can be low for
$\sigma_p$ and $\sigma_o$ individually (due to the
identification problem) even when their ratio is well
identified.
\item \textbf{Divergences.} NUTS reports the number of divergent
transitions. Divergences in this model often arise near
$\sigma_p \approx 0$, where the geometry of the posterior
becomes degenerate (a ``funnel'' in the $(\sigma_p, x_{1:T})$
space). The solution is a \emph{non-centred
parameterisation}.
\end{enumerate}
\subsection{The funnel problem and non-centred parameterisation}
Hierarchical models — and state space models in particular — suffer
from a notorious sampling problem known as Neal's funnel
\citep{Neal2003}. When $\sigma_p$ is small, the latent states
$x_{1:T}$ are tightly constrained around a smooth path; the posterior
geometry in the $(\sigma_p, x_{1:T})$ space looks like a funnel, with
a wide mouth at large $\sigma_p$ and a narrow neck near $\sigma_p =
0$. NUTS struggles to explore both regions simultaneously.
The standard remedy is to \emph{non-centre} the latent states.
Instead of sampling $x_t$ directly, sample standardised innovations
$\tilde\eta_t \sim \mathcal{N}(0,1)$ and reconstruct:
\[
x_t \;=\; x_{t-1} + r(1 - e^{x_{t-1}-\kappa}) + \sigma_p \tilde\eta_t.
\]
Now $\sigma_p$ and $\tilde\eta_{1:T}$ are a priori independent, and
the funnel geometry disappears. The non-centred parameterisation is
the correct default for state space models in NumPyro.
\begin{verbatim}
# Non-centred version
eta = numpyro.sample("eta", dist.Normal(jnp.zeros(T-1),
jnp.ones(T-1)))
# Then reconstruct x_t = f(x_{t-1}) + sigma_p * eta[t]
\end{verbatim}
This is an important lesson: model parameterisation affects sampler
performance, and the ``natural'' parameterisation is not always the
best one for MCMC.
\subsection{Posterior predictive checks}
Generate replicated datasets $\tilde y_{1:T}^{(s)}$ for $s = 1,
\ldots, S$ posterior draws. Compare the following statistics between
observed and replicated data:
\begin{itemize}[itemsep=4pt]
\item Maximum population count (is the observed peak within the
posterior predictive range?)
\item Time to first reach $K/2$ (does the model replicate the
speed of early growth?)
\item Lag-1 autocorrelation of log counts (does the model capture
temporal dependence?)
\item Fraction of years with population decline (captures
asymmetry in dynamics)
\end{itemize}
For the Canadian lynx series, the posterior predictive distribution
from the logistic model will \emph{not} replicate the strong
oscillations in the data. This is a clean illustration of the
purpose of predictive checks: the model passes naive goodness-of-fit
tests (it fits the data reasonably well) but fails the predictive
check that matters for the underlying dynamics.
%=============================================================
\section{Connection to the Fisheries Surplus Production Model}
%=============================================================
The logistic growth state space model \eqref{eq:pop_transition}--\eqref{eq:pop_observation}
becomes a Schaefer surplus production model for fisheries simply by
adding a catch term to the transition equation:
\begin{equation}
\log N_t \;=\; \log N_{t-1}
+ r\!\left(1 - \frac{N_{t-1}}{K}\right)
- \frac{C_{t-1}}{N_{t-1}}
+ \eta_t,
\label{eq:spm}
\end{equation}
where $C_{t-1}$ is the observed catch (in the same units as $N_t$).
The observation equation remains the same, now interpreted as: CPUE
(catch per unit effort) is a noisy index of true biomass.
Every concept developed in this lecture transfers directly:
\begin{itemize}[itemsep=4pt]
\item The latent state $N_t$ is fish biomass, never directly observed
\item The signal-to-noise problem is more acute (surveys are sparse
and expensive)
\item The $r$--$K$ identification problem becomes the $r$--$K$--$q$
problem, where $q$ is catchability (the proportionality constant
between CPUE and biomass)
\item Neal's funnel reappears and non-centred parameterisation is
essential
\item Posterior predictive checks now have direct management
implications: does the model predict the stock will recover
under a proposed catch limit?
\end{itemize}
The fisheries lecture adds the policy layer: given the posterior over
$(r, K, N_T)$, what harvest rule maximises expected sustainable yield
while keeping the probability of stock collapse below a target level?
This connects to the CVaR-based fisheries management research
discussed elsewhere in the QuantEcon sequence.
%=============================================================
\section{Suggested Lecture Structure}
%=============================================================
\begin{enumerate}[itemsep=6pt]
\item \textbf{Motivation} (10 min). Show a wolf population plot.
Ask: what fraction of the variance is real dynamics versus
counting error? Explain why this matters for policy.
Introduce the observation error vs process error distinction.
\item \textbf{Linear Gaussian model and Kalman filter} (20 min).
Write the local level model. Derive the Kalman gain
intuitively. Show how the filtered estimate depends on
$q = \sigma_p^2/\sigma_o^2$. Plot filtered trajectories for
high-$q$ and low-$q$ cases.
\item \textbf{NUTS on the linear model} (15 min). Run NumPyro on
the local level model. Show that posterior means of $x_{1:T}$
match the Kalman smoother. This validates the sampler.
\item \textbf{Nonlinear extension} (20 min). Introduce the
logistic growth transition. Explain why Kalman fails.
Show that the NumPyro code changes by two lines (add $r$ and
$K$, modify the drift). Run NUTS.
\item \textbf{Posterior analysis} (20 min). Marginal posteriors
of $(r, K, \sigma_p, \sigma_o)$. Latent trajectory with
credible bands. $r$--$K$ pair plot showing correlation ridge.
Posterior predictive checks.
\item \textbf{The funnel and non-centred parameterisation} (10 min).
Show divergences in the centred model. Switch to
non-centred. Show they disappear. Explain why this matters.
\item \textbf{Outlook: fisheries} (5 min). Add a catch term to
the transition equation. Point to the companion lecture.
\end{enumerate}
%=============================================================
\bibliographystyle{plainnat}
\begin{thebibliography}{99}
\bibitem[Dennis et al.(2006)]{Dennis2006}
Dennis, B., Ponciano, J.M., Lele, S.R., Taper, M.L., and Staples, D.F.\
(2006).
\newblock Estimating density dependence, process noise, and observation error.
\newblock \textit{Ecological Monographs}, 76(3):323--341.
\bibitem[de Valpine and Hastings(2002)]{deValpine2002}
de Valpine, P.\ and Hastings, A.\ (2002).
\newblock Fitting population models incorporating process noise and observation
error.
\newblock \textit{Ecological Monographs}, 72(1):57--76.
\bibitem[Knape and de Valpine(2012)]{Knape2012}
Knape, J.\ and de Valpine, P.\ (2012).
\newblock Are patterns of density dependence in the Global Population Dynamics
Database driven by uncertainty about population abundance?
\newblock \textit{Ecology Letters}, 15(1):17--23.
\bibitem[Meyer and Millar(1999)]{MeyerMillar1999}
Meyer, R.\ and Millar, R.B.\ (1999).
\newblock BUGS in Bayesian stock assessments.
\newblock \textit{Canadian Journal of Fisheries and Aquatic Sciences},
56(6):1078--1087.
\bibitem[Neal(2003)]{Neal2003}
Neal, R.M.\ (2003).
\newblock Slice sampling.
\newblock \textit{Annals of Statistics}, 31(3):705--767.
\newblock (The ``funnel'' example appears in the discussion of hierarchical
models.)
\bibitem[Papaspiliopoulos et al.(2007)]{Papa2007}
Papaspiliopoulos, O., Roberts, G.O., and Sk{\"o}ld, M.\ (2007).
\newblock A general framework for the parametrization of hierarchical models.
\newblock \textit{Statistical Science}, 22(1):59--73.
\bibitem[Punt et al.(2003)]{Punt2003}
Punt, A.E., Smith, A.D.M., and Cui, G.\ (2003).
\newblock In pursuit of a robust harvest rule for Pacific sardine
(\textit{Sardinops sagax}).
\newblock \textit{Fisheries Oceanography}, 12(4--5):379--392.
\bibitem[Schaefer(1954)]{Schaefer1954}
Schaefer, M.B.\ (1954).
\newblock Some aspects of the dynamics of populations important to the
management of commercial marine fisheries.
\newblock \textit{Bulletin of the Inter-American Tropical Tuna Commission},
1(2):27--56.
\bibitem[Winker et al.(2018)]{Winker2018}
Winker, H., Carvalho, F., and Kapur, M.\ (2018).
\newblock JABBA: Just Another Bayesian Biomass Assessment.
\newblock \textit{Fisheries Research}, 204:275--288.
\bibitem[Hoffman and Gelman(2014)]{HoffmanGelman2014}
Hoffman, M.D.\ and Gelman, A.\ (2014).
\newblock The No-U-Turn Sampler: Adaptively setting path lengths in
Hamiltonian Monte Carlo.
\newblock \textit{Journal of Machine Learning Research}, 15:1593--1623.
\end{thebibliography}
\end{document}
Goal
Add a new lecture,
population_ssm.md— Bayesian State-Space Models: From Linear Filtering to Nonlinear Population Dynamics — teaching Bayesian estimation of state-space models with NumPyro/NUTS, progressing from a linear-Gaussian null (exact Kalman benchmark) to a nonlinear logistic (Ricker) population model.It builds on the existing
kalman.md(linear Gaussian theory) and the NumPyro lectures (ar1_bayes.md,bayes_nonconj.md), and shares machinery with the plannedunemployment_bayes.mdlecture (#910). Internally it is also a stepping stone toward latent-state fisheries/stock-assessment models, but that stays an implicit closing note — not an explicit theme of the finished lecture.Origin
Drafted from
population_ssm_lecture.tex(faithful MyST conversion attached aspopulation_ssm_lecture.md, with corrections flagged inline). The draft is strong; the model is stable and standard and the pedagogical arc is sound. A few technical errors and scope trims are settled below.Review findings (settled)
Verified by simulation where relevant.
lax.scanNumPyro code is broken.numpyro.sample(f"x_{t}", ...)insidejax.lax.scandoes not work (effect handlers don't see sample sites in a rawlax.scan; the name can't be an f-string of a traced index). Fix = non-centred parameterisation from the start (sample innovations in aplate, build the path with a deterministiclax.scan), which also pre-empts the funnel.HalfNormal(0.3)prior (confirmed; prior-predictive: 100% of 3000 draws plausible).Agreed design
kalman.mdand use it purely as the exact benchmark NUTS must match. This section earns its place as the only case with an exact analytical answer to validate the ~40-dim latent-trajectory sampler against.unemployment_bayes; simulation-study-before-real-data; funnel/non-centred; Student-$t$ robustness; lynx as the deliberate PPC-failure example.Section plan
Tasks
lectures/population_ssm.mdin MyST (GPU admonition include; numpyro/jax/arviz imports), with corrected gain formula and Ricker naminglectures/_toc.ymlnear the Bayesian/Kalman lectureskalman.md,ar1_bayes.md,bayes_nonconj.md,unemployment_bayes.mdOriginal draft source —
population_ssm_lecture.tex(Embedded for self-containment. John may also drag-drop the converted
population_ssm_lecture.mdonto this issue.)