matmet.tex



% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter[Materials and methods]{Materials and methods} %(Computational epidemiomology or Infectious Disease Modeling

In this section, we present freely four topics in methods.

%\section{Infectious Diseases Epidemics} %as a phenomena 
Infect

\section{Modeling of Epidemics}
Modeling has several purposes:
The ultimate goal being  the forecasting and estimating the propagation of uncertainties from the observable past to the future. control

Alongside, modeling helps us to understand the different processes of the cholera life cycle. 

As a substitute for physical experiments, different mechanistic pathways can be compared by benchmarking them against the reproduction of reported cases. 

Models may have only one spatial dimension or may consider the spatial spread of the epidemic across several nodes. Spatially-explicit mathematical models provide key insights into the course of an ongoing epidemic, where transmission is heterogeneous in space and time.

% (Bayesian philosophy, data, representation of reality) sochastic/determistic Modeling of epidemics (uncertainties, social factors, heterogeneities)


In the next part of this chapter, we present so

%data on health
Model makes use of epidemiological data: reported cases, severity of outcomes, serology survey. 

Covariates or input data range from rainfall and temperature, human mobility.

Information and prior on

The cornstone of epidemiological modeling is the compartmental model. Individuals belongs to compartiments who caraterize their stage with respect to the disease. The seminal work has introduced compartiments for Susceptible, Infected, Recovered. The SIR model, first introduced first by Kermack and McKendrick\cite{Kermack:ContributionMathematicalTheory:1927}. 

Or

An Exposed compartiment may be added 
Compartimental + spatially explicit


Written

\section{Previous cholera model}

(example of SIR model)


\subsection{Simulating epidemic model}


\paragraph{Transitions}


\paragraph{Residence time}

\cite{Hurtado:GeneralizationsLinearChain:2019}

\marginnote{It’s also possible to approximate a non-Erlang delay distribution with a mixture of Erlang distribution \parencites[Appendix 4]{Hurtado:GeneralizationsLinearChain:2019}.}

\section{Inference and calibration} 
Model parameters in the SIRB cholera models represent physical rates. However, in the literature there is no consensus on their values or on their  probability distribution. Numerical calibration is therefore required to assess their value in any particular application. Parameter identifiability is one of the major problems during the calibration of SIR-like epidemiological models. This is mainly due to the impossibility to directly measure the exact figures of individual infections in each compartment, the uncertainty associated to epidemiological data and the net effect of possible model over-parametrization. Calibration thus often results in combination of parameters that produce almost identical fits, termed equifinality\footnote{For a cholera example, Eisenberg et al. (\fullcite{eisenberg_identifiability_2013}) highlighted that the presence of uncertain measurements might generate strongly correlated parameters, which cannot be unambiguously identified.}.

\paragraph{Iterated Filtering}


\paragraph{Markov-Chain Monte Carlo}


 Filtering
\subsection{Evaluation}
We may evaluate thte model predictive accuracy
\section{Model Selection}
% DONE w.r.t gelman
% TODO w.r.t McElreath
For inference or prediction, several candidates models may be devised for same data and process. Model selection aims at finding the \textit{best} (set of) model(s) for the task at hand. In the perspective of identifying the data-generating process a consistent model selection criteria is warranted\parencite{Hoge:PrimerModelSelection:2018}.  On way of doing that is to compare their predictive accuracy. Sometime it can be computed in way that is tailored to the application and embed the benefits and cost of predictiding data with the model \footnote{i.e for weather, predicting rain when there’s none is considered lower cost than predicting sun when it rains}, or we may resort to generic rules.
The log-likelyhood (or log predictive density), $\Pr(data \mid \boldsymbol\theta$  is often used. Note by using the likelyhood instead of the posterior, we don’t depends on the priors but just on the data model (but as the prior influence $\boldsymbol\theta$, it does affects the calculation of the predictive density). The log predictive accuracy as a scoring function i related to information theory, and to the Kull-back Leibler information.
Unfortunatly, the hypothetical gold standard that is the expected log predictive density for new datapoints is out-of-reach in most of our problems. However we can compute an optimistic\footnote{optimistic because we have used the same as we used to calibrate our model. Generaly, out-of-sample predictions (i.e for a new datapoint) will be less accurate that what this within-sample accuraty}estimation, computing the log-poinwise predictive density of our model to the fitted data. For $s$ posterior samples and our $n$ datapoints, we have
\begin{equation}
	EQUN 7.4~\label{eq:llik}
\end{equation}
To approxmimate the true out-of sample predictive accurcy we can 
\begin{itemize}
	\item Fit the model to a part (a fold) of the available data and compute it’s predictive accucary on the other part. This is called \emph{cross-validation}, and a common version of it is leave-one-out cross validation, where, in turn we leave out just one datapoint for calibration and compute the predictive accuacy on it. This is computationally expensive, as we need to perform $n$ calibrations.
	\item Hence many methods where designed to unbias the within-sample accuracy and approximate the results of cross-validation, usually by substracting  a correction made of some functions for the effective parameters being fit. We find here an alphabet of information criterions (ICs):  Akaike (AIC), Deviance  (DIC), Watanabe-Akaike (WAIC)\footnote{AIC takes into consideration the number of parameters while DCI, WAIC compute aneffective number of parameters that takes into account the observed data. While the means are different,  all estimating the same thing: the expected out-of-sample deviance associated with a model. BIC, introduced futher, is notably different because as its goal is different: estimating the marginal probabability of the the data under the model and is related to the bayes factor, see below.} being notable.
\end{itemize}
And both of these methods penalize overfitting (more paramters always improve fit), as they assess the generality of the model.


. Within each family of deterministic and stochastic models, we employ both Bayesian (Bayes Factors, BFs) and frequentist (likelihood ratio tests) model comparison techniques. 

For the deterministic model, calibrated using Bayesian inference, we use the Bayes Factors (BFs). Bayes Factors are a controversial\footnote{See the great debate in Sociological Methodology:\fullcite{Raftery:BayesianModelSelection:1995} and counter arguments\fullcite{Gelman:AvoidingModelSelection:1995}} method that we use in our XXX chapter to compare discrete choices of model.

We have a model selection problem. Given the observation of data $D$, we have to choose between two models $\mathcal{M}_i$ and $\mathcal{M}_j$ each parametrized by parameters sets $\theta_i$ and $\theta_j$.

\paragraph{Bayes Factor}  The goal might be to choose on of the model or to averates discrete set of probabilites. The aim is to link the posteriors and prior odd ratios:
\begin{equation}
    \overbrace{\frac{\mathbb{P}(\mathcal{M}_i\mid D)}{\mathbb{P}(\mathcal{M}_j\mid D)}}^{\text{posterior odd ratio}} =
    \overbrace{BF_{i,j}}^\text{Bayes factor} \times  \overbrace{\frac{\mathbb{P}(\mathcal{M}_i)}{\mathbb{P}(\mathcal{M}_j)}}^{\text{prior odd ratio}}
\end{equation}
 where $\mathbb{P}(\mathcal{M}\mid D)$ represents the posterior probability of a model $\mathcal{M}$ is the probability of the model given the data
Given Bayes theorem, we have
\begin{equation}
  BF_{i,j} = \frac{\mathbb{P}(D\mid\mathcal{M}_i)}{\mathbb{P}(D\mid\mathcal{M}_j)}
\end{equation}
 where $\mathbb{P}(D\mid\mathcal{M})$ is  the \textit{model evidence}, or marginal likelihood of model $\mathcal{M}$ \parencite{Aitkin:PosteriorBayesFactors:1991}. It represents the probability of the data given the model, not assuming any particular model parameters (because parameter $\theta$ have been \textit{marginalized (integrated) out}), and is given by a (difficult to compute) integral:
 \begin{equation}
\mathbb{P}(D\mid\mathcal{M}) = \int_\theta \mathbb{P}(D\mid\theta,M) \mathbb{P}(\theta\mid M) d\theta .
\end{equation} 
 The model evidence can be directly computed using the output the mcmc algorithm as the harmonic mean of each model likelihood over the posterior distribution of the parameters. As described by\textcite{Weinberg:ComputingBayesFactor:2012}, if one can draw $N$ samples from the posteriors of the parameters, $\theta^{(i)} \sim \mathbb{P}(\theta\mid\mathcal{M}, D)$, a Monte Carlo estimate of the model evidence is given by:
 \begin{equation}
 	\mathbb{P}(D\mid\mathcal{M}) \approx {\left[ \frac{1}{N} \sum_{i=1}^N \frac{1}{\underbrace{\mathbb{P}(D\mid\theta^{(i)})}_{\text{likelyhood of } \theta^{(i)}}} \right]^{-1}}, 
 \end{equation}
which can then be used to derive BFs (see \parencite{Raftery:EstimatingIntegratedLikelihood:2007} for a discussion on the possible numerical issues involved in this approximation.
%\end{multicols}

%\begin{multicols}{2}
 \paragraph{Bayesian Information Criterion} On the other hand, model evidence of the stochastic model was evaluated using the Bayesian Information Criterion (BIC) computed at the Maximum Likelihood Estimate (MLE) resulting from the multiple itertated filtering algorithm (see section~\ref{sec:calibration}). BIC enables the estimation of the BFs as $BF_{i,j} \approx e^{\frac{1}{2}(BIC_j - BIC_i)}$ yielding the relative support of each $\mathcal{M}_i$ against $\mathcal{M}_j$ \parencite{Schwarz:EstimatingDimensionModel:1978a,Burnham:MultimodelInferenceUnderstanding:2004}. 
 
  \paragraph{Likelyhood ratio tests} Since the set of models we consider here are nested, we additionally perform likelihood ratio tests (LR-tests) to determine the significance of incorporating additional transmission processes on top of the basic SIRB formulation. LR-tests were implemented by sequentially\footnotemark{} relaxing the constraint that the human-to-human and rainfall-related parameters are equal to 0 in \textbf{MN}. More particularly, we perform a LR-test for each pair of models $\{\mathcal{M}_0, \mathcal{M}_1\}$ who’s parameter vectors $\boldsymbol{\theta}^0,\boldsymbol{\theta}^1$ take values from the set of all possible values in the parameter space $\Theta \subset \mathbb{R}^{n_p}$, for which at least on of the parameters that is not null is $\boldsymbol{\theta}^1$ is equal to $0$ in $\boldsymbol{\theta}^0$, i.e.  $\boldsymbol{\theta}^0 \in \Theta^0 \subset \Theta, \boldsymbol{\theta}^1 \in \Theta^1 \subset \Theta, \Theta^0 \subset \Theta^1 = \Theta \setminus \Theta^0$, with $\Theta^0 = \{ \boldsymbol{\theta} \mid \{\theta_1, \dots, \theta_p  \} \in \mathbb{R}^{n_p} \wedge \ \exists i \in \{1,\dots, p\} : \theta_i = 0 \} $.
%Taken including the Bayesian Information Criterion (BIC)~ and a the . BIC is a criterion which penalizes the best log-likelihood computed during calibration by the model complexity (in this case, the number of parameters $n_p$) thus allowing to compare models having different degrees of freedom. DIC is particularly suited for Bayesian model selection when using an \textsc{mcmc} approach. DIC considers the average ensemble deviance associated to the posterior trajectories and penalizes it by its variance. Finally the likelihood-ratio test (LR test) is a statistical test to evaluate if the increase of complexity among two nested models is convenient with respect to the improvement of the fit (increase of the log-likelihood).
%\end{multicols}


 \section{Estimation of $R_0$}


\section{Optimal control}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% OK
\paragraph{Optimal Control Theory}
Optimal control theory describes the application of control variables to a system for the purpose of maximizing some measure of performance. The system is subject to its dynamics, with possible constraints on state and control variables. Let $\textbf{x}(t)$ be the state of a system at time $t$, and $\textbf{u}(t)$ be the control (or input) variable. 

In the general form, a continuous time optimal control problem (OCP) is written as the minimization of a cost functional $J$  with respect to the control variable $u(t)$:
\begin{equation}
\text{Minimize~~~} \min_{u(\cdot)} J=\Phi\,[\,\textbf{x}(t_0),t_0,\textbf{x}(t_f),t_f\,] + \int_{t_0}^{t_f} \mathcal{L}\,[\,\textbf{x}(t),\textbf{u}(t),t\,] \,dt
\end{equation}
subject to the system dynamics, path constraints and boundary conditions:
\begin{align}
\dot{\textbf{x}}(t) & =  \textbf{g}\,[\,\textbf{x}(t),\textbf{u}(t),t\,] \eqname{System dynamics}\\
\textbf{b}\,[\,\textbf{x}(t),\textbf{u}(t),t\,]  & \leq  \textbf{0} \eqname{Path constraints} \\
\boldsymbol{\phi}\,[\,\textbf{x}(t_0),t_0,\textbf{x}(t_f),t_f\,] & =  0 \eqname{Boundary conditions} 
\end{align}

The objective $J$ is the sum of the endpoint cost function $\Phi$ and the so called Lagrangian functional $\mathcal{L}$ that represent the cost along the path followed by the system.

The first constraints of the system are its dynamics, here a general ordinary differential equation $\textbf{g}$. In the case of cholera application, this is the spatially-explicit SIRB model. Path constraints $\textbf{b}$ in the form of inequalities are said inactive, and may be active when imposed as equalities. Finally, boundary conditions allow for forcing the system to end in a certain state, and to specify initial conditions.

Note that it is also possible to optimize the end time $t_f$. A simple change of variable transforms the OCP above into a free-end time problem. This possibility might result fundamental to optimize for the time of cholera extinction.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% OK
\paragraph{Solving an Optimal Control Problem}
Solving an OCP is a difficult task, but it  has received large attention during the cold war to guide rockets and missiles. Except for simple problems, like the linear quadratic  control, OCPs do not have analytical solutions and numerical solutions are typically sought. We highlight here two general methods for solving OCPs:

\par Indirect methods (or ``optimize then discretize’’) use the calculus of variation to obtain first-order optimal conditions. The OCP is transformed in a multi-point boundary value problem (BVP) that has the form of a Hamiltonian system. Pontryagin’s maximum principle guarantees the optimality of the solution. While this solution is elegant, the BVP may be difficult to solve and recently, direct methods are frequently preferred in applications.

\par The rational behind direct (or ``discretize then optimized’’) methods is that a nonlinear programming (NLP) optimization problem with large dimension (tenth of thousand variables) has an easier numerical solution than a simple BVP, because it is sparse. That is, we rewrite our OCP (in the infinite dimensional functional space) into an optimization NLP (in a finite dimensional Euclidean space). 

In short, state and control variables are approximated (e.g using a piece-wise constant function) and the cost functional becomes a cost function. The problem is now to find the coefficients defining the control variable approximation, in such a way that they minimize the approximated cost function $F$. This leads to the following NLP:
\begin{equation}
\min_{\textbf{x} \in \mathbb{R}^n} F(\textbf{x})
\end{equation}
subject to:
\begin{equation}
g(\textbf{x}) = 0
\end{equation}    
\begin{equation}
h(\textbf{x}) \leq 0
\end{equation}
This problem is solved by any NLP solver, like Ipopt~\cite{wachter_implementation_2006}.   Optimal control toolboxes, like CasADI~\cite{andersson_casadi:_2012}, allow us to transform the symbolic formulation of the OCP in into an NLP. 

Despite a wide range of power tools, solving even a simple  optimal control problem is still a computationally-intensive task.

\paragraph{Epidemiological Applications of Optimal Control}

Optimal control is of great interest for practitioners and policy makers: it allows to formally back up existing policies, and may suggest alternative action plans. Indeed, such discoveries have to be carefully analyzed in order to be understood, as they might be due to model features. We present a brief literature review of studies on optimal control for epidemiological applications. This non-exhaustive review should present a sufficient picture of the current research landscape.

First, theoretical studies\cite{kar_stability_2011, laguzet_global_2015} on  generic epidemiological models explored the feasibility and features of optimal control polices. These studies, first by Morton and Wickwire\cite{morton_optimal_1974}, Sethi and Staats\cite{sethi_optimal_1978}, Greenhalgh\cite{greenhalgh_results_1988}, Behncke\cite{behncke_optimal_2001} analyzed  various spatially-implicit models in a control system perspective, deriving existence, uniqueness, and stability of the optimal control for some (ideal) inputs like quarantine and vaccination.

It is also possible to use optimal control in order to find some general rules for disease intervention. For example, Rowthorn\textit{et al.}~\cite{rowthorn_optimal_2009} showed that for a system with two interconnected regions and a particular epidemic, equalizing infection in the two regions is the worst possible strategy in minimizing the total number of infection.

These studies are either theoretical, without connection to a particular setup nor actual data, or compare different control strategies defined by the author. In the framework of this thesis, I aim to develop a computational tool able to solve the optimal control problem for actual cholera outbreaks in quasi-real time, in order to provide objective information that can promptly support the health-care actors during the emergency.