
\begin{fullwidth}
\chapter[A scenario modeling pipeline for \textsc{covid}-19 emergency planning]{A scenario modeling pipeline for \textsc{covid}-19 \\emergency planning}
\label{sec:covid-pipeline-reports}
Coronavirus disease 2019 (\textsc{covid}-19) has strained health systems worldwide due to its high mortality rate and the large proportion of cases requiring critical care and mechanical ventilation. During these uncertain times, public health decision-makers, from city health departments to federal agencies, turned to epidemiological models for decision support in allocating resources, developing non-pharmaceutical interventions, and characterizing the dynamics of \textsc{covid}-19 in their jurisdictions. In response, a flexible scenario modeling pipeline was developed that could quickly tailor models for decision-makers seeking to compare projections of epidemic trajectories and healthcare impacts from multiple intervention scenarios in different locations. Here, the components and configurable features of the COVID Scenario Pipeline are presented, along with model limitations and active areas of development to meet ever-changing decision-maker needs.
  
This chapter presents a modeling pipeline tailored for the response to \textsc{covid}-19. The COVID Scenario Pipeline is an ongoing project under active development, and many improvements have been added since the version presented in this thesis, which takes the perspective of July 2020. At that time, the state of knowledge on SARS-CoV-2 and \textsc{covid}-19 was rapidly expanding, and the present chapter highlights the challenges of dealing with such uncertainties. This chapter is based on:
\longfullcite{Lemaitre:ScenarioModelingPipeline:2021}, where Kyra H. Grantz, Joshua Kaminsky, Hannah R. Meredith, and Shaun A. Truelove contributed equally. It is referred to in the following as the postprint (and its supplementary information as \textsc{si}).
  \end{fullwidth}

%\paragraph{Health Outcomes} are computed on top of the incidence for each compartments. Each outcomes $O$  is specified from a source incidence $S$, with a delay $\Delta$ and a duration $D$:

% \begin{algorithm}[H]
%\SetAlgoLined
%  \SetKwInOut{Input}{inputs}
 % \SetKwInOut{Output}{output}
 % \SetKwProg{FindAnMFS}{FindAnMFS}{}{}
 % \FindAnMFS{$(Q,D)$}{
%\KwResult{Write here the result }
%\Input{A failing query $Q = t_1 \wedge \dots \wedge t_n$; an \texttt{RDF} database $D$}
 %\Output{An \texttt{MFS} denoted by $Q^*$}
%draw $\Delta$ from config distribution\;
%draw $D$ from config distribution\;
%\ForEach{$t$ in $t_i, t_i+1, ..., t_f$}{
%$O[t+\Delta:t+\Delta+D]\leftarrow O[t+\Delta:t+\Delta+D] + \text{Binom}(S[t], p_{O\mid S})$\;}
 %}
 %\caption{Computations of health outcomes}
%\end{algorithm}


\section{Introduction}

In late 2019, the virus responsible for coronavirus disease 2019 (\textsc{covid}-19) was detected in Wuhan, China\cite{Zhu:NovelCoronavirusPatients:2020}. Since its emergence, SARS-CoV-2 has spread rapidly, causing significant morbidity and mortality and prompting the World Health Organization to declare a pandemic on March 11, 2020\cite{WHO:WHODirectorGeneralOpening:2020}. With 4.07\textsc{m} confirmed deaths and 190\textsc{m} confirmed cases as of July 2021, it is one of the deadliest pandemics in history. In addition to its significant individual health impacts, \textsc{covid}-19 has put considerable strain on health systems, as a large fraction of cases requires mechanical ventilation or critical care\cite{Huang:ClinicalFeaturesPatients:2020}. In every stage of the pandemic thus far, there has been a need for flexible decision support tools that can be used to model and compare critical planning scenarios. 

Epidemiological models have played an important role in shaping public health policy and interventions throughout the pandemic. The methods used have ranged widely, from agent-based modeling approaches that simulate the global movement of individuals and their contacts in household, workplace, and leisure settings\cite{Ferguson:ReportImpactNonpharmaceutical:2020}, to population-level models that incorporate features like age-specific transmission, asymptomatic and pre-symptomatic transmission, and metapopulation structure\cite{Chinazzi:EffectTravelRestrictions:2020,Branas:FlatteningCurveIt:2020,Moghadas:ProjectingHospitalUtilization:2020,Davies:AgedependentEffectsTransmission:2020}, to curve-fitting approaches that use data from early in the \textsc{covid}-19 pandemic to project future burden\cite{IHME:ForecastingImpactFirst:2020}. Likewise, the goals of these models have varied widely: assessing importation risk, estimating the fraction of cases attributable to transmission from unobserved infections, projecting the impact of non-pharmaceutical interventions that target different populations, and forecasting the needs of the healthcare system.

Within this space, there was a need for a modeling pipeline that could provide flexible but sophisticated epidemiological models to decision-makers who needed to plan and compare specific interventions. Here a scenario modeling pipeline is detailed: a modular framework that projects epidemic trajectories and health care impacts under different suites of interventions in order to aid in scenario planning. The flexibility of this approach has allowed us to provide rapid support to multiple organizations at the same time while customizing the pipeline models to situation-specific questions and data. This framework has been used to provide tailored estimates of the relative impacts of different scenarios of disease transmission, severity, and control, thus guiding intervention policies in several states, countries, and humanitarian aid settings.

\section{Methods}
\subsection{Pipeline at a glance}
\begin{figure}[!htb]
    \centering
    \includegraphics[width = .8\textwidth]{fig_pipeline/fig1a}
    \caption[Overview of the pipeline]{Overview of the pipeline. The pipeline has four modules, each with specific inputs that can be specified by the user. First, it identifies when and in which model locations epidemics are seeded using an air importation model or confirmed case data. Second, the epidemic seeding events are used to initiate the disease transmission model, which is informed by epidemiological assumptions and intervention scenarios. The disease transmission model produces daily incident infection counts and infection prevalence. Next, health outcomes like hospitalizations and ICU admissions are calculated from these infection counts according to assumptions about health outcome risks and infection fatality ratios. Finally, these health outcomes may be summarized using templates and functions from the report generation component of the pipeline.}
    \label{fig:pipeline-modules}
\end{figure}
The pipeline consists of multiple modular components designed to run in sequence to produce results and reports focused on policy-relevant outcomes in any set of geographic locations (i.e., multiple countries, a single country, a set of subnational administrative units, or a single subnational administrative unit) (fig.~\ref{fig:pipeline-modules}). While the pipeline was developed to be extended, the current core components are (1) epidemic seeding, (2) the transmission model, (3) the health outcome generation engine, and (4) report generation.

These modular components of the pipeline fit together because each is composed of multiple pieces: an input format, an output format, one or more code libraries (where applicable), and a runner script. The standardized input and output formats ensure that components may be switched out according to user preference without impacting other phases. 

The pipeline runs these components in sequence, according to the specifications outlined in a configuration file. This makes it easy to add or modify a component: to add one, its input format is specified and its dependencies are incorporated. When appropriate, entire components may be substituted with data from outside of the pipeline, provided that the data meet the input formats required by the next pipeline phase.

\subsection{Module 1: Epidemic seeding and initialization}

“Epidemic seeding” refers to how the disease transmission module is initialized with infected individuals. A seeding module must produce one or more seeding files that specify an added number of incident cases occurring due to “seeding” at particular dates and locations. The pipeline currently contains two epidemic seeding options: (1) seeding according to first case appearance in data, and (2) seeding according to an air travel importation model. 

\paragraph{Seeding according to earliest identified cases} This seeding option enables users to seed the model according to \textsc{covid}-19 case data. It currently supports user-supplied data and downloads from two commonly used public sources, the Johns Hopkins University Center for Systems Science and Engineering (JHU-CSSE) \textsc{covid}-19 Dashboard\cite{Dong:InteractiveWebbasedDashboard:2020} and USAFacts, a database that collates data from US state health departments\cite{USAFacts:USCOVID19Cases:2021}. Drawing from the user-specified data source, this option identifies the first five days that cases were reported in each modeled location. It is assumed that confirmed cases were infected a user-specified number of days prior to when they were reported, and that there is a user-specified ratio of infections to confirmed cases. Seed infections are, hence, created in each modeled location on the estimated days of infection for the first 5 days with reported cases; they are drawn stochastically from a Poisson distribution where the mean is the product of the number of reported cases and the user-specified ratio. 
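
As an illustration of this seeding rule, the following notation and numbers are introduced here for the example only (they are assumptions, not pipeline defaults). Let $c_{i,t}$ be the number of cases reported in location $i$ on day $t$, $d$ the assumed delay from infection to report, and $\rho$ the assumed ratio of infections to confirmed cases. The seeded infections can then be written as
\[
\text{seed}_{i,\,t-d} \sim \text{Poisson}\left(\rho \cdot c_{i,t}\right).
\]
For instance, with $c_{i,t} = 3$ reported cases, $d = 5$ days, and $\rho = 10$, the model seeds a Poisson-distributed number of infections with mean $30$ in location $i$, five days before the report date.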

\paragraph{Seeding according to an air importation model} A previously published model of measles importation was adapted to model the rate of \textsc{covid}-19 importation to specific locations due to air travel\cite{Truelove:EpidemicsAirTravel:2020}. This seeding option, available in the GitHub repository “\url{HopkinsIDD/covidImportation}”, uses complete itinerary (origin to final destination) air travel volume data from OAG\cite{OAG:FlightDataOAG:2020} for all airports in the world, source location populations, and source location incidence data to inform a model that estimates absolute counts of importations into airports on a daily basis\cite{Truelove:HopkinsIDDCovidImportationInitial:2020}. 

Geographic areas surrounding airports are classified spatially into “airport catchment areas” with a Voronoi tessellation of space in reference to the latitude and longitude coordinates of the airport\cite{Balcan:ModelingSpatialSpread:2010} (fig.~\ref{fig:airport-china}). When there are multiple airports within close proximity, the user may specify a threshold distance under which airports may be grouped into a single cluster that is defined by its centroid. The probability of importation is assigned to each intersection of a Voronoi tile and an administrative unit boundary. This probability is calculated as the proportion of the airport catchment area population that lives in that intersection, assuming that population is distributed evenly by area. All air importations on a given day are then aggregated to the administrative unit level and seeded into the epidemic model as newly infected individuals.
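
Under one reading of this procedure, the assignment can be sketched as follows; the notation is introduced here for illustration and does not come from the postprint. Let $V_a$ be the Voronoi tile of airport $a$, $U_u$ the administrative unit $u$ with population $H_u$, and $A(\cdot)$ the area of a region. Assuming that the population is spread evenly by area within each administrative unit, the population of an intersection and the corresponding importation probability are
\[
P_{a,u} = H_u \, \frac{A\left(V_a \cap U_u\right)}{A\left(U_u\right)}, \qquad
\pi_{a,u} = \frac{P_{a,u}}{\sum_{u'} P_{a,u'}},
\]
so that $\sum_u \pi_{a,u} = 1$ for each airport. For example, if a catchment area overlaps two administrative units with intersection populations of $30\,000$ and $10\,000$, importations arriving at that airport are assigned to the two units with probabilities $0.75$ and $0.25$.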

 \begin{marginfigure}%[-2\baselineskip]
\centering
\includegraphics{fig_pipeline/airport_tessellation.pdf}
\margincaption[Tessellation of airports in China]{A possible tessellation of airports in China, where the country is divided into regions according to the closest airport.}
\label{fig:airport-china}
\end{marginfigure}

\subsection{Module 2: Transmission model and intervention scenarios}
The disease transmission module takes in seeding information and produces an epidemic model output file that contains, at minimum for subsequent module compatibility, daily counts of incident infections, indexed to their time of symptom onset. The currently implemented default transmission module comprises a metapopulation model with stochastic Susceptible-Exposed-Infected-Recovered (SEIR) disease dynamics.
\marginnote{The model for SARS-CoV-2 transmission is now entirely configurable to any compartmental model following an arbitrary transition graph. It allows for multi-strain, age-stratified models or models with vaccinated compartments.}

\paragraph{Disease dynamics} The core model is a modified SEIR compartmental model where the time in the “Infected” compartment follows an Erlang distribution (i.e., the infected compartment is split into $k$ compartments) to produce more realistic infectious periods where the chance of recovery depends on the time since infection\cite{Yan:QuantitativeMethodsInvestigating:2019}, and a coefficient ($\alpha$) can be set to help the model approximate non-homogeneous mixing between susceptible and infected individuals and non-exponential growth\cite{Finkenstadt:StochasticModelExtinction:2002}. By default $k$ is set to 3 compartments. The transition of individuals between disease compartments is simulated stochastically with binomial random draws:
\marginnote[5\baselineskip]{In early 2020, COVID Scenario Pipeline runs were parameterized based on the serial interval and the basic reproductive number of \textsc{covid}-19 as follows:

\textsc{Serial Interval} The serial interval (SI) represents the interval between two subsequent infections. Here $\sigma$ and $\gamma$ are parameterized with estimates of the range of the serial interval  or generation time, such that:
\begin{equation}
\text{SI}=\frac{1}{2}\left(\frac{1}{\gamma }\right)+\frac{1}{\sigma },
\end{equation}
where it is assumed that the average infection occurs halfway through an index case’s infectious period. For SARS-CoV-2, the serial interval was estimated to be in the range 6.5--8.2 days, from: \fullcite[][tab. S4]{Bi:EpidemiologyTransmissionCOVID19:2020}. 

\textsc{The basic reproductive number} $R_0$ is the average number of new infections caused by one infected individual in a fully susceptible population. It was estimated to be in the range 2--3 in early modeling work \fullcite{Riou:PatternEarlyHumantohuman:2020} (this work also characterizes the dispersion of the number of secondary cases).
}
{\begin{eqnarray}
N_{S \to E} (t) &=& \text{Binom}\left(S, 1 - e^{- \Delta t \cdot \text{FOI}(t)}\right) \\
N_{E \to I^{( 1)}} (t) &=& \text{Binom}\left(E, 1 - e^{- \Delta t \cdot \sigma} \right) \\
N_{I^{( 1 )} \to I^{( 2)}} (t) &=& \text{Binom}\left(I^{( 1)}, 1 - e^{- \Delta t \cdot \gamma'} \right) \\
N_{I^{(2)} \to I^{(3)}} (t) &=& \text{Binom}\left(I^{(2)}, 1 - e^{- \Delta t \cdot \gamma' } \right) \\
N_{I^{(3)} \to R} (t) &=& \text{Binom}\left(I^{(3)}, 1 - e^{- \Delta t \cdot \gamma' } \right) \\
\gamma' &=& \gamma \cdot k,
\end{eqnarray}
where $S$, $E$, $I^{\left(1\right)}$, $I^{\left(2\right)}$, $I^{\left(3\right)}$, and $R$ represent the number of individuals in those respective compartments, $\text{FOI}(t)$ is the force of infection from the infected population on the susceptible population, $\frac{1}{\sigma}$ is the latent period, $\frac{1}{\gamma}$ is the infectious period, and $k$ is the number of $I$ compartments. The force of infection, which modulates the transition of individuals from the $S$ to the $E$ compartment, is:

\begin{eqnarray}
\text{FOI}(t) &=& \beta(t) \cdot \frac{I(t)^{\alpha}}{H},
\end{eqnarray}
where $H$ is the total population, $\beta(t)$ is the daily transmission rate as defined by $R_0$ and the infectious period, $\alpha$ is the mixing coefficient, and:
\begin{eqnarray}
I(t) &=& \sum_{j = 1}^{k} I^{(j)}(t) \\
H &=& S(t)+E(t)+\sum_{j=1}^{k} I^{\left(j\right)}(t)+R(t) \\
\beta(t) &=& R_{0}\cdot\gamma.
\end{eqnarray}
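As a worked example of these definitions, consider the following illustrative values, which are assumed here for the arithmetic and are not prescribed by the pipeline: $1/\sigma = 5$ days, $1/\gamma = 5$ days, $R_0 = 2.5$, $k = 3$, $\alpha = 1$, and $\Delta t = 1$ day. The serial interval relation from the margin note gives $\text{SI} = \frac{1}{2} \cdot 5 + 5 = 7.5$ days, which lies within the range quoted there. The remaining quantities follow directly:
\begin{eqnarray*}
\beta &=& R_0 \cdot \gamma = 2.5 \times 0.2 = 0.5 \text{ per day}, \\
\gamma' &=& \gamma \cdot k = 0.2 \times 3 = 0.6 \text{ per day}, \\
1 - e^{-\Delta t \cdot \sigma} &=& 1 - e^{-0.2} \approx 0.181, \\
1 - e^{-\Delta t \cdot \gamma'} &=& 1 - e^{-0.6} \approx 0.451.
\end{eqnarray*}
With $I(t) = 100$ infectious individuals in a population of $H = 100\,000$, the force of infection is $\text{FOI}(t) = 0.5 \times 100 / 100\,000 = 5 \times 10^{-4}$ per day, so each susceptible individual is infected during one time step with probability $1 - e^{-5 \times 10^{-4}} \approx 5 \times 10^{-4}$.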

\paragraph{Metapopulation dynamics}
The model is capable of simulating disease spread in multiple locations jointly according to assumptions about population mobility between individual model locations (\eg, administrative units). The SEIR disease dynamics described above are simulated in each model location with a modification to the force of infection term that accounts for this impact of mobility on disease spread.

The force of infection in a given location $i$ is calculated from a combination of local infections and infections in locations that are connected to it according to the mobility matrix, as follows:
\begin{fullwidth}
\begin{equation}
\text{FOI}_i = \left(1 - \sum_{j\neq i} p_{away} \frac{M_{i,j}}{H_i} \right) \cdot \beta_i(t) \frac{\left(I_i^{(1)} + I_i^{(2)} + I_i^{(3)}\right)^\alpha}{H_i} +  \sum_{j \neq i} \left(p_{away} \frac{M_{i,j}}{H_i} \cdot \beta_j(t) \frac{\left(I_j^{(1)} + I_j^{(2)} + I_j^{(3)}\right)^\alpha}{H_j} \right),
\end{equation}
\end{fullwidth}
where $p_{away}$ is the fraction of time that individuals who move spend away from their home location ($p_{away} \approx 0.5$ in the case of commuting), $H_i$ is the population of node $i$, and $M$ is a mobility matrix such that $M_{i,j}$ represents the daily movement of individuals (e.g., commuting) from origin $i$ to destination $j$. The transition of individuals between disease compartments may be modified to index by location $i$, for example:

\begin{equation}
N_{S_i \to E_i} (t) = \text{Binom}\left(S_i ,1 - e^{- \Delta t \cdot\text{FOI}_{i}(t) } \right).
\end{equation}
Users may provide a symmetric or asymmetric wide-form mobility matrix for all model locations or a long-form sparse mobility matrix that indicates only pairs of model locations with connectivity.
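
To make the role of mobility concrete, consider a two-location example with numbers assumed purely for illustration: $H_1 = 100\,000$, $H_2 = 50\,000$, $M_{1,2} = 10\,000$ daily commuters from location 1 to location 2, $p_{away} = 0.5$, $\alpha = 1$, and $\beta_1 = \beta_2 = 0.5$ per day. Suppose that location 1 has no infectious individuals while location 2 has $500$. The share of time that residents of location 1 spend in location 2 is $p_{away} M_{1,2} / H_1 = 0.05$, and the force of infection in location 1 is
\[
\text{FOI}_1 = \left(1 - 0.05\right) \cdot 0.5 \cdot \frac{0}{100\,000} + 0.05 \cdot 0.5 \cdot \frac{500}{50\,000} = 2.5 \times 10^{-4} \text{ per day}.
\]
Location 1 therefore experiences a non-zero force of infection through its commuters alone, even before any local transmission occurs.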
\begin{figure*}[!htb]%[width = .7\textwidth]
    \centering
    \includegraphics{fig_pipeline/fig2a}
    \caption[Time series of the daily number of hospital beds needed across scenarios]{Time series of the daily number of hospital beds needed across five possible intervention scenarios in a fictional location with nine counties. Lines represent results from 50 stochastic model simulations. Horizontal black lines represent the total hospital bed capacity in the fictional location. The colored horizontal bars along the top visualize the effectiveness of interventions at a given time point along a dark blue to light blue spectrum; dark blue indicates a period with no reductions to transmission, while light blue indicates a period with more restrictive action (i.e., low transmissibility).}
    \label{fig:pipeline-seir}
\end{figure*}


\paragraph{Application of transmission modifiers}
In the absence of vaccines and other preventive treatments, non-pharmaceutical interventions, such as school closures, social distancing, stay-at-home directives, and testing and isolation, are critical strategies for reducing disease transmission. Disease transmission may also change across space or time as the result of exogenous factors like seasonality or spatial heterogeneity in contact patterns. The model enables users to specify changes to the basic reproductive number ($R_0$) and the inverse of the infectious period ($\gamma$), for pre-specified periods of time, in all or subsets of model locations independently. These interventions or exogenous changes can be implemented with fixed or distributional effectiveness. In addition, users may specify a rate at which effectiveness fades (e.g., declining adherence to a policy) over a certain number of days. This format enables flexibility in scenario planning; for instance, the model can be used to examine the effects of chaining multiple interventions together over time (e.g., school closure then stay-at-home), gradually declining adherence of the population to an intervention, switching interventions on and off over time, spatially heterogeneous interventions, or innate spatiotemporal heterogeneities (fig.~\ref{fig:pipeline-seir}).

Transmission modifiers like non-pharmaceutical interventions or other exogenous factors modulate the daily transmission term as follows:
\begin{equation}
\beta _i'(t)=\left(1-r_i(t)\right)\cdot \beta_i (t),
\end{equation}
where $\beta_i'(t)$ is the daily transmission rate after accounting for transmission modifier $r_i(t)$ at the specified location $i$ at time $t$. When transmission modifiers are in effect $\beta_i'(t)$ replaces $\beta_i(t)$ in the force of infection term $\text{FOI}_i(t)$.
Note that several modifiers acting at the same time in the same place have a compounded effect on transmission. With $k$ indexing the interventions in place at time $t$ and node $i$, the resulting transmission rate is:
\begin{equation}
	\beta_i'(t) =  \beta_i(t) \cdot  \prod_k \left(1-r_{i,k}(t) \right), \label{eq:npi_comp}
\end{equation}
but the compounding effect may be chosen to be additive (\eg for vaccination rate) instead of multiplicative.
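
As a small numerical illustration, with values assumed for the example, consider two modifiers active at the same time and place with reductions $r_1 = 0.3$ and $r_2 = 0.2$. Multiplicative compounding gives
\[
\beta_i'(t) = \beta_i(t) \cdot (1 - 0.3) \cdot (1 - 0.2) = 0.56 \, \beta_i(t),
\]
that is, a combined reduction of $44\%$. The additive form is not written out above; one possible version, assumed here only for comparison, is $\beta_i'(t) = \beta_i(t) \cdot \left(1 - (r_1 + r_2)\right) = 0.5 \, \beta_i(t)$, which yields a slightly larger reduction of $50\%$.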

Transmission modifiers are entirely user-specified. A set of common non-pharmaceutical intervention scenarios that can be applied for user-specified dates and locations is listed in tab.~\textsc{s1} of the postprint \textsc{si}. These intervention scenarios have been compiled according to a review of the literature on the potential impact of non-pharmaceutical interventions on respiratory virus transmission\footnote{This review shows how research conducted on past epidemics made it possible to estimate the expected range of NPI effectiveness well before evidence for \textsc{covid}-19 became available. The table includes \eg the outcomes of different responses to the 1918 influenza as input for social distancing policies, and, for the impact of school closures, estimates of how influenza transmission was affected when schools were closed due to holidays or inclement weather. As for current estimates, the next chapter deals with evaluating the impact of NPIs against SARS-CoV-2 transmission, and these estimates were used to parameterize the COVID Scenario Pipeline for Switzerland.}.

\subsection{Module 3: Calculation of health outcomes}
This pipeline module translates outputs from the transmission model into health outcomes such as hospitalizations and deaths. It takes in counts of daily incident infections and produces daily counts for specific health outcomes at appropriate time delays.

The current default implementation\footnote{At the time of submission. Similar to the transmission module, the outcome module now allows arbitrary transition graphs between outcomes, allowing it to accurately reflect any facet of the healthcare and reporting system. Among other improvements, note that transition probabilities (such as the reporting fraction) may be modified in time and space with modifiers very similar to those for NPIs.} produces hospital and intensive care unit (ICU) admissions, current hospital and ICU occupancy, ventilator needs, and the number of deaths. The modeling of health outcomes assumes that there is some transition probability from infection to death, infection to hospitalization, hospitalization to ICU admission, and ICU admission to ventilator use. For each health outcome, three quantities are considered: the probability of its occurrence (e.g., the probability that infections are hospitalized), the time delay relative to its disease course (e.g., the time between hospital admission and ICU admission), and, where applicable, the duration in a given state (e.g., how long a patient remains ventilated)\footnote{Despite the added complexity, this explicit approach to modeling the healthcare system has some advantages over using compartments (as done in the following chapters), the principal one being that arbitrary distributions, such as the log-normal, can be used to model the residence time in each stage.}.

For settings where hospitalization, ICU admission, and ventilator use are infrequent, it may be more appropriate to think of these health outcome projections as different levels of disease severity.

The user may specify health outcome probabilities and delays, conditional on the flows described above. For use as default values, tables of parameter values derived from a literature review of \textsc{covid}-19 health outcomes are provided in the postprint \textsc{si}.

The pipeline currently contains two versions of this module with different approaches to the specification of health outcome risks: (1) unadjusted, uniform risk and (2) location-specific risks, adjusted by key demographic or health factors in each location.

\paragraph{Unadjusted, population-wide health outcome risks}
This option generates health outcome estimates with unadjusted risk across all locations, assuming fixed values for all health outcome probabilities, delays, and durations.

It is assumed that the number of infections admitted to the hospital is a draw from the Binomial distribution, lagged by a fixed time from symptom onset to hospital admission:
\begin{equation}
n_{t_{\text{inf}} + t_{\text{inf} \to \text{hosp}}}^{\text{hosp}} \sim \text{Binom}\left(n_{t_{\text{inf}}}^{\text{inf}}, p_{\text{hosp}\mid \text{inf}} \right),
\end{equation}
where $n^\text{hosp}$ is the number of hospital admissions, $n^\text{inf}$ is the number of infections, $t_\text{inf}$ is the time of infection, $t_{\text{inf} \to \text{hosp}}$ is the mean time delay between infection and hospital admission, and $p_{\text{hosp} \mid \text{inf}}$ is the probability of hospitalization given infection (tab.~\ref{tab:csp-outcomes}).

\begin{table*}[t]
\centering
\begin{tabular}{ll}
\toprule
Column name (notation) & Description\\
\midrule
\verb|p_hosp_inf| $(p_{\text{hosp} \mid \text{inf}})$ & Probability of hospitalization among infected individuals\\
\verb|p_icu_hosp| $(p_{\text{icu} \mid \text{hosp}})$ & Probability of ICU admission among hospitalized individuals\\
\verb|p_vent_icu| $(p_{\text{vent} \mid \text{icu}})$ & Probability of ventilation among individuals in the ICU\\
\verb|p_death_inf| $(p_{\text{death} \mid \text{inf}})$ & Probability of death among infected individuals\\
\verb|rr_hosp_inf| & Relative risk of hospitalization (given infection) relative to the average across all geoids\\
\verb|rr_death_inf| & Relative risk of death (given infection) relative to the average across all geoids\\
\bottomrule
\end{tabular}
\caption[Health outcome risk parameters]{Health outcome risk parameters.}
\label{tab:csp-outcomes}
\end{table*}

Similar assumptions are made for the transitions between other outcomes (hospital admission to ICU admission, ICU admission to ventilator use, infection to death):
\begin{eqnarray}
n_{t_{\text{hosp}} + t_{\text{hosp}\to \text{icu}}}^{\text{icu}} &\sim & \text{Binom}\left(n_{t_{\text{hosp}}}^{\text{hosp}} , p_{\text{icu}\mid \text{hosp}} \right)\\
n_{t_{\text{icu}} + t_{\text{icu} \to \text{vent}}}^{\text{vent}} &\sim & \text{Binom}\left(n_{t_{\text{icu}}}^{\text{icu}}, p_{\text{vent} \mid \text{icu}} \right) \\
n_{t_{\text{inf}} + t_{\text{inf} \to \text{death}}}^{\text{death}} &\sim & \text{Binom}\left(n_{t_{\text{inf}}}^{\text{inf}}, p_{\text{death}\mid \text{inf}}  \right).
\end{eqnarray}
The numbers of patients currently hospitalized, admitted to ICU, and ventilated are generated from the incident number of events and fixed (non-distributional) user-defined durations for each event.
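
A minimal sketch of this bookkeeping follows, with notation and numbers assumed for illustration rather than taken from the pipeline. If $n_s^{\text{hosp}}$ is the number of hospital admissions on day $s$ and every patient stays for a fixed duration of $D_{\text{hosp}}$ days, the number of occupied hospital beds on day $t$ can be written as
\[
c_t^{\text{hosp}} = \sum_{s = t - D_{\text{hosp}} + 1}^{t} n_s^{\text{hosp}}.
\]
For example, with $1000$ infections on a given day and $p_{\text{hosp}\mid \text{inf}} = 0.05$, the corresponding admissions are a draw from $\text{Binom}(1000, 0.05)$, with mean $50$ and standard deviation $\sqrt{1000 \times 0.05 \times 0.95} \approx 6.9$, recorded after the infection-to-admission delay. If admissions were constant at $50$ per day with $D_{\text{hosp}} = 7$ days, occupancy would settle at $7 \times 50 = 350$ beds.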

As information about the death and hospitalization rates was scarce early on in the pandemic, decision-makers wanted to consider how things would unfold under different scenarios of health burden. To meet these needs, the pipeline separates hospitalization and death rates from the other outcomes, and allows users to consider multiple scenarios for different rates (\eg scenarios with different values of the \textsc{covid}-19 infection fatality ratio (IFR), before evidence on disease severity narrowed the estimates).

\paragraph{Location-specific health outcome risks}
The severity of disease from SARS-CoV-2 infection can vary greatly between locations due to differences between populations; individuals who are older, have limited access to health care, or have certain pre-existing health conditions are at greater risk for severe disease and death. For this reason, the hospitalization module also supports the specification of location-specific relative risks, which can be used, for example, to standardize hospitalization and mortality rates according to the age distribution of the population. Users may provide a wide-format data file with standardized variable names for the transition probabilities and relative risks by geoid. These transition probabilities are conditional on previous states (e.g., probability of hospitalization given infection is named \verb|p_hosp_inf|). The probabilities of hospitalization and death given infection are specified as relative values compared to population-wide averages specified in the configuration file. The location-specific standardizations apply only to the health outcomes, making a critical assumption that all individuals are at equal risk of infection.

To facilitate the construction of such files, a companion package, covidSeverity\cite{Lauer:HopkinsIDDCovidSeverityInitial:2020}, is provided. It produces location-specific relative death and hospitalization rates based on the age distribution of the local population. The package generates these outputs for US counties based on data from the US Census Bureau, and this is provided as a model input file as part of the main pipeline implementation (see \verb|COVIDScenarioPipeline/sample_data/geoid-params.csv|). The package also includes built-in functionality to pull data from WorldPop\sidenote{\url{worldpop.org}} and generate adjustments for any location of interest.%\cite{Lauer:HopkinsIDDCovidSeverityInitial:2020}.

The covidSeverity package applies a logistic generalized additive model (GAM) with a penalized cubic spline for age and a random effect for age-specific estimates of the risk of each health outcome from the literature, thus producing estimates of risk for 10-year, aggregated age categories. These age-specific estimates are then applied to the population age distribution in a given location.
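
One way to read this standardization is sketched below; the notation and the two-group example are assumptions made for illustration, whereas the package itself works with 10-year age categories. Let $p_a$ be the estimated probability of an outcome in age group $a$, $w_{i,a}$ the share of the population of location $i$ in that age group, and $\bar{p}$ the reference average across all locations. The relative risk for location $i$, and the resulting location-specific probability given a population-wide value $p$ from the configuration file, are then
\[
rr_i = \frac{\sum_a w_{i,a} \, p_a}{\bar{p}}, \qquad p_i = rr_i \cdot p.
\]
For instance, with $p_{\text{young}} = 0.01$ and $p_{\text{old}} = 0.10$, a location with $80\%$ young and $20\%$ old residents has an expected risk of $0.8 \times 0.01 + 0.2 \times 0.10 = 0.028$. Against a reference population with a $70\%$/$30\%$ split, for which $\bar{p} = 0.037$, this gives $rr_i \approx 0.76$, so a configured probability of $p = 0.05$ becomes $p_i \approx 0.038$ in that location.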

The epidemic transmission model, the intervention module, and the outcome calculation are implemented in Python, just-in-time compiled to machine code using Numba\cite{Lam:NumbaLLVMbasedPython:2015} for performance. Since this description was written, the team working on the pipeline has made many improvements, both computational and conceptual, that have enabled the pipeline to remain useful in 2021.

\subsection{Module 4: Summarization of model outputs}
This component of the pipeline provides wrapper functions for the lightweight summarization of model outputs into quantiles, plotting functions for common figures, and R Markdown templates to facilitate the rapid generation of technical reports. This module is available in the R package report.generation in the GitHub repository “\verb|HopkinsIDD/COVIDScenarioPipeline|.”

Two key functions are provided that read and process individual transmission and health outcome model output files. By handling files one at a time, these functions reduce processing time and memory load. Both functions take processing functions as arguments, thus enabling aggregation and filtering to occur at the level of individual files.

The package contains technical report R Markdown templates for US states, US counties, and individual countries, a diagnostic report template, and a template that is maintained solely for integration testing. When report.generation is installed and loaded, the templates become available to the user. Many parameters are drawn from the configuration file automatically and pre-written R Markdown chunks about the module options and methods can be referenced within the package.

Common figures include summary tables and time courses for estimated health outcomes under different interventions, maps that portray cumulative cases at the county level, comparisons between model estimates and observed cases and deaths, and visualizations for when ICU and ventilator capacity for each county is exceeded. The vignette in the postprint \textsc{si} walks through example report outputs in more detail and a template-generated report example is provided.

\subsection{Model specification}
\marginnote[-1\baselineskip]{
\textsc{Model access} The project is open-source under the GNU General Public License v3.0, and code is available at \url{github.com/HopkinsIDD/COVIDScenarioPipeline}. The master branch of this repository consists of a Python package “SEIR” and two R packages “hospitalization” and “report.generation”, which correspond to the second, third, and fourth modules of the model pipeline. Air importation-based seeding is implemented in the covidImportation package (\url{github.com/HopkinsIDD/covidImportation}), while seeding according to the earliest identified cases is performed in scripts within the COVID Scenario Pipeline repository.}
All components and settings for simulations from the COVID  Scenario Pipeline model are specified in an easily modifiable YAML configuration file. Different options are described in detail in the postprint and in the GitHub repository wiki.

\section{Results}
The COVID Scenario Pipeline has been used to support several partners, including the state of California and the national US response, with regular reports tailored to their needs. While these reports are private, current projections from the pipeline are publicly available: regular outputs are shared as part of the following two multi-model efforts\footnote{While it is possible to single out the COVID Scenario Pipeline output on these websites, the interest of these studies lies in the ensemble trajectory that combines projections from many models, mitigating issues of opinionated model design and assumptions.}:

\paragraph{\textsc{covid}-19 Forecast Hub} Along with dozens of other modeling teams, the Johns Hopkins ID Dynamics COVID-19 Working Group forecasts the \textsc{covid}-19 pandemic in the US a few weeks ahead. In this case, no common assumptions are made: teams strive to accurately project expected dynamics. The current forecasts are visible on \url{covid19forecasthub.org}\cite{Cramer:EvaluationIndividualEnsemble:2021}.

\paragraph{Scenario Modeling Hub} proposes longer-term projections under different scenarios where \eg variant transmission, immune escape, or vaccination hesitancy differ. Each round consists of 4 scenarios whose assumptions are shared across teams, while the rest is left to the choice of the modeling teams. The scenario projections for current and past rounds are visible at \url{covid19scenariomodelinghub.org}\cite[][presents round 4 results.]{Borchering:ModelingFutureCOVID19:2021}.

\subsection{Example of scenario planning report for Canton de Vaud}
A report produced on April 9, 2020 for CHUV, the main hospital of Canton de Vaud, is presented in the next two pages. This report is part of a series of reports produced during the first wave of the \textsc{covid}-19 pandemic in collaboration with Canton de Vaud authorities. While different from typical pipeline reports, it shows how the assumptions, projections, and uncertainties are communicated to decision-makers.

After summarizing the assumptions taken in this report, short-term forecasts are presented, both graphically and in terms of quantities of interest such as peak ICU and hospital occupancy. As the inference module of the pipeline did not exist at this time, these projections were generated using a simple filtering method to select parameter values that were reasonable given the observed epidemiological dynamics. Longer-term projections are presented for three transmission scenarios, with an additional visualization of the probability of ICU capacity exceedance under these scenarios. Finally, the assumptions and limitations that characterize the projections are summarized at the end of the report.
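
One way to formalize the exceedance visualization, stated here as an assumption rather than as the exact computation used in the report, is the following. Let $H_i(t)$ denote the projected ICU occupancy at time $t$ in simulation $i = 1, \dots, N$ of a given scenario, and $K$ the assumed ICU capacity (both symbols are introduced for illustration). The probability of exceeding capacity by time $T$ can then be estimated by the fraction of simulations in which occupancy exceeds $K$ at least once:
\[
\hat{P}_T = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\left[ \max_{t \le T} H_i(t) > K \right].
\]
For instance, if occupancy exceeds capacity before $T$ in 150 of $N = 1000$ simulations, then $\hat{P}_T = 0.15$.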

In the report appendix, along with additional projections, a live estimation of the hospitalized case-fatality ratio is provided. Access to individual-level healthcare data was provided as part of the interactions with public-health officials of Canton de Vaud. This dataset considerably improved the estimates in the scenario reports, which were subsequently tailored to local specificities of the health system, and provided the foundations for the study presented in \textsc{Chapter~5}\footnote[][-3\baselineskip]{It is interesting to compare how the real-time data analysis in this report compares with the detailed estimates presented in the \textsc{Appendix to Chapter~5}, which uses two additional weeks of data.}.

% pagecommand adds page numbers,  need package pdfpages

 \includepdf[pages=-, pagecommand={\thispagestyle{empty}}]{fig_pipeline/planning_mod.pdf}


\section{Discussion}\footnote{This section discusses the COVID Scenario Pipeline as presented in this chapter, \ie going back to the July 2020 perspective.}
The COVID Scenario Pipeline is presented as an open-source modeling framework that aims to balance epidemiological rigor with the flexibility and urgency required by public health policymaking. The modularity of the proposed framework has enabled us to adapt the assumptions about \textsc{covid}-19 epidemiology, transmission, and health outcome risks in response to emerging information and to different settings. The pipeline implementation of non-pharmaceutical interventions is highly adaptable for policymakers desiring to compare the impact of different potential scenarios.

\begin{figure}[!htb]%[width = .7\textwidth]
    \centering
    \includegraphics{fig_pipeline/fig3a}
    \caption[Health outcome risks and logistical needs for fictional counties]{Health outcome risks and logistical needs for fictional counties. In the scatterplots, each point indicates (A) the age-adjusted infection fatality ratio and (B) the risk of ICU admission given hospitalization by mean age for a county within the United States. Data for nine fictional counties is marked by magenta triangles. (C) The heat maps display county-level ICU bed needs, shaded according to the log-ratio above or below the assumed ICU bed capacity (secondary y-axis) in each county (primary y-axis) for three example intervention scenarios (panels). The salmon pink shading indicates periods of time where ICU bed needs exceed capacity in the fictional counties.}
    \label{fig:pipeline-outcome}
\end{figure}

Throughout the course of the pandemic, the default settings of the pipeline have been adapted in response to the changing needs and questions of our collaborators. At the beginning of the pandemic, air importation seeding was a critical determinant of epidemic onset in specific locations. Now that cases of \textsc{covid}-19 are present worldwide and policy questions have shifted to more operational needs, development has shifted towards more empirical methods of epidemic seeding that better match trajectories of confirmed cases in specific locations. As new data emerged, it was decided to move from calculating unadjusted health outcomes to health outcomes based on the age-standardized risk of hospitalization, ICU admission, and death according to emerging case-study data.
\begin{figure}[!htb]%[width = .7\textwidth]
    \centering
    \includegraphics{fig_pipeline/fig4a}
    \caption[County-level \textsc{covid}-19 risk for three scenarios]{County-level \textsc{covid}-19 risk for three scenarios (Uncontrolled, Pulsed, Checker) in the fictional Location X. Choropleths for model outcomes on June 1, 2020 of the (A–C) cumulative infection rate per 10,000 population and (D–F) number of patients currently admitted to the ICU per 10,000 population for the Uncontrolled, Social Distancing Pulsed, and Social Distancing Checker intervention scenarios. County-level variation in attack rates can arise from differences in risk of importation, mobility patterns connecting subdivisions, and differences in non-pharmaceutical interventions applied in each location. If location-specific health outcome risks are specified (as are age-standardized health outcome risks in this example), this may serve as another source of county-level variation.}
    \label{fig:pipeline-map}
\end{figure}

As the \textsc{covid}-19 pandemic continues, the plan is to keep adapting the scope of the COVID Scenario Pipeline to changing needs and questions. New model releases include a health outcomes model expansion that will enable a multiplicity of pathways to ICU occupancy, ventilator usage, and death. As questions have shifted to near-term operational needs, inference has been incorporated into the pipeline, thus enabling the calibration of model trajectories to deaths and confirmed case counts, short-term forecasts of health outcomes, and estimation of location-specific transmission parameters and NPI effectiveness. Moreover, the epidemic transmission model has been rewritten to allow for arbitrary transition graphs between compartments, which allows one to explicitly model vaccination campaigns, variants, immune escape, and multiple age classes.
\marginnote[-3\baselineskip]{\textsc{Recent developments}
The present chapter solely focuses on a perspective from July 2020, but the project has seen numerous improvements since then. The pipeline transmission and outcomes models, as presented, were simple hard-coded models. They made it possible to capture the early \textsc{covid}-19 dynamics and to present reports to decision-makers. These simple models, focusing on a few important aspects of disease transmission, accurately captured epidemiological dynamics and uncertainties. Provided with uncertain estimates of the serial interval and $R_0$ (or other quantities), these models allow one to derive insights on the projected future impact of the epidemic. Simplicity is a strength as it prevents overconfidence and the reliance on too many unknowns while highlighting the key aspects of transmission. However, as the pandemic progressed, it brought more information and the need for additional flexibility. As months of evidence are now available, the pipeline benefits from an additional inference module (based on MCMC with adaptation for large-scale computations). As mentioned in margin notes, most of the assumptions described in this chapter have now been relaxed while keeping the principles of the framework similar (independent modules and a single configuration file). The transmission module allows age-stratified, multi-strain models with vaccinated compartments and arbitrary transition rates in space and time. The health outcome module allows arbitrary transition graphs, and probabilities can change in space and time. Many other developments, such as helpers to write configuration files (a USA configuration file is about 30,000 lines), performance improvements, and report features, were (and still are) added.}
While the pipeline’s generic structure means that different modules may be readily replaced, the implementation of the model as described here has several limitations with regard to the epidemiology of \textsc{covid}-19. The disease transmission model does not explicitly incorporate age-specific contact or transmission rates or asymptomatic transmission, nor does it consider how factors like testing rates may lead to time-varying biases in reporting. Data to inform these factors were scarce when this model was initially developed, but these processes can change the dynamics displayed by scenario projections. Moreover, this model structure means that neither age-targeted interventions (e.g., cocooning of high-risk age groups) nor interventions targeting asymptomatic individuals (e.g., systematic age-specific testing) can be modeled in a mechanistic manner. However, it is possible, and done in real settings, to account for these types of interventions through overall population-level reductions in disease transmission. Additionally, the current model cannot accommodate changes in population mobility or connectivity over time, although it is known that travel and movement restrictions played a role in changing the spread of \textsc{covid}-19.

The health outcomes module as described in this chapter also has several structural limitations; it assumes that only one progression in health outcome severity exists (infection to hospitalization to ICU to ventilator use), although it is known that many disease course progressions are possible. In addition, the delays and durations involved in the modeled health outcomes progression are fixed in time for a given location, although stochastic variation may be incorporated across simulations.

Nevertheless, the modular approach taken is meant to allow for easy substitution of models with improvements in any of these areas while still taking advantage of other pipeline components. This feature has been leveraged throughout the course of the \textsc{covid}-19 pandemic, and individual modules continue to develop. This flexibility does come at a cost, as the modular pipeline approach requires us to write and read files at the end and beginning of each phase, respectively. This procedure requires more disk space and input/output steps than other modeling approaches that can hold all of the necessary data in memory until a single output is produced at the end. Still, these slowdowns are not critically limiting; it has been possible to run 1000 county-level simulations of the United States in less than 10 min on a 96-core server.
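
A back-of-the-envelope reading of this figure can be given under two assumptions that are not stated in the source: each simulation runs on a single core, and the $1000$ simulations are spread evenly over the $96$ cores. Each core then processes at most $\lceil 1000/96 \rceil = 11$ simulations, so the wall-clock time per simulation, $t_{\mathrm{sim}}$, satisfies
\[
t_{\mathrm{sim}} < \frac{600\ \mathrm{s}}{11} \approx 55\ \mathrm{s},
\]
file input and output included.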

These limitations point to a broader need to consider the totality of evidence generated by epidemiological models. While the proposed approach is well-suited to answering policy questions about interventions, it is critical for policymakers to explore projections from multiple models in order to understand the range of possible trajectories and the sensitivity of results to different assumptions. Models that incorporate individual-level behaviors may be better for considering the impact of specific contact tracing strategies or location-specific measures like workplace occupancy or symptom screening policies that are not well captured in compartmental models such as ours\cite[-5\baselineskip]{Kucharski:EffectivenessIsolationTesting:2020,Firth:UsingRealworldNetwork:2020,Hinch:OpenABMCovid19AgentbasedModel:2020}. Other individual-based models are better suited for addressing heterogeneities due to differences in household or social structures\cite{Wilder:ModelingBetweenpopulationVariation:2020,Kerr:CovasimAgentbasedModel:2021}. Models incorporating real-time mobility data can best characterize the impact of movement-related restrictions\cite{Lai:EffectNonpharmaceuticalInterventions:2020}, while models with age-specific transmission may provide more detail on the impact of age-specific interventions like “cocooning”\cite{Duque:COVID19HowRelax:2020} or closing and opening schools\cite{Ferguson:ReportImpactNonpharmaceutical:2020}. Still other models are particularly suited to address questions about health systems burden and forecast operational needs\cite{Branas:FlatteningCurveIt:2020,LosAlamosNationalLaboratory:COVID19CasesDeaths,Weissman:LocallyInformedSimulation:2020} and to consider the economic impacts of transmission and interventions\cite{Acemoglu:OptimalTargetedLockdowns:2020,Silva:COVIDABSAgentbasedModel:2020}.
Integrating knowledge from multiple models, where appropriate, with careful consideration of the assumptions and appropriate applications of each model, will strengthen response and preparedness\cite{Shea:HarnessingMultipleModels:2020}.


Our flexible modeling pipeline brings an important voice to this “conversation” of models by allowing rapid specification and simulation of even very complex intervention scenarios, and by making it possible to update models quickly as our understanding of a disease changes. This approach only reaches its full potential when parameters are based on careful and ongoing consideration of the literature and available data. When used appropriately as part of an iterative approach to decision-making, however, this pipeline can be a valuable tool for public health.