a4astepbysetepdemo.Rnw

\documentclass[a4paper,english,11pt]{article}
\usepackage{a4a}

\begin{document}

\title{Step by step demo for the \aFa statistical catch-at-age framework}

\input{authors}

\maketitle

\begin{abstract}
%This chapter presents the statistical catch-at-age stock assessment model developed as part of the Assessment For All (\aFa) initiative of the European Commission Joint Research Centre. The stock assessment model framework is a non-linear catch-at-age model implemented in \href{http://www.r-project.org/}{\R}, \href{http://www.flr-project.org/}{\FLR} and \href{http://www.admb-project.org/}{\ADMB}, that can be applied rapidly to a wide range of situations with low parametrization requirements. The model structure is defined by submodels, which are the different parts that require structural assumptions. There are 5 submodels in operation: F-at-age, abundance indices catchability-at-age, recruitment, observation variance of catch-at-age and abundance indices-at-age, and population's age structurein the first year. All submodels use the same type of specification process, the \R formula interface, wich gives lot's of flexibility to explore models and combination of submodels.
\end{abstract}

\newpage
\tableofcontents
\newpage

<<knitr_opts, echo=FALSE, message=FALSE, warning=FALSE>>=
library(knitr)
library(formatR)
#thm = knit_theme$get("bclear") #moe, bclear
#knit_theme$set(thm)
opts_chunk$set(dev='png', cache=TRUE, fig.align='center', warning=FALSE, message=FALSE, dev.args=list(type="cairo"), dpi=250, highlight=TRUE, background='#F2F2F2', fig.lp="fig:", fig.pos="H", width=70, tidy=TRUE, out.width='.9\\linewidth')
# lattice theme
@

\section{Before starting}

\subsection{License, documentation and development status}

The software is released under the \href{https://joinup.ec.europa.eu/community/eupl/home}{EUPL 1.1}.

For more information on the \aFa methodologies refer to \href{http://icesjms.oxfordjournals.org/content/early/2014/04/03/icesjms.fsu050.abstract}{Jardim, et.al, 2014}, \href{http://icesjms.oxfordjournals.org/content/early/2014/03/31/icesjms.fsu043.abstract}{Millar, et.al, 2014} and \href{http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0154922}{Scott, et.al, 2016}.

Documentation can be found at \url{http://flr-project.org/FLa4a}. You are welcome to:

\begin{itemize}
	\item Submit suggestions and bug-reports at: \url{https://github.com/flr/FLa4a/issues}
	\item Send a pull request on: \url{https://github.com/flr/FLa4a/}
	\item Compose a friendly e-mail to the maintainer, see `packageDescription('FLa4a')`
\end{itemize}

\subsection{Installing and loading libraries}

To run the \code{FLa4a} methods the reader will need to install the package and its dependencies and load them. Some datasets are distributed with the package and as such need to be loaded too.

<<eval=FALSE>>=
# from CRAN
install.packages(c("copula","triangle", "coda", "grid", "gridExtra", "latticeExtra"))
# from FLR
install.packages(c("FLCore", "FLa4a"), repos="http://flr-project.org/R")
@

<<load_libraries, message=FALSE, warning=FALSE>>=
# libraries
library(devtools)
library(FLa4a)
library(XML)
library(reshape2)
library(ggplotFL)
# datasets
data(hke1567)
data(hke1567.idx)
@

<<>>=
packageVersion("FLCore")
packageVersion("FLa4a")
@

\subsection{How to read this document}

The target audience for this document are readers with some experience in R and some background on stock assessment.

The document explains the approach being developed by \aFa for fish stock assessment and scientific advice. It presents a mixture of text and code, where the first explains the concepts behind the methods, while the last shows how these can be run with the software provided. Moreover, having the code allows the reader to copy/paste and replicate the analysis presented here.

The sections and subsections are as independent as possible, so it can be used as a reference document for the \code{FLa4a}. 

\section{Step by step}

\subsection{The "mean" model}

To start the analysis we'll fit a "mean" model, where all submodels will be set to an overall average, by using the \code{~1} formula. This will be our reference model to see how adding age and year effects will show up in the diagnostic tools, in particular in the residuals.

<<>>=
fit01 <- sca(hke1567, hke1567.idx, fmod=~1, qmod=list(~1), srmod=~1, vmod=list(~1, ~1),  n1mod=~1)
res01 <- residuals(fit01, hke1567, hke1567.idx)
@

The common residuals plot  clearly shows a trend across ages (figure~\ref{fig:meanresbyage}) for both datasets. 

<<meanresbyyear, fig.cap="Mean fit residuals by year)">>=
plot(res01)
@

Which is even clearer when plotting the residuals by age across years.

<<meanresbyage, fig.cap="Mean fit residuals by age)",  out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
plot(res01, auxline="l", by="age")
@

\subsection{The age effects}

These models will introduce age effects in the fishing mortality submodel and catchability submodel. First in the fishinf mortality submodel.

<<>>=
fit02 <- sca(hke1567, hke1567.idx, fmod=~factor(age), qmod=list(~1), srmod=~1, vmod=list(~1, ~1),  n1mod=~1)
res02 <- residuals(fit02, hke1567, hke1567.idx)
@

The residuals plot now shows catch at age residuals less stagered, reflecting the modelling of the age effect. 

<<fageresbyyear, fig.cap="f age effect fit residuals by year)">>=
plot(res02)
@

The residuals plot by age shows the same outcome.  

<<fageresbyage, fig.cap="f age effect fit residuals by age)",  out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
plot(res02, auxline="l", by="age")
@

Follwed by the same addition to the catchability model.

<<>>=
fit03 <- sca(hke1567, hke1567.idx, fmod=~1, qmod=list(~factor(age)), srmod=~1, vmod=list(~1, ~1),  n1mod=~1)
res03 <- residuals(fit03, hke1567, hke1567.idx)
@

<<qageresbyyear, fig.cap="q age effect fit residuals by year)">>=
plot(res03)
@

The residuals plot by age shows the same outcome.  

<<qageresbyage, fig.cap="q age effect fit residuals by age)",  out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
plot(res03, auxline="l", by="age")
@

Finally both effects are brought together.

<<>>=
fit04 <- sca(hke1567, hke1567.idx, fmod=~factor(age), qmod=list(~factor(age)), srmod=~1, vmod=list(~1, ~1),  n1mod=~1)
res04 <- residuals(fit04, hke1567, hke1567.idx)
@

<<fqageresbyyear, fig.cap="q age effect fit residuals by year)">>=
plot(res04)
@

The residuals plot by age shows the same outcome.  

<<fqageresbyage, fig.cap="q age effect fit residuals by age)",  out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
plot(res04, auxline="l", by="age")
@

\subsection{The fishing mortality year model}

This model will introduce an year effect in the fishing mortality submodel on top of the F age effect added before.

<<>>=
fit05 <- sca(hke1567, hke1567.idx, fmod=~factor(age) + factor(year), qmod=list(~1), srmod=~1, vmod=list(~1, ~1),  n1mod=~1)
res05 <- residuals(fit05, hke1567, hke1567.idx)
@

The residuals plot now shows catch at age residuals stagered as before. The year trends are less pronounced although, because the data doesn't have a very strong year effect, it's less clear than when modelling the age effect. 

<<fyearresbyyear, fig.cap="f year effect fit residuals by year)">>=
plot(res05)
@

The residuals plot by age shows the same outcome.  

<<fyearresbyage, fig.cap="f year effect fit residuals by age)",  out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
plot(res05, auxline="l", by="age")
@

We can see now that the residuals show a lot less patterns than before. There's still some issues, the survey catchability seems to have an year trend. However the model is not fully specified yet, stock recruitment is modelled as constant over time, the initial population abundance is also modelled as a constant as well as the variance models.

\subsection{The initial year population abundance model, aka N1 }

This model will introduce an age effect in the population abundance in the first year of the time series. This model sets the n-at-age in the first year of the time series, which is needed due to the lack of previous data to reconstruct those cohorts.

<<>>=
fit06 <- sca(hke1567, hke1567.idx, fmod=~factor(age) + factor(year), qmod=list(~factor(age)), srmod=~1, vmod=list(~1, ~1),  n1mod=~factor(age))
res06 <- residuals(fit06, hke1567, hke1567.idx)
@

The residuals plot now shows catch at age residuals stagered as before. The year trends are less pronounced although, because the data doesn't have a very strong year effect, it's less clear than when modelling the age effect. 

<<n1arresbyyear, fig.cap="f year effect fit residuals by year)">>=
plot(res06)
@

The residuals by age (figure~\ref{fig:n1resbyage}) the residuals' improvement in the first year of the catch at age time series (bottom left plots).  

<<n1resbyage, fig.cap="f year effect fit residuals by age)",  out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
plot(res06, auxline="l", by="age")
@

\subsection{The stock recruitment submodel }

There's a lot to say about this submodel (check the \aFa manual for more). In this example we'll simply add a model to allow recruitment to vary over time and we'll see how to track potential improvements in the residuals.

<<>>=
fit07 <- sca(hke1567, hke1567.idx, fmod=~factor(age) + factor(year), qmod=list(~factor(age)), srmod=~factor(year), vmod=list(~1, ~1),  n1mod=~factor(age))
res07 <- residuals(fit07, hke1567, hke1567.idx)
@

The residuals plot by year are very useful to see the effect of adding a varying stock recruitment model. The year trends present in previous models are not absent. Recruitment variability when left unmodelled was being picked up by trends in the survey catchability and catch at age. And due to the cohort dynamics underlying the catch at age model, where propagating into other ages' estimates. 

<<srresbyyear, fig.cap="f year effect fit residuals by year)">>=
plot(res07)
@

<<srresbyage, fig.cap="f year effect fit residuals by age)",  out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
plot(res07, auxline="l", by="age")
@

%\subsection{Reducing the number of parameters}

%In this section we'll look into how to use smoothers to reduce the number of parameters of the model. The process is laid out line by line but it could also be ran with the method \code{multirun} and a function call to compute a BIC profile.

%It's important to have in mind that tin a real situation one should check all k combinations since there are interactions between the submodels which will change the single dimentional analysis use here for demonstration purposes. 

%Due to the number of ages in the data, 5, and the minimum degrees of freedom one can use in a \code{MGCV} smoother, 3, the options for these models is to have 3 or 4 ks and since there are 3 submodels one can run 6 fits to check the statistics.

%<<>>=
%fit08a <- sca(hke1567, hke1567.idx, fmod=~s(age, k=3) + factor(year), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1),  n1mod=~s(age, k=3))

%fit08b <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + factor(year), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1),  n1mod=~s(age, k=3))

%fit08c <- sca(hke1567, hke1567.idx, fmod=~s(age, k=3) + factor(year), qmod=list(~s(age, k=4)), srmod=~factor(year), vmod=list(~1, ~1),  n1mod=~s(age, k=3))

%fit08d <- sca(hke1567, hke1567.idx, fmod=~s(age, k=3) + factor(year), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1),  n1mod=~s(age, k=4))

%fit08e <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + factor(year), qmod=list(~s(age, k=4)), srmod=~factor(year), vmod=list(~1, ~1),  n1mod=~s(age, k=3))

%fit08f <- sca(hke1567, hke1567.idx, fmod=~s(age, k=3) + factor(year), qmod=list(~s(age, k=4)), srmod=~factor(year), vmod=list(~1, ~1),  n1mod=~s(age, k=4))

%fit08g <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + factor(year), qmod=list(~s(age, k=4)), srmod=~factor(year), vmod=list(~1, ~1),  n1mod=~s(age, k=4))

%do.call("cbind", lapply(a4aFitSAs(a=fit08a, b=fit08b, c=fit08c, d=fit08d, e=fit08e, f=fit08f, g=fit08g), fitSumm))
%AIC(fit08a, fit08b, fit08c, fit08d, fit08e, fit08f, fit08g)
%BIC(fit08a, fit08b, fit08c, fit08d, fit08e, fit08f, fit08g)
%@

%All statistics point to model \code{b} as the best combination of ks. We're going to look at both year effects, in fishing mortality and stock recruitment. Starting with a sparser vector of ks and later focusing in the region of minimal BIC.  

%<<>>=
%fit08a <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=15), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1),  n1mod=~s(age, k=3))
%fit08b <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=10), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1),  n1mod=~s(age, k=3))
%fit08c <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1),  n1mod=~s(age, k=3))

%do.call("cbind", lapply(a4aFitSAs(fit08a, fit08b, fit08c), fitSumm))
%BIC(fit08a, fit08b, fit08c)
%AIC(fit08a, fit08b, fit08c)

%fit08d <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=7), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1),  n1mod=~s(age, k=3))
%fit08e <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=6), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1),  n1mod=~s(age, k=3))
%fit08f <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=4), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1),  n1mod=~s(age, k=3))
%fit08g <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=3), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1),  n1mod=~s(age, k=3))

%do.call("cbind", lapply(a4aFitSAs(fit08d, fit08e, fit08c, fit08f, fit08g), fitSumm))
%AIC(fit08d, fit08e, fit08c, fit08f, fit08g)
%BIC(fit08d, fit08e, fit08c, fit08f, fit08g)
%@

%BIC and gcv point to \code{k = 3} while AIC selects \code{k = 7}. AIC tends to lean towards overfitting so this result it's not surprising. Another interesting source of information is the loglikehood components value. The survey index shows a likelihood minimum at \code{k=7} and catch at age when \code{k=4}. To continue with the exercise we'll keep \code{k = 5}.

%<<>>=
%fit08 <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1),  n1mod=~s(age, k=3))
%res08 <- residuals(fit08, hke1567, hke1567.idx)
%@

%<<fksresbyyear, fig.cap="f year effect fit residuals by year)">>=
%plot(res08)
%@

%<<fksresbyage, fig.cap="f year effect fit residuals by age)",  out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
%plot(res08, auxline="l", by="age")
%@

%Following the same approach let's check what can be done with regards to the recruitment model degrees of freedom.

%<<>>=
%fit09a <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~s(year, k=15), vmod=list(~1, ~1),  n1mod=~s(age, k=3))
%fit09b <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~s(year, k=10), vmod=list(~1, ~1),  n1mod=~s(age, k=3))
%fit09c <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~s(year, k=5), vmod=list(~1, ~1),  n1mod=~s(age, k=3))

%do.call("cbind", lapply(a4aFitSAs(fit09a, fit09b, fit09c), fitSumm))
%AIC(fit09a, fit09b, fit09c)
%BIC(fit09a, fit09b, fit09c)

%fit09d <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~s(year, k=7), vmod=list(~1, ~1),  n1mod=~s(age, k=3))
%fit09e <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~s(year, k=6), vmod=list(~1, ~1),  n1mod=~s(age, k=3))
%fit09f <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~s(year, k=4), vmod=list(~1, ~1),  n1mod=~s(age, k=3))
%fit09g <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~s(year, k=3), vmod=list(~1, ~1),  n1mod=~s(age, k=3))

%do.call("cbind", lapply(a4aFitSAs(fit09d, fit09e, fit09c, fit09f, fit09g), fitSumm))
%AIC(fit09d, fit09e, fit09c, fit09f, fit09g)
%BIC(fit09d, fit09e, fit09c, fit09f, fit09g)
%@

%There's no clear indication of the best model but since it's always a good idea to be conservative in terms of the number of parameters to avoid overfitting we're going to use \code{k = 5} for the recruitment submodel and as such the best model we get so far is:

%<<>>=
%fit09 <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~s(year, k=5), vmod=list(~1, ~1),  n1mod=~s(age, k=3))
%res09 <- residuals(fit09, hke1567, hke1567.idx)
%@

%<<srksresbyyear, fig.cap="f year effect fit residuals by year)">>=
%plot(res09)
%@

%<<srksresbyage, fig.cap="f year effect fit residuals by age)",  out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
%plot(res09, auxline="l", by="age")
%@

\subsection{The variance submodel}

Finally, we're testing the variance submodel, specifically the catch at age variance model. We won't dig into the catchability variance model though. It's common to accept that a scientific survey following a well designed sampling protocol will have equal variance across ages since no preferential areas are sampled. 

<<>>=
fit10 <- sca(hke1567, hke1567.idx, fmod=~factor(age) + s(year, k=10), qmod=list(~factor(age)), srmod=~s(year, k=10), vmod=list(~factor(age), ~1),  n1mod=~factor(age))
res10 <- residuals(fit10, hke1567, hke1567.idx)
@

To see what's happening with the variance model one can use predict to plot the different models fitted.

<<vagepredbyage, fig.cap="Variance models for catch at age">>=
flqs <- FLQuants(mod10=predict(fit10)$vmodel$catch[,"2022"], mod09=predict(fit09)$vmodel$catch[,"2022"])
xyplot(data~age, data=flqs, group=qname, type="l", auto.key=T)
@

To see the effect these models have on the estimated quantities one can look at the variance of the estimates:

<<vage, fig.cap="Estimates of population abundance with different variance models">>=
flqs <- FLQuants(mod10=catch.n(hke1567+simulate(fit10, nsim=500))[,"2022"], mod09=catch.n(hke1567+simulate(fit09, nsim=500))[,"2022"])
bwplot(data~qname|factor(age),  data=as.data.frame(flqs), scales="free", auto.key=T)
@

and the usual residuals

<<vageresbyyear, fig.cap="f year effect fit residuals by year)">>=
plot(res10, auxline="r")
@

<<vageresbyage, fig.cap="f year effect fit residuals by age)",  out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
plot(res10, auxline="l", by="age")
@

\end{document}