-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy patha4astepbysetepdemo.Rnw
355 lines (246 loc) · 18.9 KB
/
a4astepbysetepdemo.Rnw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
\documentclass[a4paper,english,11pt]{article}
\usepackage{a4a}
\begin{document}
\title{Step by step demo for the \aFa statistical catch-at-age framework}
\input{authors}
\maketitle
\begin{abstract}
%This chapter presents the statistical catch-at-age stock assessment model developed as part of the Assessment For All (\aFa) initiative of the European Commission Joint Research Centre. The stock assessment model framework is a non-linear catch-at-age model implemented in \href{http://www.r-project.org/}{\R}, \href{http://www.flr-project.org/}{\FLR} and \href{http://www.admb-project.org/}{\ADMB}, that can be applied rapidly to a wide range of situations with low parametrization requirements. The model structure is defined by submodels, which are the different parts that require structural assumptions. There are 5 submodels in operation: F-at-age, abundance indices catchability-at-age, recruitment, observation variance of catch-at-age and abundance indices-at-age, and population's age structurein the first year. All submodels use the same type of specification process, the \R formula interface, wich gives lot's of flexibility to explore models and combination of submodels.
\end{abstract}
\newpage
\tableofcontents
\newpage
<<knitr_opts, echo=FALSE, message=FALSE, warning=FALSE>>=
library(knitr)
library(formatR)
#thm = knit_theme$get("bclear") #moe, bclear
#knit_theme$set(thm)
opts_chunk$set(dev='png', cache=TRUE, fig.align='center', warning=FALSE, message=FALSE, dev.args=list(type="cairo"), dpi=250, highlight=TRUE, background='#F2F2F2', fig.lp="fig:", fig.pos="H", width=70, tidy=TRUE, out.width='.9\\linewidth')
# lattice theme
@
\section{Before starting}
\subsection{License, documentation and development status}
The software is released under the \href{https://joinup.ec.europa.eu/community/eupl/home}{EUPL 1.1}.
For more information on the \aFa methodologies refer to \href{http://icesjms.oxfordjournals.org/content/early/2014/04/03/icesjms.fsu050.abstract}{Jardim, et.al, 2014}, \href{http://icesjms.oxfordjournals.org/content/early/2014/03/31/icesjms.fsu043.abstract}{Millar, et.al, 2014} and \href{http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0154922}{Scott, et.al, 2016}.
Documentation can be found at \url{http://flr-project.org/FLa4a}. You are welcome to:
\begin{itemize}
\item Submit suggestions and bug-reports at: \url{https://github.com/flr/FLa4a/issues}
\item Send a pull request on: \url{https://github.com/flr/FLa4a/}
\item Compose a friendly e-mail to the maintainer, see `packageDescription('FLa4a')`
\end{itemize}
\subsection{Installing and loading libraries}
To run the \code{FLa4a} methods the reader will need to install the package and its dependencies and load them. Some datasets are distributed with the package and as such need to be loaded too.
<<eval=FALSE>>=
# from CRAN
install.packages(c("copula","triangle", "coda", "grid", "gridExtra", "latticeExtra"))
# from FLR
install.packages(c("FLCore", "FLa4a"), repos="http://flr-project.org/R")
@
<<load_libraries, message=FALSE, warning=FALSE>>=
# libraries
library(devtools)
library(FLa4a)
library(XML)
library(reshape2)
library(ggplotFL)
# datasets
data(hke1567)
data(hke1567.idx)
@
<<>>=
packageVersion("FLCore")
packageVersion("FLa4a")
@
\subsection{How to read this document}
The target audience for this document are readers with some experience in R and some background on stock assessment.
The document explains the approach being developed by \aFa for fish stock assessment and scientific advice. It presents a mixture of text and code, where the first explains the concepts behind the methods, while the last shows how these can be run with the software provided. Moreover, having the code allows the reader to copy/paste and replicate the analysis presented here.
The sections and subsections are as independent as possible, so it can be used as a reference document for the \code{FLa4a}.
\section{Step by step}
\subsection{The "mean" model}
To start the analysis we'll fit a "mean" model, where all submodels will be set to an overall average, by using the \code{~1} formula. This will be our reference model to see how adding age and year effects will show up in the diagnostic tools, in particular in the residuals.
<<>>=
fit01 <- sca(hke1567, hke1567.idx, fmod=~1, qmod=list(~1), srmod=~1, vmod=list(~1, ~1), n1mod=~1)
res01 <- residuals(fit01, hke1567, hke1567.idx)
@
The common residuals plot clearly shows a trend across ages (figure~\ref{fig:meanresbyage}) for both datasets.
<<meanresbyyear, fig.cap="Mean fit residuals by year)">>=
plot(res01)
@
Which is even clearer when plotting the residuals by age across years.
<<meanresbyage, fig.cap="Mean fit residuals by age)", out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
plot(res01, auxline="l", by="age")
@
\subsection{The age effects}
These models will introduce age effects in the fishing mortality submodel and catchability submodel. First in the fishinf mortality submodel.
<<>>=
fit02 <- sca(hke1567, hke1567.idx, fmod=~factor(age), qmod=list(~1), srmod=~1, vmod=list(~1, ~1), n1mod=~1)
res02 <- residuals(fit02, hke1567, hke1567.idx)
@
The residuals plot now shows catch at age residuals less stagered, reflecting the modelling of the age effect.
<<fageresbyyear, fig.cap="f age effect fit residuals by year)">>=
plot(res02)
@
The residuals plot by age shows the same outcome.
<<fageresbyage, fig.cap="f age effect fit residuals by age)", out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
plot(res02, auxline="l", by="age")
@
Follwed by the same addition to the catchability model.
<<>>=
fit03 <- sca(hke1567, hke1567.idx, fmod=~1, qmod=list(~factor(age)), srmod=~1, vmod=list(~1, ~1), n1mod=~1)
res03 <- residuals(fit03, hke1567, hke1567.idx)
@
<<qageresbyyear, fig.cap="q age effect fit residuals by year)">>=
plot(res03)
@
The residuals plot by age shows the same outcome.
<<qageresbyage, fig.cap="q age effect fit residuals by age)", out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
plot(res03, auxline="l", by="age")
@
Finally both effects are brought together.
<<>>=
fit04 <- sca(hke1567, hke1567.idx, fmod=~factor(age), qmod=list(~factor(age)), srmod=~1, vmod=list(~1, ~1), n1mod=~1)
res04 <- residuals(fit04, hke1567, hke1567.idx)
@
<<fqageresbyyear, fig.cap="q age effect fit residuals by year)">>=
plot(res04)
@
The residuals plot by age shows the same outcome.
<<fqageresbyage, fig.cap="q age effect fit residuals by age)", out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
plot(res04, auxline="l", by="age")
@
\subsection{The fishing mortality year model}
This model will introduce an year effect in the fishing mortality submodel on top of the F age effect added before.
<<>>=
fit05 <- sca(hke1567, hke1567.idx, fmod=~factor(age) + factor(year), qmod=list(~1), srmod=~1, vmod=list(~1, ~1), n1mod=~1)
res05 <- residuals(fit05, hke1567, hke1567.idx)
@
The residuals plot now shows catch at age residuals stagered as before. The year trends are less pronounced although, because the data doesn't have a very strong year effect, it's less clear than when modelling the age effect.
<<fyearresbyyear, fig.cap="f year effect fit residuals by year)">>=
plot(res05)
@
The residuals plot by age shows the same outcome.
<<fyearresbyage, fig.cap="f year effect fit residuals by age)", out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
plot(res05, auxline="l", by="age")
@
We can see now that the residuals show a lot less patterns than before. There's still some issues, the survey catchability seems to have an year trend. However the model is not fully specified yet, stock recruitment is modelled as constant over time, the initial population abundance is also modelled as a constant as well as the variance models.
\subsection{The initial year population abundance model, aka N1 }
This model will introduce an age effect in the population abundance in the first year of the time series. This model sets the n-at-age in the first year of the time series, which is needed due to the lack of previous data to reconstruct those cohorts.
<<>>=
fit06 <- sca(hke1567, hke1567.idx, fmod=~factor(age) + factor(year), qmod=list(~factor(age)), srmod=~1, vmod=list(~1, ~1), n1mod=~factor(age))
res06 <- residuals(fit06, hke1567, hke1567.idx)
@
The residuals plot now shows catch at age residuals stagered as before. The year trends are less pronounced although, because the data doesn't have a very strong year effect, it's less clear than when modelling the age effect.
<<n1arresbyyear, fig.cap="f year effect fit residuals by year)">>=
plot(res06)
@
The residuals by age (figure~\ref{fig:n1resbyage}) the residuals' improvement in the first year of the catch at age time series (bottom left plots).
<<n1resbyage, fig.cap="f year effect fit residuals by age)", out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
plot(res06, auxline="l", by="age")
@
\subsection{The stock recruitment submodel }
There's a lot to say about this submodel (check the \aFa manual for more). In this example we'll simply add a model to allow recruitment to vary over time and we'll see how to track potential improvements in the residuals.
<<>>=
fit07 <- sca(hke1567, hke1567.idx, fmod=~factor(age) + factor(year), qmod=list(~factor(age)), srmod=~factor(year), vmod=list(~1, ~1), n1mod=~factor(age))
res07 <- residuals(fit07, hke1567, hke1567.idx)
@
The residuals plot by year are very useful to see the effect of adding a varying stock recruitment model. The year trends present in previous models are not absent. Recruitment variability when left unmodelled was being picked up by trends in the survey catchability and catch at age. And due to the cohort dynamics underlying the catch at age model, where propagating into other ages' estimates.
<<srresbyyear, fig.cap="f year effect fit residuals by year)">>=
plot(res07)
@
<<srresbyage, fig.cap="f year effect fit residuals by age)", out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
plot(res07, auxline="l", by="age")
@
%\subsection{Reducing the number of parameters}
%In this section we'll look into how to use smoothers to reduce the number of parameters of the model. The process is laid out line by line but it could also be ran with the method \code{multirun} and a function call to compute a BIC profile.
%It's important to have in mind that tin a real situation one should check all k combinations since there are interactions between the submodels which will change the single dimentional analysis use here for demonstration purposes.
%Due to the number of ages in the data, 5, and the minimum degrees of freedom one can use in a \code{MGCV} smoother, 3, the options for these models is to have 3 or 4 ks and since there are 3 submodels one can run 6 fits to check the statistics.
%<<>>=
%fit08a <- sca(hke1567, hke1567.idx, fmod=~s(age, k=3) + factor(year), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%fit08b <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + factor(year), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%fit08c <- sca(hke1567, hke1567.idx, fmod=~s(age, k=3) + factor(year), qmod=list(~s(age, k=4)), srmod=~factor(year), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%fit08d <- sca(hke1567, hke1567.idx, fmod=~s(age, k=3) + factor(year), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1), n1mod=~s(age, k=4))
%fit08e <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + factor(year), qmod=list(~s(age, k=4)), srmod=~factor(year), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%fit08f <- sca(hke1567, hke1567.idx, fmod=~s(age, k=3) + factor(year), qmod=list(~s(age, k=4)), srmod=~factor(year), vmod=list(~1, ~1), n1mod=~s(age, k=4))
%fit08g <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + factor(year), qmod=list(~s(age, k=4)), srmod=~factor(year), vmod=list(~1, ~1), n1mod=~s(age, k=4))
%do.call("cbind", lapply(a4aFitSAs(a=fit08a, b=fit08b, c=fit08c, d=fit08d, e=fit08e, f=fit08f, g=fit08g), fitSumm))
%AIC(fit08a, fit08b, fit08c, fit08d, fit08e, fit08f, fit08g)
%BIC(fit08a, fit08b, fit08c, fit08d, fit08e, fit08f, fit08g)
%@
%All statistics point to model \code{b} as the best combination of ks. We're going to look at both year effects, in fishing mortality and stock recruitment. Starting with a sparser vector of ks and later focusing in the region of minimal BIC.
%<<>>=
%fit08a <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=15), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%fit08b <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=10), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%fit08c <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%do.call("cbind", lapply(a4aFitSAs(fit08a, fit08b, fit08c), fitSumm))
%BIC(fit08a, fit08b, fit08c)
%AIC(fit08a, fit08b, fit08c)
%fit08d <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=7), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%fit08e <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=6), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%fit08f <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=4), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%fit08g <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=3), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%do.call("cbind", lapply(a4aFitSAs(fit08d, fit08e, fit08c, fit08f, fit08g), fitSumm))
%AIC(fit08d, fit08e, fit08c, fit08f, fit08g)
%BIC(fit08d, fit08e, fit08c, fit08f, fit08g)
%@
%BIC and gcv point to \code{k = 3} while AIC selects \code{k = 7}. AIC tends to lean towards overfitting so this result it's not surprising. Another interesting source of information is the loglikehood components value. The survey index shows a likelihood minimum at \code{k=7} and catch at age when \code{k=4}. To continue with the exercise we'll keep \code{k = 5}.
%<<>>=
%fit08 <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~factor(year), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%res08 <- residuals(fit08, hke1567, hke1567.idx)
%@
%<<fksresbyyear, fig.cap="f year effect fit residuals by year)">>=
%plot(res08)
%@
%<<fksresbyage, fig.cap="f year effect fit residuals by age)", out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
%plot(res08, auxline="l", by="age")
%@
%Following the same approach let's check what can be done with regards to the recruitment model degrees of freedom.
%<<>>=
%fit09a <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~s(year, k=15), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%fit09b <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~s(year, k=10), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%fit09c <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~s(year, k=5), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%do.call("cbind", lapply(a4aFitSAs(fit09a, fit09b, fit09c), fitSumm))
%AIC(fit09a, fit09b, fit09c)
%BIC(fit09a, fit09b, fit09c)
%fit09d <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~s(year, k=7), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%fit09e <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~s(year, k=6), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%fit09f <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~s(year, k=4), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%fit09g <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~s(year, k=3), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%do.call("cbind", lapply(a4aFitSAs(fit09d, fit09e, fit09c, fit09f, fit09g), fitSumm))
%AIC(fit09d, fit09e, fit09c, fit09f, fit09g)
%BIC(fit09d, fit09e, fit09c, fit09f, fit09g)
%@
%There's no clear indication of the best model but since it's always a good idea to be conservative in terms of the number of parameters to avoid overfitting we're going to use \code{k = 5} for the recruitment submodel and as such the best model we get so far is:
%<<>>=
%fit09 <- sca(hke1567, hke1567.idx, fmod=~s(age, k=4) + s(year, k=5), qmod=list(~s(age, k=3)), srmod=~s(year, k=5), vmod=list(~1, ~1), n1mod=~s(age, k=3))
%res09 <- residuals(fit09, hke1567, hke1567.idx)
%@
%<<srksresbyyear, fig.cap="f year effect fit residuals by year)">>=
%plot(res09)
%@
%<<srksresbyage, fig.cap="f year effect fit residuals by age)", out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
%plot(res09, auxline="l", by="age")
%@
\subsection{The variance submodel}
Finally, we're testing the variance submodel, specifically the catch at age variance model. We won't dig into the catchability variance model though. It's common to accept that a scientific survey following a well designed sampling protocol will have equal variance across ages since no preferential areas are sampled.
<<>>=
fit10 <- sca(hke1567, hke1567.idx, fmod=~factor(age) + s(year, k=10), qmod=list(~factor(age)), srmod=~s(year, k=10), vmod=list(~factor(age), ~1), n1mod=~factor(age))
res10 <- residuals(fit10, hke1567, hke1567.idx)
@
To see what's happening with the variance model one can use predict to plot the different models fitted.
<<vagepredbyage, fig.cap="Variance models for catch at age">>=
flqs <- FLQuants(mod10=predict(fit10)$vmodel$catch[,"2022"], mod09=predict(fit09)$vmodel$catch[,"2022"])
xyplot(data~age, data=flqs, group=qname, type="l", auto.key=T)
@
To see the effect these models have on the estimated quantities one can look at the variance of the estimates:
<<vage, fig.cap="Estimates of population abundance with different variance models">>=
flqs <- FLQuants(mod10=catch.n(hke1567+simulate(fit10, nsim=500))[,"2022"], mod09=catch.n(hke1567+simulate(fit09, nsim=500))[,"2022"])
bwplot(data~qname|factor(age), data=as.data.frame(flqs), scales="free", auto.key=T)
@
and the usual residuals
<<vageresbyyear, fig.cap="f year effect fit residuals by year)">>=
plot(res10, auxline="r")
@
<<vageresbyage, fig.cap="f year effect fit residuals by age)", out.extra = "angle=90", out.height="18cm", out.width="25cm", fig.height=8, fig.width=12>>=
plot(res10, auxline="l", by="age")
@
\end{document}