Multilevel Models
========================================================
author: Tamara Maier
date: 23-5-2018
autosize: true
Schedule for today
========================================================
```{r, warning=F, echo=F}
library(rethinking)
library(extraDistr)
setwd("C:/Users/Tamara/Documents/3.Semester/SpecialTopics")
```
- Introduction with example
- Example: Multilevel tadpoles
- Varying effects and the underfitting/overfitting trade-off
- More than one type of cluster
- Multilevel posterior predictions
- Exercises
Models up to now
========================================================
- move from one cluster to another, estimating a parameter for each cluster, but forgetting everything about the previous clusters
- assume parameters are independent of one another and learn from separate portions of the data
- instead we want to use all of the information
- learn simultaneously about each cluster while learning about the population of clusters
- allows us to transfer information across clusters, which improves accuracy
Example: waiting time in cafes
========================================================
- after each observation, we update the prior
- this yields a posterior distribution for the waiting time at the first cafe
- the distribution of waiting times in the population becomes the prior for each cafe
- we track a parameter for each cafe and at least 2 parameters to describe the population
- as we observe waiting times, we update the estimates for each cafe and the estimates for the population
Example: waiting time in cafes
========================================================
- if the population seems highly variable, the prior is flat and uninformative -> observations at any one cafe do little to change the estimates at another
- if the population seems to contain little variation, the prior is narrow and highly informative -> observations at any one cafe have a big impact on the estimates at other cafes
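A minimal sketch of this idea (with made-up numbers, not actual cafe data): compare how informative the population distribution is, as a prior for a new cafe, when the population standard deviation is small versus large.
```{r, warning=F}
#narrow population (sd = 0.5): highly informative prior for a new cafe
curve(dnorm(x, mean = 5, sd = 0.5), from = 0, to = 10,
      xlab = "average waiting time", ylab = "density")
#variable population (sd = 3): flat, weakly informative prior
curve(dnorm(x, mean = 5, sd = 3), add = TRUE, lty = 2)
```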
Multilevel Models
========================================================
- remember features of each cluster as they learn about all the clusters
- depending on the variation among the clusters, which is learned from the data, the model pools information across clusters
- pooling tends to improve estimates about each cluster
Costs of Multilevel Models:
========================================================
- define the distributions from which the characteristics of the clusters arise -> conservative maximum entropy distributions
- new estimation challenges -> MCMC estimation
- can be hard to understand, because they make predictions at different levels of the data
Example: Multilevel tadpoles
========================================================
```{r, warning=F}
data(reedfrogs)
df <- reedfrogs
#make the tank cluster variable
df$tank <- 1:nrow(df)
head(df)
```
Example: Multilevel tadpoles
========================================================
$$ $$
$$\large s_i \sim Binomial(n_i, p_i)$$
$$\large logit(p_i) = \alpha_{TANK[i]}$$
$$\large \alpha_{TANK} \sim Normal(0,5)$$
Example: Multilevel tadpoles
========================================================
```{r, warning=F}
# m12.1 <- map2stan(
# alist(
# surv ~ dbinom(density, p),
# logit(p) <- a_tank[tank],
# a_tank[tank] ~ dnorm(0,5)
# ),
# data = df
#)
load("m121.RData")
```
Example: Multilevel tadpoles
========================================================
$$ $$
$$ s_i \sim Binomial(n_i, p_i)$$
$$ logit(p_i) = \alpha_{TANK[i]}$$
$$ \alpha_{TANK} \sim Normal(\alpha,\sigma)$$
$$ \alpha \sim Normal(0,1)$$
$$ \sigma \sim HalfCauchy(0,1)$$
Example: Multilevel tadpoles
========================================================
```{r, warning=F}
# m12.2 <- map2stan(
# alist(
# surv ~ dbinom(density, p),
# logit(p) <- a_tank[tank],
# a_tank[tank] ~ dnorm(a,sigma),
# a ~ dnorm(0,1),
# sigma ~ dcauchy(0,1)
# ),
# data = df, iter=4000, chains=4
# ) #estimates for 50 parameters: one overall sample intercept alpha, the standard deviation among tanks sigma, and then 48 per-tank intercepts
```
Example: Multilevel tadpoles
========================================================
```{r, warning=F}
load("m122.RData")
compare(m12.1, m12.2)
```
Example: Multilevel tadpoles
========================================================
```{r, warning=F, echo=FALSE}
#extract Stan samples
post <- extract.samples(m12.2)
#compute median intercept for each tank
#also transform to probability with logistic
df$propsurv.est <- logistic(apply(post$a_tank, 2, median))
#display raw proportions surviving in each tank
plot(df$propsurv, ylim=c(0,1), pch=16, xaxt="n", xlab="tank", ylab="proportion survival", col=rangi2)
axis(1, at=c(1,16,32,48), labels=c(1,16,32,48))
#overlay posterior medians
points(df$propsurv.est)
#mark posterior median probability across tanks
abline(h=logistic(median(post$a)), lty=2)
#draw vertical dividers between tank densities
abline(v=16.5, lwd=0.5)
abline(v=32.5, lwd=0.5)
text(8, 0, "small tanks")
text(16+8, 0, "medium tanks")
text(32+8, 0, "large tanks")
```
<!-- Example: Multilevel tadpoles -->
<!-- ======================================================== -->
<!-- - blue points show empirical proportions of survivors -->
<!-- - black circles are the varying intercept medians -->
<!-- - horizontal dashed line is the estimated median survival proportion in the population of tanks -->
<!-- Example: Multilevel tadpoles -->
<!-- ======================================================== -->
<!-- - in every case the multilevel estimate is closer to the dashed line than the raw empirical estimate is -> phenomenon is called shrinkage -->
<!-- - shrinkage results from regularization (as in Chapter 6) -->
<!-- - estimates for the smaller tanks have shrunk farther from the blue points -> varying intercepts for the smaller tanks, with smaller sample sizes, shrink more -->
<!-- - the farther a blue point is from the dashed line, the greater the distance between it and the corresponding multilevel estimate -> shrinkage is stronger, the further a tank's empirical proportion is from the global average $\alpha$ -->
<!-- Example: Multilevel tadpoles -->
<!-- ======================================================== -->
<!-- - these phenomena come from pooling information across clusters to improve estimates -->
<!-- - pooling means that each tank provides information that can be used to improve the estimates for all of the other tanks -->
Varying effects and the underfitting/overfitting trade-off
========================================================
- varying intercepts are regularized estimates, but regularized by estimating how diverse the clusters are while estimating the features of each cluster
- benefit of using varying effects estimates, instead of empirical raw estimates, is they provide more accurate estimates of the individual cluster intercepts
- on average, the varying effects actually provide a better estimate of the individual tank means
- reason is that they do a better job of trading off underfitting and overfitting
underfitting/overfitting trade-off in context of frog example
========================================================
predict future survival from three perspectives:
- complete pooling: assume population of ponds is invariant; estimating a common intercept for all ponds
- no pooling: assume each pond tells us nothing about any other pond
- partial pooling: using an adaptive regularizing prior, as in the previous section
complete pooling
========================================================
- pooling the data from all ponds to produce a single estimate that is applied in every pond
- assuming the variation among ponds is zero (all ponds are identical)
- use overall mean across all ponds to make predictions for each pond
- a lot of data contributes to the estimate, so it can be quite precise
- estimate is unlikely to exactly match the mean of any particular pond -> underfits the data
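As a quick sketch, using the tank data loaded earlier as a stand-in for ponds, the complete-pooling estimate is just the overall survival proportion, applied to every cluster:
```{r, warning=F}
#complete pooling: a single survival estimate shared by all tanks
sum(df$surv) / sum(df$density)
```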
no pooling
========================================================
- use survival proportions for each pond to make predictions
- use separate intercept for each pond (blue points are same kind of estimate)
- in each pond, relatively little data contributes to each estimate, so the estimates are imprecise (especially in small ponds)
- error of estimates is high -> overfit the data
- no information is shared across ponds
- assuming the variation among ponds is infinite
partial pooling
========================================================
- produce estimates for each cluster that are less underfit than the grand mean and less overfit than the no pooling estimates
- tend to be better estimates of the true per-cluster means
- especially when ponds have few tadpoles in them
simulate tadpole data
========================================================
compare no pooling estimates to partial pooling estimates
$$ s_i \sim Binomial(n_i, p_i)$$
$$ logit(p_i) = \alpha_{POND[i]}$$
$$ \alpha_{POND} \sim Normal(\alpha,\sigma)$$
$$ \alpha \sim Normal(0,1)$$
$$ \sigma \sim HalfCauchy(0,1)$$
simulate tadpole data
========================================================
assign values to:
- $\alpha$, the average log-odds of survival in the entire population of ponds
- $\sigma$, the standard deviation of the distribution of log-odds of survival among ponds
- $\alpha_{POND}$, a vector of individual pond intercepts, one for each pond
- sample sizes $n_i$ for each pond
simulate tadpole data
========================================================
```{r, warning=F}
a <- 1.4
sigma <- 1.5
nponds <- 60
ni <- as.integer(rep(c(5, 10, 25, 35), each = 15))
a_pond <- rnorm(nponds, mean = a, sd = sigma) #true log-odds of survival for each pond
dsim <- data.frame(pond = 1:nponds, ni = ni, true_a = a_pond)
#simulate survivors
#each pond i has ni potential survivors
#pi=exp(a_pond)/(1+exp(a_pond))
#generate simulated survivor count for each pond:
dsim$si <- rbinom(nponds, prob = logistic(dsim$true_a), size = dsim$ni)
```
compute no-pooling estimates
========================================================
calculate the proportion of survivors in each pond
```{r, warning=F}
dsim$p_nopool <- dsim$si / dsim$ni
```
compute partial-pooling estimates
========================================================
```{r, warning=F}
# m12.3 <- map2stan(
# alist(
# si ~ dbinom(ni, p),
# logit(p) <- a_pond[pond],
# a_pond[pond] ~ dnorm(a,sigma),
# a ~ dnorm(0,1),
# sigma ~ dcauchy(0,1)
# ),
# data = dsim, iter=1e4, warmup = 1000
# )
load("m123.Rdata")
m12.3new <- map2stan(m12.3, data=dsim, iter=1e4, warmup=1000)
```
compute partial-pooling estimates
========================================================
```{r, warning=F}
estimated.a_pond <- as.numeric(coef(m12.3new)[1:60])
dsim$p_partpool <- logistic(estimated.a_pond)
#to compare the true per-pond survival probabilities
dsim$p_true <- logistic(dsim$true_a)
#compute absolute error between the estimates and the true varying effects
nopool_error <- abs(dsim$p_nopool - dsim$p_true)
partpool_error <- abs(dsim$p_partpool - dsim$p_true)
```
plot of error for the simulated tadpole ponds
========================================================
```{r, warning=F, echo=F}
plot(1:60, nopool_error, xlab="pond", ylab="absolute error", col=rangi2, pch=16, ylim=c(0, 0.4))
points(1:60, partpool_error)
#average error
segments(1, mean(nopool_error[1:15]), 14.5, mean(nopool_error[1:15]), col=rangi2)
segments(15.5, mean(nopool_error[16:30]), 29.5, mean(nopool_error[16:30]), col=rangi2)
segments(30.5, mean(nopool_error[31:45]), 44.5, mean(nopool_error[31:45]), col=rangi2)
segments(45.5, mean(nopool_error[46:60]), 59.5, mean(nopool_error[46:60]), col=rangi2)
segments(1, mean(partpool_error[1:15]), 14.5, mean(partpool_error[1:15]), lty=2)
segments(15.5, mean(partpool_error[16:30]), 29.5, mean(partpool_error[16:30]), lty=2)
segments(30.5, mean(partpool_error[31:45]), 44.5, mean(partpool_error[31:45]), lty=2)
segments(45.5, mean(partpool_error[46:60]), 59.5, mean(partpool_error[46:60]), lty=2)
#draw vertical dividers between pond sizes
abline(v=15.5, lwd=0.5)
abline(v=30.5, lwd=0.5)
abline(v=45.5, lwd=0.5)
text(7.5, 0.4, "tiny(5)")
text(15+7.5, 0.4, "small(10)")
text(30+7.5, 0.4, "medium(25)")
text(45+7.5, 0.4, "large(35)")
```
plot of error of no-pooling and partial pooling estimates, for the simulated tadpole ponds
========================================================
- blue and black line segments show the average error of the no-pooling and partial pooling estimates
- both kinds of estimates much more accurate for larger ponds
- no-pool estimates have higher average error
- ponds with smallest sample size show greatest improvement over the naive no-pooling estimates
- shrinkage towards the mean results from trying to negotiate the underfitting and overfitting risks of the grand mean on one end and the individual means of each pond on the other
plot of error of no-pooling and partial pooling estimates, for the simulated tadpole ponds
========================================================
- smaller ponds contain less information, so their varying estimates are influenced more by the pooled information from the other ponds (small ponds are prone to overfitting)
- larger ponds shrink less, because they contain more information and are less prone to overfitting -> they need less correcting
- partially pooled estimates are better on average; they adjust individual cluster estimates to negotiate the trade-off between underfitting and overfitting
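A quick check of the "better on average" claim, reusing the error vectors computed above (the exact values vary with each simulation):
```{r, warning=F}
#average absolute error across all 60 ponds
mean(nopool_error)
mean(partpool_error)
```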
More than one type of cluster
========================================================
- chimpanzees data
- each pull is within a cluster of pulls belonging to an individual chimpanzee
- each pull is also within an experimental block, which represents a collection of observations that happened on the same day
- each observed pull belongs to an actor and a block
- there may be unique intercepts for each actor and each block
More than one type of cluster
========================================================
$$ $$
$$ L_i \sim Binomial(1, p_i)$$
$$ logit(p_i) = \alpha + \alpha_{ACTOR[i]} + ( \beta_P + \beta_{PC}C_i)P_i$$
$$ \alpha_{ACTOR} \sim Normal(0,\sigma_{ACTOR})$$
$$ \alpha \sim Normal(0,10)$$
$$ \beta_P \sim Normal(0,10)$$
$$ \beta_{PC} \sim Normal(0,10)$$
$$ \sigma_{ACTOR} \sim HalfCauchy(0,1)$$
Model with varying intercepts on actor
========================================================
```{r, warning=F}
data(chimpanzees)
d <- chimpanzees
d$recipient <- NULL
# m12.4 <- map2stan(
# alist(
# pulled_left ~ dbinom(1, p),
# logit(p) <- a + a_actor[actor] + (bp + bpC*condition)*prosoc_left,
# a_actor[actor] ~ dnorm(0,sigma_actor),
# a ~ dnorm(0, 10),
# bp ~ dnorm(0, 10),
# bpC ~ dnorm(0, 10),
# sigma_actor ~ dcauchy(0,1)
# ),
# data = d, iter=5000, warmup = 1000, chains = 4, cores = 3
# )
load("m124.Rdata")
```
========================================================
$$ L_i \sim Binomial(1, p_i)$$
$$ logit(p_i) = \alpha + \alpha_{ACTOR[i]} + \alpha_{BLOCK[i]} + ( \beta_P + \beta_{PC}C_i)P_i$$
$$ \alpha_{ACTOR} \sim Normal(0,\sigma_{ACTOR})$$
$$ \alpha_{BLOCK} \sim Normal(0,\sigma_{BLOCK})$$
$$ \alpha \sim Normal(0,10)$$
$$ \beta_P \sim Normal(0,10)$$
$$ \beta_{PC} \sim Normal(0,10)$$
$$ \sigma_{ACTOR} \sim HalfCauchy(0,1)$$
$$ \sigma_{BLOCK} \sim HalfCauchy(0,1)$$
Model with varying intercepts on actor and block
========================================================
```{r, warning=F}
d$block_id <- d$block
# m12.5 <- map2stan(
# alist(
# pulled_left ~ dbinom(1, p),
# logit(p) <- a + a_actor[actor] + a_block[block_id] + (bp + bpC*condition)*prosoc_left,
# a_actor[actor] ~ dnorm(0,sigma_actor),
# a_block[block_id] ~ dnorm(0, sigma_block),
# a ~ dnorm(0, 10),
# bp ~ dnorm(0, 10),
# bpC ~ dnorm(0, 10),
# sigma_actor ~ dcauchy(0,1),
# sigma_block ~ dcauchy(0,1)
# ),
# data = d, iter=6000, warmup = 1000, chains = 4, cores = 3
# )
load("m125.Rdata")
```
Model with varying intercepts on actor and block
========================================================
```{r, warning=F, echo=F}
par(mfrow = c(1,2))
plot(precis(m12.5, depth = 2))
post <- extract.samples(m12.5)
dens(post$sigma_block, xlab = "sigma", xlim=c(0,4))
dens(post$sigma_actor, col = rangi2, lwd = 2, add = T)
text(2, 0.85, "actor", col = rangi2)
text(0.75, 2, "block")
```
Model with varying intercepts on actor and block
========================================================
```{r, warning=F}
compare(m12.4, m12.5)
```
Model with varying intercepts on actor and block
========================================================
- posterior distribution for sigma_block ended up close to zero
- each of the a_block parameters is strongly shrunk towards zero; they are relatively inflexible
- a_actor parameters are shrunk towards zero much less, because the estimated variation across actors is much larger, resulting in less shrinkage
- each of the a_actor parameters contributes much more to the pWAIC value
- the difference in WAIC between the models is small; the block parameters have been shrunk so close to zero that they do very little work in the model
- whether we include block or not hardly matters
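One way to see this shrinkage directly (a sketch, not part of the original comparison) is a side-by-side table of the posterior means from both models, using `coeftab` from rethinking:
```{r, warning=F}
#posterior means of m12.4 and m12.5 side by side;
#the a_block estimates sit close to zero
coeftab(m12.4, m12.5)
```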
Multilevel posterior predictions
========================================================
- robust way to discover mistakes is to compare the sample to the posterior predictions of a fit model
- exploring implied posterior predictions helps to understand complex models
- another role for constructing implied predictions is in computing information criteria, like DIC and WAIC, which provide simple estimates of out-of-sample model accuracy and a rough measure of a model's flexibility and overfitting risk
Posterior predictions for same cluster
========================================================
- when working with the same clusters that were used to fit the model, varying intercepts are just parameters
- trick is to ensure you use the right intercept for each case in the data (link and sim do this automatically)
- chimpanzees data: 7 unique actors (clusters), the varying intercept model, m12.4, estimated an intercept for each and two parameters to describe the mean and standard deviation of the population of actors
- construct posterior predictions, using the automated link approach and from scratch
- we can't expect the posterior predictive distribution to match the raw data, even when the model works correctly, because partial pooling shrinks estimates towards the grand mean
computing and plotting posterior predictions for actor 2
========================================================
```{r, warning=F}
chimp <- 2
d.pred <- list(
prosoc_left = c(0,1,0,1), #right/left/right/left
condition = c(0,0,1,1), #control/control/partner/partner
actor = rep(chimp, 4)
)
link.m12.4 <- link(m12.4, data = d.pred)
```
computing and plotting posterior predictions for actor 2
========================================================
```{r, warning=F}
pred.p <- apply(link.m12.4, 2, mean)
pred.p.PI <- apply(link.m12.4, 2, PI)
pred.p
```
computing and plotting posterior predictions for actor 2
========================================================
```{r, warning=F}
#samples from the posterior: varying intercepts will be a matrix of samples
post <- extract.samples(m12.4)
#to construct posterior predictions, we build our own link function
p.link <- function(prosoc_left, condition, actor){
logodds <- with(post,
a + a_actor[, actor] + (bp + bpC * condition) * prosoc_left)
return(logistic(logodds))
}
```
computing and plotting posterior predictions for actor 2
========================================================
```{r, warning=F}
#compute predictions
prosoc_left <- c(0,1,0,1)
condition <- c(0,0,1,1)
pred.raw <- sapply(1:4, function(i) p.link(prosoc_left[i], condition[i], 2))
pred.p <- apply(pred.raw, 2, mean)
pred.p.PI <- apply(pred.raw, 2, PI)
pred.p
```
Posterior predictions for new cluster
========================================================
- we'd like to make inferences about the whole species, not the seven individuals; so individual actor intercepts aren't of interest, but the distribution of them is
- one way to grasp the task of constructing posterior predictions for new clusters is to imagine leaving out one cluster when fitting the model to the data
Posterior predictions for new cluster
========================================================
- for example leave out actor 7 when fitting the model
- to assess the model's accuracy for predicting actor 7's behavior, we can't use any of the $a\_actor$ parameter estimates, because they apply to other individuals
- use the $a$ and $\sigma_{ACTOR}$ parameters, because they describe the population of actors
- construct posterior predictions for an unobserved average actor ("average" means an individual chimpanzee with an intercept at $\alpha$, the population mean)
- this implies a varying intercept of zero
- since there is uncertainty about the population mean, there is still uncertainty about this average individual's intercept
Posterior predictions for new cluster
========================================================
```{r, warning=F}
d.pred <- list(
prosoc_left = c(0,1,0,1),
condition = c(0,0,1,1),
actor = rep(2,4) #placeholder
)
a_actor_zeros <- matrix(0, 1000, 7)
link.m12.4 <- link(m12.4, n = 1000, data = d.pred, replace = list(a_actor = a_actor_zeros))
```
Posterior predictions for new cluster
========================================================
```{r, warning=F, echo=F}
pred.p.mean <- apply(link.m12.4, 2, mean)
pred.p.PI <- apply(link.m12.4, 2, PI, prob = 0.8)
plot(0, 0, type = "n", xlab = "prosoc_left/condition", ylab = "proportion pulled left", ylim = c(0,1), xaxt = "n", xlim = c(1,4))
axis(1, at = 1:4, labels = c("0/0", "1/0", "0/1", "1/1"))
lines(1:4, pred.p.mean)
shade(pred.p.PI, 1:4)
```
Posterior predictions for new cluster
========================================================
- to show variation among actors, we need to use $\sigma_{ACTOR}$
- we will simulate a matrix of new varying intercepts from a Gaussian distribution defined by the adaptive prior in the model: $$\alpha_{ACTOR} \sim Normal(0, \sigma_{ACTOR})$$
- once we have samples for $\sigma_{ACTOR}$, we can simulate new actor intercepts from this distribution
```{r, warning=F}
post <- extract.samples(m12.4)
a_actor_sims <- rnorm(7000, 0, post$sigma_actor)
a_actor_sims <- matrix(a_actor_sims, 1000, 7)
link.m12.4 <- link(m12.4, n = 1000, data = d.pred, replace = list(a_actor = a_actor_sims))
```
Posterior predictions for new cluster
========================================================
```{r, warning=F, echo=F}
pred.p.mean <- apply(link.m12.4, 2, mean)
pred.p.PI <- apply(link.m12.4, 2, PI, prob = 0.8)
plot(0, 0, type = "n", xlab = "prosoc_left/condition", ylab = "proportion pulled left", ylim = c(0,1), xaxt = "n", xlim = c(1,4))
axis(1, at = 1:4, labels = c("0/0", "1/0", "0/1", "1/1"))
lines(1:4, pred.p.mean)
shade(pred.p.PI, 1:4)
```
Posterior predictions for new cluster
========================================================
- these posterior predictions are marginal of actor, which means they average over the uncertainty among actors
- the plot of the average actor just sets the actor to the average, ignoring variation among actors
- predictions for an average actor help to visualize the impact of treatment
- predictions that are marginal of actor illustrate how variable different chimpanzees are
- plot that displays both the treatment effect and the variation among actors
- simulate a series of new actors in each of the four treatments
Posterior predictions for new cluster
========================================================
- write a new function that simulates a new actor from the estimated population of actors and then computes the probabilities of pulling the left lever for each of the four treatments
- simulations will not average over uncertainty in the posterior
- get uncertainty in the plot by using multiple simulations, each with different sample from the posterior
========================================================
```{r, warning=F}
post <- extract.samples(m12.4)
sim.actor <- function(i){
sim_a_actor <- rnorm(1, 0, post$sigma_actor[i])
P <- c(0, 1, 0, 1)
C <- c(0, 0, 1, 1)
p <- logistic(
post$a[i] + sim_a_actor + (post$bp[i] + post$bpC[i]*C)*P
)
return(p)
}
```
========================================================
```{r, warning=F, echo=F}
plot(0, 0, type = "n", xlab = "prosoc_left/condition", ylab = "proportion pulled left", ylim = c(0,1), xaxt = "n", xlim = c(1,4))
axis(1, at = 1:4, labels = c("0/0", "1/0", "0/1", "1/1"))
for(i in 1:50)lines(1:4, sim.actor(i), col=col.alpha("black", 0.5))
```
Focus and multilevel prediction
========================================================
- multilevel models contain parameters with different focus
- focus means which level of the model the parameter makes direct predictions for
- parameters that describe the population of clusters, such as $\alpha$ and $\sigma_{ACTOR}$ in m12.4, do not influence prediction directly
- they are often called hyperparameters, as they are parameters for parameters
- hyperparameters had their effects during estimation, by shrinking the varying effect parameters towards a common mean
- prediction focus here is on the top level of parameters
Focus and multilevel prediction
========================================================
- the prediction focus is on the top level of parameters when forecasting a new observation for a cluster that was present in the sample
- for example, if we want to predict what chimpanzee 2 will do in the next experiment, we should probably bet she'll pull the left lever, because her varying intercept was very large
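A minimal sketch of that bet (assuming m12.4 is loaded as above, and ignoring the treatment effects for simplicity): combine the grand mean with actor 2's varying intercept.
```{r, warning=F}
post <- extract.samples(m12.4)
#actor 2's baseline probability of pulling the left lever
mean(logistic(post$a + post$a_actor[, 2]))
```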
Focus and multilevel prediction
========================================================
- when we want to forecast for some new cluster that was not present in the sample, then we need hyperparameters
- hyperparameters tell us how to forecast a new cluster, by generating a distribution of new per-cluster intercepts
- when varying effects are used to model over-dispersion, we also need hyperparameters
- we need to simulate intercepts in order to account for the over-dispersion
Oceanic societies
========================================================
- adding a varying intercept to each society
$$ T_i \sim Poisson(\mu_i)$$
$$ log(\mu_i) = \alpha + \alpha_{SOCIETY[i]} + \beta_P log P_i$$
$$ \alpha \sim Normal(0,10)$$
$$ \beta_P \sim Normal(0,1)$$
$$ \alpha_{SOCIETY} \sim Normal(0,\sigma_{SOCIETY})$$
$$ \sigma_{SOCIETY} \sim HalfCauchy(0,1)$$
========================================================
- T is total_tools, P is population, i indexes each society
- varying intercept model, but with a varying intercept for each observation
- $\sigma_{SOCIETY}$ is an estimate of the over-dispersion among societies
- varying intercepts $\alpha_{SOCIETY}$ are residuals for each society
- by estimating the distribution of the residuals, we get an estimate of the excess variation, relative to the Poisson expectation
fit over-dispersion Poisson model
========================================================
```{r, warning=F}
data(Kline)
d <- Kline
d$logpop <- log(d$population)
d$society <- 1:10
# m12.6 <- map2stan(
# alist(
# total_tools ~ dpois(mu),
# log(mu) <- a + a_society[society] + bp*logpop,
# a ~ dnorm(0, 10),
# bp ~ dnorm(0, 1),
# a_society[society] ~ dnorm(0, sigma_society),
# sigma_society ~ dcauchy(0,1)
# ),
# data = d, iter=4000, chains = 3
# )
load("m126.Rdata")
```
```{r, warning=F, include=FALSE}
post <- extract.samples(m12.6)
d.pred <- list(
logpop = seq(from = 6, to = 14, length.out = 30),
society = rep(1, 30)
)
a_society_sims <- rnorm(20000, 0, post$sigma_society)
a_society_sims <- matrix(a_society_sims, 2000, 10)
link.m12.6 <- link(m12.6, n = 2000, data = d.pred, replace = list(a_society = a_society_sims))
```
fit over-dispersion Poisson model
========================================================
```{r, warning=F, echo=F}
plot(d$logpop, d$total_tools, col = rangi2, pch = 16, xlab = "log population", ylab = "total_tools")
#plot posterior mean
mu.median <- apply(link.m12.6, 2, median)
lines(d.pred$logpop, mu.median)
#plot 97%, 89%, 67% intervals
mu.PI <- apply(link.m12.6, 2, PI, prob = 0.97)
shade(mu.PI, d.pred$logpop)
mu.PI <- apply(link.m12.6, 2, PI, prob = 0.89)
shade(mu.PI, d.pred$logpop)
mu.PI <- apply(link.m12.6, 2, PI, prob = 0.67)
shade(mu.PI, d.pred$logpop)
```
========================================================
- posterior predictions for the over-dispersed Poisson model
- envelope of predictions is a lot wider than in Chapter 10
- consequence of the varying intercepts