Big Entropy and the Generalized Linear Model
========================================================
author: Christina Kastner
date: 2-5-2018
autosize: true
Table of contents
========================================================
- Introduction
- Maximum entropy
- Gaussian
- Binomial
- Generalized linear models
- Meet the family
- Exponential Distribution
- Gamma Distribution
- Poisson Distribution
- Link Functions
- Logit Link
- Log Link
- Summary
Introduction
========================================================
People share the experience of fighting with tangled electrical cords.
Why do cables tend to tie themselves in knots?
Descriptive Level: Entropy
<div style="text-align:center;"><img src="figures/kabel.jpg"; width=350 height=500 pos=>
Introduction - Entropy
========================================================
Entropy helps solve the problem of choosing distributions
Conventional choices are not always the best choices
(e.g. wide Gaussian priors, the Gaussian likelihood of linear regression)
Reasons for betting on distributions with the biggest entropy:
- Widest and least informative distribution
- Nature tends to produce empirical distributions that have high entropy
- It tends to work
Introduction - Generalized Linear Model
========================================================
- Much like linear regressions
- Model that replaces a parameter of a likelihood function with a linear model
- Maximum entropy helps to choose likelihood functions
Maximum Entropy
========================================================
We seek a measure of uncertainty that satisfies:
- The measure should be continuous
- It should increase as the number of possible events increases
- It should be additive
## Information Entropy:
<span style="color:red">
$$\Huge H(p) = -\sum_ip_i\log{(p_i)}$$
Maximum Entropy
========================================================
<span style="color:green">
The distribution that can happen the most ways is also the distribution with the biggest information entropy.
The distribution with the biggest entropy
is the most conservative distribution that obeys its constraints.
<div style="text-align:center;"><img src="figures/BigEntropy.png"; width=550 height=550 pos=>
Maximum Entropy
========================================================
```{r}
# five candidate ways of distributing 10 pebbles over 5 buckets
p <- list()
p$A <- c(0,0,10,0,0)
p$B <- c(0,1,8,1,0)
p$C <- c(0,2,6,2,0)
p$D <- c(1,2,4,2,1)
p$E <- c(2,2,2,2,2)
# Normalize each such that it is a probability distribution
p_norm <- lapply( p , function(q) q/sum(q))
# Compute information entropy
H <- sapply( p_norm , function(q) -sum(ifelse(q==0,0,q*log(q))))
H
```
Maximum Entropy
========================================================
<div style="text-align:center;"><img src="figures/logways.png";width=300 height=300 pos=>
```{r}
# log ways per pebble
ways <- c(1,90,1260,37800,113400)
logwayspp <- log(ways)/10
logwayspp
```
Maximum Entropy
========================================================
- Information entropy & log(ways) per pebble contain the same information
- Information Entropy: a way of counting how many unique arrangements correspond to a distribution
- Most plausible distribution: the distribution that can happen the greatest number of ways -> Maximum Entropy Distribution
- The large majority of unique arrangements produce either the maximum entropy distribution or a distribution similar to it
<span style="color:red">
Bet on maximum entropy: "center of gravity for the highly plausible distributions"
<div style="text-align:center;"><img src="figures/logways.png";width=150 height=150>
Maximum Entropy
========================================================
Derivation of Maximum Entropy (page 271)
Maximum entropy with prior information $q_i$:
$$\huge \frac{1}{N}\log{\Pr(n_1,...,n_m)} = -\sum_i p_i\log{(p_i/q_i)}$$
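A minimal sketch of this formula (values chosen purely for illustration): the quantity $-\sum_i p_i\log(p_i/q_i)$ is largest, namely zero, exactly when the distribution p matches the prior information q.
```{r}
# hypothetical illustration: -sum(p*log(p/q)) is maximized (= 0) when p equals q
neg_kl <- function(p, q) -sum(p * log(p / q))
q <- c(0.1, 0.2, 0.3, 0.4)             # assumed prior information
neg_kl(rep(0.25, 4), q)                # uniform p: a negative value
neg_kl(q, q)                           # p equal to q: exactly 0
```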
Maximum Entropy - Gaussian
========================================================
## Generalized normal distribution:
<div style="text-align:center;"><img src="figures/GnVtlg.png";width=60 height=60>
We want to compare a regular Gaussian distribution with variance $\huge\sigma^2$ to several generalized normal distributions with the same variance.
<div style="text-align:center;"><img src="figures/Normalverteilung.png";width=280 height=280>
(Proof that the Gaussian has the largest entropy of any distribution with a given variance: page 274)
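As a rough numerical check, a hypothetical sketch: compute the differential entropy of generalized normal distributions that all share variance 1; the maximum sits at shape $\beta = 2$, the Gaussian.
```{r}
# hypothetical sketch: entropy of the generalized normal with shape beta and variance 1
gen_norm_entropy <- function(beta, sigma2 = 1) {
  alpha <- sqrt(sigma2 * gamma(1/beta) / gamma(3/beta))   # scale fixing the variance
  dens  <- function(x) beta / (2 * alpha * gamma(1/beta)) * exp(-(abs(x)/alpha)^beta)
  f     <- function(x) { d <- dens(x); ifelse(d > 0, -d * log(d), 0) }
  integrate(f, -20, 20)$value                             # differential entropy
}
sapply(c(1, 1.5, 2, 3, 4), gen_norm_entropy)              # maximum at beta = 2 (Gaussian, ~1.42)
```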
Maximum Entropy - Binomial
========================================================
## Binomial distribution:
<img src="figures/Binomial.png";width=60 height=60>
## More elementary view:
<img src="figures/BinomialS.png";width=60 height=60>
We want to show that the Binomial distribution has the largest entropy of any distribution that satisfies these constraints:
- Only two unordered events
- Constant expected value
Maximum Entropy - Binomial
========================================================
## Example 1
- We have a bag with an unknown number of blue and white marbles and draw 2 marbles with replacement
- 4 Events: ww, wb, bw, bb
- Expected value: 1 blue marble
- A: Binomial distribution with n = 2, p = 0.5
<div style="text-align:center;";>
<img src="figures/Tab_B.png";width=110 height=110>
<div style="text-align:center;";>
<img src="figures/GraphB.png";width=280 height=280>
Maximum Entropy - Binomial
========================================================
## Example 1
```{r}
# build list of the candidate distributions
p <- list()
p[[1]] <- c(1/4,1/4,1/4,1/4)
p[[2]] <- c(2/6,1/6,1/6,2/6)
p[[3]] <- c(1/6,2/6,2/6,1/6)
p[[4]] <- c(1/8,4/8,2/8,1/8)
# compute expected value of each
sapply(p, function(p) sum(p*c(0,1,1,2)))
# compute entropy of each distribution
sapply(p, function(p) -sum( p*log(p)))
```
Maximum Entropy - Binomial
========================================================
## Example 2
- We have a bag with an unknown number of blue and white marbles and draw 2 marbles with replacement
- 4 Events: ww, wb, bw, bb
- Expected value: 1.4 blue marbles
- A: Binomial distribution with n = 2, p = 0.7
```{r}
p <- 0.7
A <- c((1-p)^2 , p*(1-p) , (1-p)*p , p^2)
A
# Calculate Entropy
-sum(A*log(A))
```
Maximum Entropy - Binomial
========================================================
Formula (1): $$\huge 0*\frac{x_1}{\sum_{i=1}^4{x_i}} + 1*\frac{x_2}{\sum_{i=1}^4{x_i}} + 1*\frac{x_3}{\sum_{i=1}^4{x_i}} + 2*\frac{x_4}{\sum_{i=1}^4{x_i}} = 1.4$$
```{r}
library(rethinking)
sim.p <- function(G=1.4) {
  x123 <- runif(3)
  # solving formula (1) for x4 yields:
  x4 <- ( G*sum(x123) - x123[2] - x123[3] ) / (2-G)
  z <- sum( c(x123,x4) )
  # normalize x1 to x4 so they form a probability distribution
  p <- c( x123 , x4 ) / z
  list( H=-sum( p*log(p) ) , p=p )
}
H <- replicate(1e5, sim.p(1.4))
# dens(as.numeric(H[1,]),adj=0.1)
```
Maximum Entropy - Binomial
========================================================
<div style="text-align:center;";>
<img src="figures/Example2.png";width=220 height=220>
```{r}
entropies <- as.numeric(H[1,])
distributions <- H[2,]
max(entropies)# Entropy binomial distribution: 1.221729
distributions[which.max(entropies)]# binomial d.: 0.09, 0.21, 0.21, 0.49
```
Maximum Entropy
========================================================
- Two un-ordered outcomes are possible and the expected numbers are assumed to be constant -> binomial distribution
- Gaussian distribution: most conservative distribution for a continuous outcome with finite variance
- Chapter 2: Binomial distribution: counting how many paths through the garden of forking data were consistent with the assumptions
- Entropy does the same -> Entropy is counting
- Page 280: It is shown that the binomial distribution is a maximum entropy distribution
Generalized linear models
========================================================
<span style="color:red">
## Linear model:
<img src="figures/lm.png">
<span style="color:black">
- Not the best choice if the outcome is discrete or bounded
- For example: drawing marbles
## Generalized linear model:
<span style="color:black">
- Use prior knowledge about outcome
- Use maximum entropy for choice of distribution
- Replace a parameter that describes the shape of the likelihood with a linear model
<img src="figures/binGLM.png">
Generalized linear models
========================================================
Difference to linear model:
- Different likelihood
- We have to use a link function
Binomial distribution:
- Shape described by 2 parameters (n and p; mean = np)
- n usually known -> attach linear model to p
- p is a probability mass
<div style="text-align:center;";>
<img src="figures/solid.png";width=275 height=275>
Generalized linear models - Meet the family
========================================================
- Most common distributions used in statistical modelling: exponential family
- Every member has maximum entropy
<div style="text-align:center;";>
<img src="figures/family.png";width=500 height=500>
Generalized linear models - Exponential Distribution
========================================================
<img src="figures/exp.png";>
- Constrained to be zero or positive
- Distribution of distance and duration, kinds of measurements that represent displacement from some point of reference, either in time or space
- If the probability of an event is constant in time or across space, then the distribution of events tends towards exponential
- Maximum entropy among all non-negative continuous distributions
- Shape described by a single parameter
- Distribution is the core of survival and event history analysis
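A small simulation sketch (hypothetical values) of the constant-probability bullet above: a constant per-step event probability produces approximately exponential waiting times, recognizable by mean ≈ standard deviation.
```{r}
# hypothetical sketch: constant event probability per step -> ~exponential waiting times
set.seed(1)
dt    <- 0.01                                   # small time step
hits  <- rbinom(1e6, size = 1, prob = 0.02)     # constant probability per step
waits <- diff(which(hits == 1)) * dt            # waiting times between events
c(mean = mean(waits), sd = sd(waits))           # mean ~ sd, as for an exponential
```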
Generalized linear models - Gamma Distribution
========================================================
<img src="figures/Gamma.png";>
- Constrained to be zero or positive
- Distribution of distance and duration
- Peak can be above 0
- If an event can only happen after two or more exponentially distributed events happen, the resulting waiting times will be gamma distributed
- Maximum entropy among all distributions with the same mean and same average logarithm
- Shape described by 2 parameters
- Common in survival and event history analysis as well as some contexts in which a continuous measurement is constrained to be positive
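A quick sketch (hypothetical values) of the waiting-time bullet: the total wait for three exponentially distributed events is gamma distributed, here with shape 3 and rate 1.
```{r}
# hypothetical sketch: sums of exponential waiting times are gamma distributed
set.seed(1)
waits <- replicate(1e4, sum(rexp(3, rate = 1)))  # wait for 3 exponential events
c(mean(waits), var(waits))                       # gamma(shape = 3, rate = 1): mean 3, variance 3
```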
Generalized linear models - Poisson Distribution
========================================================
<img src="figures/pois.png";>
- Count distribution
- Special case of the binomial (n large, p small -> Poisson)
- Used for counts that never get close to any theoretical maximum
- As special case of binomial: maximum entropy
- Shape described by one parameter
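A small check of the "n large, p small" bullet (values chosen for illustration): binomial probabilities approach Poisson probabilities with the same mean n*p.
```{r}
# hypothetical sketch: binomial with large n and small p ~ Poisson with lambda = n*p
n <- 1000; p <- 0.002                     # lambda = n*p = 2
round(dbinom(0:5, size = n, prob = p), 4)
round(dpois(0:5, lambda = n * p), 4)      # nearly identical probabilities
```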
Generalized linear models - Link Function
========================================================
<img src="figures/binGLM.png">
- Used to build a regression model
- Tries to avoid:
- negative distances
- probability masses that exceed 1
- Most commonly used link functions:
- logit link
- log link
Generalized linear models - Logit Link
========================================================
- Maps a parameter that is defined as a probability mass onto a linear model
- Extremely common with binomial GLMs
<div style="text-align:center;";>
<img src="figures/logit.png">
<div style="text-align:center;";>
<img src="figures/logit2.png">
<div style="text-align:center;";>
<img src="figures/logit3.png">
<div style="text-align:center;";>
<img src="figures/logit4.png">
Generalized linear models - Logit Link
========================================================
<div style="text-align:center;";>
<img src="figures/logit4.png">
- Usually called logistic function or inverse-logit
<div style="text-align:center;";>
<img src="figures/logodds.png">
- Compression affects the interpretation of parameter estimates: a unit change in the predictor variable no longer produces a constant change in the mean of the outcome
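To make the compression concrete, a hypothetical sketch: the inverse-logit squeezes the linear model into (0, 1), so a unit change in the linear model changes the probability by different amounts depending on where you start.
```{r}
# hypothetical sketch of the logit link and its compression effect
logit     <- function(p) log(p / (1 - p))
inv_logit <- function(x) 1 / (1 + exp(-x))
inv_logit(c(-4, -2, 0, 2, 4))        # squeezed into (0, 1)
inv_logit(1) - inv_logit(0)          # ~0.23 change near the middle
inv_logit(4) - inv_logit(3)          # ~0.03 change near the boundary
```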
Generalized linear models - Log Link
========================================================
- Maps a parameter that is defined over positive real values onto a linear model
<div style="text-align:center;";>
<img src="figures/normalGLM.png">
<div style="text-align:center;";>
<img src="figures/sigmaglm.png">
<div style="text-align:center;";>
<img src="figures/loglink.png">
- Cannot predict values outside the range of data used to fit the model
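A minimal sketch (hypothetical values): modelling log(sigma) with a linear model keeps sigma positive, because the inverse of the log link is the exponential.
```{r}
# hypothetical sketch: the log link keeps a positive-only parameter positive
x <- seq(-2, 2, length.out = 5)
log_sigma <- -1 + 0.8 * x        # linear model on the log scale (may be negative)
sigma <- exp(log_sigma)          # inverse link: always positive
rbind(log_sigma, sigma)
```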
Generalized linear models
========================================================
## Absolute and relative differences:
- Parameter estimates don't by themselves tell you the importance of a predictor on the outcome
- A big beta coefficient may not correspond to a big effect on the outcome
## GLMs & information criteria:
- Only compare models with AIC/DIC/WAIC if the models use the same type of likelihood
- Maximum entropy helps to make an easy choice of likelihood
## Maximum entropy priors:
- Maximum entropy helps to choose an outcome distribution
- When we have background information about a parameter: maximum entropy provides a way to generate a prior that embodies background information while assuming as little else as possible
Summary
========================================================
- Maximum entropy provides a successful way to choose likelihood functions
- Information entropy: a measure of the number of ways a distribution can arise according to the assumptions
- If we choose the distribution with the biggest information entropy, we choose a distribution that obeys the constraints on the outcome variables
- GLMs arise naturally from this approach as extensions of linear models
- When using a GLM we have to choose a link function to bind the linear model to the generalized outcome
<div style="text-align:center;";>
<span style="color:red">
Thank you for your attention!