Lecture 13 Neural Network + Backpropagation

Activation Functions

Sigmoid / Logistic Function
- $\frac{1}{1 + \exp(-\alpha)}$
Tanh
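- Like the logistic function, but rescaled and shifted to the range $(-1, +1)$: $\tanh(\alpha) = 2\sigma(2\alpha) - 1$, where $\sigma$ is the logistic function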
ReLU (often used in vision tasks)
- Rectified linear unit
- Linear with cutoff at zero
- $\max(0, wx+b)$
- Soft version: $\log{(\exp{(x)} + 1)}$
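
As a rough illustration, the activations listed above fit in a few lines of plain Python. The function names are chosen here for illustration and do not come from the lecture; `a` stands for the input to the activation (for example $wx + b$). These are naive versions and may overflow for inputs of very large magnitude.

```python
import math

def sigmoid(a):
    """Logistic function: 1 / (1 + exp(-a)), output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def tanh(a):
    """Tanh written via the logistic function: 2 * sigmoid(2a) - 1."""
    return 2.0 * sigmoid(2.0 * a) - 1.0

def relu(a):
    """Rectified linear unit: linear for a > 0, zero otherwise."""
    return max(0.0, a)

def softplus(a):
    """Soft version of ReLU: log(exp(a) + 1)."""
    return math.log(math.exp(a) + 1.0)

# Print each activation at a few sample inputs.
for a in (-2.0, 0.0, 2.0):
    print(a, sigmoid(a), tanh(a), relu(a), softplus(a))
```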

Loss Functions

Quadratic Loss
- the same objective as Linear Regression
- i.e. MSE
Cross Entropy
- the same objective as Logistic Regression
- i.e. negative log likelihood
- this requires probabilities, so we add an additional "softmax" layer at the end of our network
- Steeper than quadratic loss: the gradient is larger when the prediction is far from the target
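
A minimal sketch of the softmax layer followed by cross entropy, assuming a multi-class setting where the loss is the negative log-likelihood of the true class. The names `softmax` and `cross_entropy` and the example scores are made up for illustration.

```python
import math

def softmax(scores):
    """Map real-valued scores to probabilities that sum to 1."""
    m = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[true_class])

scores = [2.0, 1.0, 0.1]   # hypothetical outputs of the last linear layer
probs = softmax(scores)
loss = cross_entropy(probs, true_class=0)
print(probs, loss)
```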

| | Forward | Backward |
|---|---|---|
| Quadratic | $J = \frac{1}{2}(y - y^*)^2$ | $\frac{dJ}{dy} = y - y^*$ |
| Cross Entropy | $J = -\left[y^* \log(y) + (1-y^*)\log(1-y)\right]$ | $\frac{dJ}{dy} = -\frac{y^*}{y} + \frac{1-y^*}{1-y}$ |
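
A loss and its derivative can be sanity-checked numerically. The sketch below assumes a single output $y \in (0, 1)$ with target $y^*$, and takes cross entropy as the negative log-likelihood $J = -[y^* \log(y) + (1-y^*)\log(1-y)]$. Function names and test values are illustrative only.

```python
import math

def quadratic(y, y_star):
    """Return (J, dJ/dy) for the quadratic loss."""
    J = 0.5 * (y - y_star) ** 2
    dJ_dy = y - y_star
    return J, dJ_dy

def binary_cross_entropy(y, y_star):
    """Return (J, dJ/dy) for cross entropy as negative log-likelihood."""
    J = -(y_star * math.log(y) + (1.0 - y_star) * math.log(1.0 - y))
    dJ_dy = -y_star / y + (1.0 - y_star) / (1.0 - y)
    return J, dJ_dy

def numeric_grad(loss, y, y_star, eps=1e-6):
    """Central finite-difference estimate of dJ/dy."""
    return (loss(y + eps, y_star)[0] - loss(y - eps, y_star)[0]) / (2.0 * eps)

y, y_star = 0.1, 1.0   # a confidently wrong prediction
for loss in (quadratic, binary_cross_entropy):
    J, g = loss(y, y_star)
    print(loss.__name__, J, g, numeric_grad(loss, y, y_star))
```

At this point the cross-entropy gradient has a much larger magnitude than the quadratic one, which is the sense in which cross entropy is steeper.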

Def #1 Chain Rule
- $y = f(u)$
- $u = g(x)$
- $\frac{dy}{dx} = \frac{dy}{du}·\frac{du}{dx}$
Def #2 Chain Rule
- $y = f(u_1,u_2)$
- $u_2 = g_2(x)$
- $u_1 = g_1(x)$
- $\frac{dy}{dx} = \frac{dy}{du_1}·\frac{du_1}{dx} + \frac{dy}{du_2}·\frac{du_2}{dx}$
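
To make the two-path version concrete, here is a small check with hypothetical choices $u_1 = x^2$, $u_2 = \sin(x)$ and $y = u_1 u_2$ (these functions are not from the lecture). The analytic derivative sums the contribution of each path and is compared against a finite-difference estimate.

```python
import math

def g1(x):
    return x ** 2            # u1 = g1(x)

def g2(x):
    return math.sin(x)       # u2 = g2(x)

def f(u1, u2):
    return u1 * u2           # y = f(u1, u2)

def dy_dx(x):
    u1, u2 = g1(x), g2(x)
    dy_du1, dy_du2 = u2, u1                    # partial derivatives of f
    du1_dx, du2_dx = 2.0 * x, math.cos(x)      # derivatives of g1 and g2
    return dy_du1 * du1_dx + dy_du2 * du2_dx   # sum over both paths

x, eps = 1.3, 1e-6
numeric = (f(g1(x + eps), g2(x + eps)) - f(g1(x - eps), g2(x - eps))) / (2 * eps)
print(dy_dx(x), numeric)
```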
Def #3 Chain Rule
- $\mathbf{y} = f(\mathbf{u})$, with $\mathbf{u} \in \mathbb{R}^J$
- $\mathbf{u} = g(\mathbf{x})$
- $\frac{dy_i}{dx_k} = \sum_{j=1}^J \frac{dy_i}{du_j}·\frac{du_j}{dx_k}, \forall i,k$
- Backpropagation is just repeated application of the chain rule
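
The idea of repeated chain-rule application can be sketched on a tiny hypothetical network: a scalar input $x$, hidden units $u_j = \sigma(w_j x)$, output $y = \sum_j v_j u_j$, and quadratic loss. The architecture, the absence of bias terms, and all names and values are assumptions made for illustration, not the lecture's network.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, w, v):
    u = [sigmoid(wj * x) for wj in w]            # hidden units u_j
    y = sum(vj * uj for vj, uj in zip(v, u))     # output y
    return u, y

def loss(x, w, v, y_star):
    _, y = forward(x, w, v)
    return 0.5 * (y - y_star) ** 2

def backward(x, w, v, y_star):
    u, y = forward(x, w, v)
    dJ_dy = y - y_star                                          # quadratic loss
    dJ_du = [dJ_dy * vj for vj in v]                            # dJ/du_j = dJ/dy * dy/du_j
    dJ_da = [g * uj * (1.0 - uj) for g, uj in zip(dJ_du, u)]    # through the sigmoid
    dJ_dw = [g * x for g in dJ_da]                              # a_j = w_j * x
    dJ_dx = sum(g * wj for g, wj in zip(dJ_da, w))              # sum over all paths j
    return dJ_dw, dJ_dx

x, w, v, y_star = 0.5, [1.0, -2.0], [0.3, 0.8], 1.0
dJ_dw, dJ_dx = backward(x, w, v, y_star)

# Compare against a central finite-difference estimate of dJ/dx.
eps = 1e-6
num_dx = (loss(x + eps, w, v, y_star) - loss(x - eps, w, v, y_star)) / (2 * eps)
print(dJ_dw, dJ_dx, num_dx)
```

Each line of `backward` reuses the gradient computed on the line before it, so every intermediate derivative is computed once and passed backward.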
Computation Graphs
- not a Neural Network diagram