---
title: Lecture 24 (2023-11-03)
draft: false
date: 2023-11-03T19:03:55.248Z
summary: " "
joplinId: bb4dfc128fd94c90ad529b4248269614
backlinks: []
---

# Lecture 24

MLE, MAP.

(Some notes. Tablet was dead.)

- Announcements
- TODOs

- Make a hand-written version

- Maximize the probability $p(D \mid w)$ for a data set $D$ and parameters $w$
  - Maximizing this turns into minimizing a loss function (see the first sketch after these notes)
  - We use the negative log-likelihood: the log turns the product over independent examples into a sum, and the negative sign turns maximization into minimization
    - Also, the joint probability is super tiny, so taking logs avoids floating-point issues (underflow)
- We assume the errors are normally distributed; with a bit of rearrangement, we go from $p(\epsilon_i)$ to $p(y_i \mid x_i, w)$
  - Or: the prediction has a center and a variance
  - Since we're focused on $y$ given $X$, this is discriminative; there is no modelling of $X$
    - Versus naive Bayes, which is generative
- If we assume Laplace errors instead of Gaussian (i.e., normal), we recover absolute error
  - (Tails correspond to tolerance for outliers; the Laplace distribution has heavier tails than the Gaussian, so it is less sensitive to them)
- Transformation of "$o_i$" to a probability (e.g., via a sigmoid, as in logistic regression)
- MAP
  - Instead, estimate $p(w \mid D) = p(D \mid w)\,p(w) / p(D) \propto p(D \mid w)\,p(w)$
  - Same as MLE, but now with $p(w)$, our prior, which expresses our belief about $w$ itself, i.e., about the model
  - The prior becomes our regularizer; the objective goes from a product to a sum, as expected
    - The regularizer is the negative log prior
    - The negative log takes us from a nasty probability statement to a convenient squared-L2-norm sum (see the second sketch after these notes)
  - There is a choice of $\sigma$ here (the width of the prior, i.e., how strongly we regularize), unlike with MLE
- Summary
  - Nod to fully-Bayesian methods
  - MLE/MAP also handle ordinal data, counts, survival analysis, and unbalanced classes (by swapping in an appropriate likelihood)
  - (This lecture is really just a probabilistic re-imagining, or justification, of ideas we've already seen)
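
Two small sketches of my own to make the correspondence concrete (not from the lecture; names like `X`, `y`, `w`, `sigma`, `b` are illustrative). First, MLE: with Gaussian errors, $-\log p(y_i \mid x_i, w) = \frac{(y_i - w^\top x_i)^2}{2\sigma^2} + \text{const}$, so minimizing the NLL is least squares; with Laplace errors, the same step gives absolute error.

```python
import numpy as np

# Illustrative linear-regression setup (all names are placeholders, not the lecture's).
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=n)

def gaussian_nll(w, X, y, sigma=1.0):
    """Negative log-likelihood with i.i.d. N(0, sigma^2) errors.
    Up to constants, this is squared error scaled by 1 / (2 sigma^2)."""
    r = y - X @ w
    return 0.5 * np.sum(r ** 2) / sigma ** 2 + len(y) * np.log(sigma * np.sqrt(2 * np.pi))

def laplace_nll(w, X, y, b=1.0):
    """Negative log-likelihood with i.i.d. Laplace(0, b) errors.
    Up to constants, this is absolute error scaled by 1 / b."""
    r = y - X @ w
    return np.sum(np.abs(r)) / b + len(y) * np.log(2 * b)

# The product of tiny per-example probabilities becomes a sum of logs,
# so there is no floating-point underflow.
print(gaussian_nll(w_true, X, y))
print(laplace_nll(w_true, X, y))
```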
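Second, MAP: assuming a $\mathcal{N}(0, \tau^2 I)$ prior on $w$ (my own notation; `tau` and `lam` are placeholders), the negative log-posterior is the NLL plus $\frac{1}{2\tau^2}\lVert w \rVert^2$, i.e., an L2 regularizer. In the linear-Gaussian case this is ridge regression with $\lambda = \sigma^2 / \tau^2$, which is exactly the extra "choice of $\sigma$" freedom noted above.

```python
import numpy as np

# Same illustrative setup (names are mine, not the lecture's).
rng = np.random.default_rng(1)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=n)

def map_objective(w, X, y, sigma=1.0, tau=1.0):
    """Negative log-posterior, up to constants:
    Gaussian NLL (error variance sigma^2) plus the negative log of a
    N(0, tau^2 I) prior on w, which is the squared-L2 regularizer."""
    nll = 0.5 * np.sum((y - X @ w) ** 2) / sigma ** 2
    neg_log_prior = 0.5 * np.sum(w ** 2) / tau ** 2
    return nll + neg_log_prior

# In this linear-Gaussian case the minimizer has a closed form (ridge regression),
# with regularization strength lam = sigma^2 / tau^2.
lam = 1.0  # sigma = tau = 1 here
w_map = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(w_map)
print(map_objective(w_map, X, y))
```

Shrinking $\tau$ (a more confident prior that $w$ is small) increases $\lambda$ and regularizes harder; growing $\tau$ recovers plain MLE in the limit.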