From d0d7b53245477f9687363d6b58f01a3067ae00df Mon Sep 17 00:00:00 2001
From: Michael DeMarco
Date: Fri, 3 Nov 2023 21:00:24 -0700
Subject: [PATCH] docs: update notes

---
 content/joplin/lecture 24 (2023-11-03).md | 42 +++++++++++++++++++++++
 metadata.yaml                             | 12 +++++++
 2 files changed, 54 insertions(+)
 create mode 100644 content/joplin/lecture 24 (2023-11-03).md

diff --git a/content/joplin/lecture 24 (2023-11-03).md b/content/joplin/lecture 24 (2023-11-03).md
new file mode 100644
index 0000000000000..c068488ca7293
--- /dev/null
+++ b/content/joplin/lecture 24 (2023-11-03).md
@@ -0,0 +1,42 @@
+---
+title: Lecture 24 (2023-11-03)
+draft: false
+date: 2023-11-03T19:03:55.248Z
+summary: " "
+joplinId: bb4dfc128fd94c90ad529b4248269614
+backlinks: []
+---
+
+# Lecture 24
+
+MLE, MAP.
+
+(Some notes. Tablet was dead.)
+
+- Announcements
+- TODOs
+
+  - Make a hand-written version
+
+- Maximize the probability $p(D \mid w)$ for a data set $D$ and parameters $w$
+- This turns into minimizing a loss function
+  - We use the negative log-likelihood: the log turns the product over independent examples into a sum, and the negative turns maximization into minimization
+    - The log also avoids floating-point underflow, since the raw probabilities are tiny
+  - We assume the errors are normally distributed; with a bit of re-arrangement, we go from $p(\epsilon_i)$ to $p(y_i \mid x_i, w)$
+    - That is, the prediction has a center and a variance
+  - Since we model $y$ given $X$ only, this is discriminative; there is no model for $X$
+    - Versus naive Bayes, which is generative
+  - If we assume Laplace errors instead of Gaussian (i.e., normal) errors, we recover the absolute error
+    - (Tails correspond to tolerance for outliers; the Laplace distribution has heavier tails than the Gaussian, so it is more robust to outliers)
+  - Transformation of the raw output "$o_i$" into a probability
+- MAP
+  - Instead estimate $p(w \mid D) = \frac{p(D \mid w)\,p(w)}{p(D)} \propto p(D \mid w)\,p(w)$
+  - Same as MLE, but now $p(w)$ is our prior; it expresses our belief about $w$ itself, the model
+  - The prior becomes our regularizer; the product becomes a sum, as expected
+    - The regularizer is the negative log-prior
+    - The negative log takes us from a nasty probability statement to a convenient L2-norm term
+  - There is a choice of $\sigma$ for the prior here, unlike with MLE
+- Summary
+- Nod to fully-Bayesian methods
+- MLE and MAP handle ordinal data, counts, survival analysis, and unbalanced classes
+- (This lecture is really just a probabilistic re-imagining, or justification, of ideas we've already seen)
diff --git a/metadata.yaml b/metadata.yaml
index 738670e7495a1..07bdfa66e81f4 100644
--- a/metadata.yaml
+++ b/metadata.yaml
@@ -251,3 +251,15 @@ notes:
       backlinks: []
     id: 09d1619a4f9b4322bd6b359e308585a3
     title: Learn You Some Erlang
+  - link: Lecture 24 (2023-11-03)
+    filename: lecture 24 (2023-11-03).md
+    folder: .
+    headers:
+      title: Lecture 24 (2023-11-03)
+      draft: false
+      date: 2023-11-03T19:03:55.248Z
+      summary: " "
+      joplinId: bb4dfc128fd94c90ad529b4248269614
+      backlinks: []
+    id: bb4dfc128fd94c90ad529b4248269614
+    title: Lecture 24 (2023-11-03)
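
To make the MLE-to-loss correspondence in the note concrete, here is a minimal NumPy sketch (not part of the patch or the lecture; the function names and data are illustrative). It shows that the negative log-likelihood under Gaussian errors is the squared-error loss up to constants that do not depend on $w$, and that Laplace errors give the absolute error instead.

```python
# Illustrative sketch: NLL of a linear model y_i = w^T x_i + eps_i under two
# error distributions. Gaussian errors recover squared error; Laplace errors
# recover absolute error (each up to additive constants independent of w).
import numpy as np

def gaussian_nll(w, X, y, sigma=1.0):
    """-log p(y | X, w) with eps_i ~ N(0, sigma^2)."""
    r = y - X @ w
    return 0.5 * np.sum(r**2) / sigma**2 + len(y) * np.log(sigma * np.sqrt(2 * np.pi))

def laplace_nll(w, X, y, b=1.0):
    """-log p(y | X, w) with eps_i ~ Laplace(0, b)."""
    r = y - X @ w
    return np.sum(np.abs(r)) / b + len(y) * np.log(2 * b)

# Toy data, purely for demonstration.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=50)

# Minimizing gaussian_nll over w is minimizing 0.5 * ||Xw - y||^2 / sigma^2;
# minimizing laplace_nll over w is minimizing ||Xw - y||_1 / b.
w0 = np.zeros(3)
print(gaussian_nll(w0, X, y), laplace_nll(w0, X, y))
```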
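
Similarly, a sketch of the MAP side under the same illustrative assumptions: with a Gaussian prior $w \sim \mathcal{N}(0, \sigma_w^2 I)$, the negative log-posterior is the Gaussian NLL plus an L2 penalty, and the choice of prior variance $\sigma_w^2$ (which MLE does not have) sets the regularization strength.

```python
# Illustrative sketch: MAP with a Gaussian prior w ~ N(0, sigma_w^2 I).
# The negative log-posterior is squared error plus an L2 penalty, with
# regularization strength lambda = sigma^2 / sigma_w^2 set by the prior.
import numpy as np

def neg_log_posterior(w, X, y, sigma=1.0, sigma_w=1.0):
    r = y - X @ w
    nll = 0.5 * np.sum(r**2) / sigma**2                # -log p(D | w), up to constants
    neg_log_prior = 0.5 * np.sum(w**2) / sigma_w**2    # -log p(w), up to constants
    return nll + neg_log_prior

def map_estimate(X, y, sigma=1.0, sigma_w=1.0):
    """Closed-form minimizer of neg_log_posterior: the ridge-regression solution."""
    lam = sigma**2 / sigma_w**2
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

A larger $\sigma_w$ (a vaguer prior) means a smaller $\lambda$ and weaker regularization; as $\sigma_w \to \infty$ the MAP estimate reduces to the MLE.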