From d0d7b53245477f9687363d6b58f01a3067ae00df Mon Sep 17 00:00:00 2001
From: Michael DeMarco
Date: Fri, 3 Nov 2023 21:00:24 -0700
Subject: [PATCH] docs: update notes

---
 content/joplin/lecture 24 (2023-11-03).md | 42 +++++++++++++++++++++++
 metadata.yaml                             | 12 +++++++
 2 files changed, 54 insertions(+)
 create mode 100644 content/joplin/lecture 24 (2023-11-03).md

diff --git a/content/joplin/lecture 24 (2023-11-03).md b/content/joplin/lecture 24 (2023-11-03).md
new file mode 100644
index 0000000000000..c068488ca7293
--- /dev/null
+++ b/content/joplin/lecture 24 (2023-11-03).md
@@ -0,0 +1,42 @@
+---
+title: Lecture 24 (2023-11-03)
+draft: false
+date: 2023-11-03T19:03:55.248Z
+summary: " "
+joplinId: bb4dfc128fd94c90ad529b4248269614
+backlinks: []
+---
+
+# Lecture 24
+
+MLE, MAP.
+
+(Some notes. Tablet was dead.)
+
+- Announcements
+- TODOs
+
+  - Make a hand-written version
+
+- Maximize the probability $p(D \mid w)$ for a data set $D$ and parameters $w$
+- This turns into minimizing a loss function
+  - We use the negative log-likelihood: the log turns the product over independent examples into a sum, and the negative turns maximization into minimization
+    - The log also avoids floating-point underflow, since the raw probabilities are tiny
+  - We assume the errors are normally distributed; with a bit of re-arrangement, we go from $p(\epsilon_i)$ to $p(y_i \mid x_i, w)$
+    - That is, the prediction has a center and a variance
+  - Since we model $y$ given $X$ only, this is discriminative; there is no model for $X$
+    - Versus naive Bayes, which is generative
+  - If we assume Laplace errors instead of Gaussian (i.e., normal) errors, we recover the absolute error
+    - (Tails correspond to tolerance for outliers; the Laplace distribution has heavier tails than the Gaussian, so it is more robust to outliers)
+  - Transformation of the raw output "$o_i$" into a probability
+- MAP
+  - Instead estimate $p(w \mid D) = \frac{p(D \mid w)\,p(w)}{p(D)} \propto p(D \mid w)\,p(w)$
+  - Same as MLE, but now $p(w)$ is our prior; it expresses our belief about $w$ itself, the model
+  - The prior becomes our regularizer; the product becomes a sum, as expected
+    - The regularizer is the negative log-prior
+    - The negative log takes us from a nasty probability statement to a convenient L2-norm term
+  - There is a choice of $\sigma$ for the prior here, unlike with MLE
+- Summary
+- Nod to fully-Bayesian methods
+- MLE and MAP handle ordinal data, counts, survival analysis, and unbalanced classes
+- (This lecture is really just a probabilistic re-imagining, or justification, of ideas we've already seen)
diff --git a/metadata.yaml b/metadata.yaml
index 738670e7495a1..07bdfa66e81f4 100644
--- a/metadata.yaml
+++ b/metadata.yaml
@@ -251,3 +251,15 @@ notes:
       backlinks: []
     id: 09d1619a4f9b4322bd6b359e308585a3
     title: Learn You Some Erlang
+  - link: Lecture 24 (2023-11-03)
+    filename: lecture 24 (2023-11-03).md
+    folder: .
+    headers:
+      title: Lecture 24 (2023-11-03)
+      draft: false
+      date: 2023-11-03T19:03:55.248Z
+      summary: " "
+      joplinId: bb4dfc128fd94c90ad529b4248269614
+      backlinks: []
+    id: bb4dfc128fd94c90ad529b4248269614
+    title: Lecture 24 (2023-11-03)
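
To make the MLE-to-loss correspondence in the note concrete, here is a minimal NumPy sketch (not part of the patch or the lecture; the function names and data are illustrative). It shows that the negative log-likelihood under Gaussian errors is the squared-error loss up to constants that do not depend on $w$, and that Laplace errors give the absolute error instead.

```python
# Illustrative sketch: NLL of a linear model y_i = w^T x_i + eps_i under two
# error distributions. Gaussian errors recover squared error; Laplace errors
# recover absolute error (each up to additive constants independent of w).
import numpy as np

def gaussian_nll(w, X, y, sigma=1.0):
    """-log p(y | X, w) with eps_i ~ N(0, sigma^2)."""
    r = y - X @ w
    return 0.5 * np.sum(r**2) / sigma**2 + len(y) * np.log(sigma * np.sqrt(2 * np.pi))

def laplace_nll(w, X, y, b=1.0):
    """-log p(y | X, w) with eps_i ~ Laplace(0, b)."""
    r = y - X @ w
    return np.sum(np.abs(r)) / b + len(y) * np.log(2 * b)

# Toy data, purely for demonstration.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=50)

# Minimizing gaussian_nll over w is minimizing 0.5 * ||Xw - y||^2 / sigma^2;
# minimizing laplace_nll over w is minimizing ||Xw - y||_1 / b.
w0 = np.zeros(3)
print(gaussian_nll(w0, X, y), laplace_nll(w0, X, y))
```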
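
Similarly, a sketch of the MAP side under the same illustrative assumptions: with a Gaussian prior $w \sim \mathcal{N}(0, \sigma_w^2 I)$, the negative log-posterior is the Gaussian NLL plus an L2 penalty, and the choice of prior variance $\sigma_w^2$ (which MLE does not have) sets the regularization strength.

```python
# Illustrative sketch: MAP with a Gaussian prior w ~ N(0, sigma_w^2 I).
# The negative log-posterior is squared error plus an L2 penalty, with
# regularization strength lambda = sigma^2 / sigma_w^2 set by the prior.
import numpy as np

def neg_log_posterior(w, X, y, sigma=1.0, sigma_w=1.0):
    r = y - X @ w
    nll = 0.5 * np.sum(r**2) / sigma**2                # -log p(D | w), up to constants
    neg_log_prior = 0.5 * np.sum(w**2) / sigma_w**2    # -log p(w), up to constants
    return nll + neg_log_prior

def map_estimate(X, y, sigma=1.0, sigma_w=1.0):
    """Closed-form minimizer of neg_log_posterior: the ridge-regression solution."""
    lam = sigma**2 / sigma_w**2
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

A larger $\sigma_w$ (a vaguer prior) means a smaller $\lambda$ and weaker regularization; as $\sigma_w \to \infty$ the MAP estimate reduces to the MLE.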