Merge pull request #308 from slds-lmu/forest-revisited
RF revisited
manuelhelmerichs committed Jun 10, 2024
2 parents c8c6965 + 25c1dfe commit d885bff
Showing 7 changed files with 51 additions and 64 deletions.
2 changes: 1 addition & 1 deletion content/chapters/07_forests/07-01-bagging.md
Original file line number Diff line number Diff line change
@@ -9,7 +9,7 @@ Bagging (bootstrap aggregation) is a method for combining many models into a met

### Lecture video

{{< video id="hRBeeFpfMZQ" >}}
{{< video id="S4Sa6YEXq7g" >}}

### Lecture slides

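The chapter's code demos are written in R; as a language-agnostic illustration of the bagging idea described above (train many models on bootstrap samples, aggregate by majority vote), here is a minimal Python sketch. The 1-nearest-neighbour base learner and the toy data are hypothetical stand-ins chosen for brevity, not part of the lecture material:

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    # Draw n observations with replacement from the training data.
    return [rng.choice(data) for _ in data]

def one_nn_predict(train, x):
    # Trivial base learner: 1-nearest neighbour on a single numeric feature.
    nearest = min(train, key=lambda obs: abs(obs[0] - x))
    return nearest[1]

def bagged_predict(data, x, n_models=25, seed=0):
    # Bagging: fit the base learner on many bootstrap samples,
    # then aggregate by majority vote (averaging for regression).
    rng = random.Random(seed)
    votes = [one_nn_predict(bootstrap_sample(data, rng), x)
             for _ in range(n_models)]
    return Counter(votes).most_common(1)[0][0]

train = [(0.1, "a"), (0.2, "a"), (0.3, "a"),
         (0.9, "b"), (1.0, "b"), (1.1, "b")]
print(bagged_predict(train, 0.15))  # -> a
```

Each bootstrap model sees a slightly different training set, so the ensemble vote smooths out the variance of the individual learners.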
16 changes: 16 additions & 0 deletions content/chapters/07_forests/07-02-basics.md
@@ -0,0 +1,16 @@
---
title: "Chapter 07.02: Basics"
weight: 7002
quizdown: true
---
In this section we investigate random forests, a modification of bagging for trees.

<!--more-->

### Lecture video

{{< video id="NY3Tux1Zt4g" >}}

### Lecture slides

{{< pdfjs file="https://github.com/slds-lmu/lecture_i2ml/tree/master/slides-pdf/slides-forests-basics.pdf" >}}
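The key modification of bagging that the new basics chapter covers is that random forests restrict each split to a random subset of mtry candidate features (a common default is mtry = floor(sqrt(p)) for classification), which decorrelates the trees. A hedged Python sketch of a single such split step, with Gini impurity and toy data invented for illustration:

```python
import math
import random

def gini(labels):
    # Gini impurity of a list of class labels.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y, feature_idx):
    # Best (feature, threshold) among the given candidate features,
    # scored by the weighted Gini impurity of the two child nodes.
    best = None
    for j in feature_idx:
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[j] <= t]
            right = [y[i] for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def random_forest_split(X, y, seed=0):
    # RF modification: consider only mtry = floor(sqrt(p)) randomly drawn
    # features at this split instead of all p features.
    p = len(X[0])
    mtry = max(1, int(math.sqrt(p)))
    rng = random.Random(seed)
    candidates = rng.sample(range(p), mtry)
    return best_split(X, y, candidates)

X = [[0, 1, 5, 2], [1, 3, 4, 0], [0, 2, 1, 1], [1, 0, 2, 3]]
y = ["a", "a", "b", "b"]
score, feature, threshold = random_forest_split(X, y)
```

A full forest repeats this step recursively in every tree, redrawing the candidate set at every node.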
16 changes: 0 additions & 16 deletions content/chapters/07_forests/07-02-intro.md

This file was deleted.

@@ -1,19 +1,20 @@
---
title: "Chapter 07.03: Benchmarking Trees, Forests, and Bagging k-NN"
title: "Chapter 07.03: Out-of-Bag Error Estimate"
weight: 7003
quizdown: true
---
We compare the performance of random forests vs. (bagged) CART and (bagged) \\(k\\)-NN.

We introduce the concepts of in-bag and out-of-bag observations and explain how to compute the out-of-bag error estimate.

<!--more-->

### Lecture video

{{< video id="uOamholBaZ0" >}}
{{< video id="gucPQxcqPcY" >}}

### Lecture slides

{{< pdfjs file="https://github.com/slds-lmu/lecture_i2ml/tree/master/slides-pdf/slides-forests-benchmark.pdf" >}}
{{< pdfjs file="https://github.com/slds-lmu/lecture_i2ml/tree/master/slides-pdf/slides-forests-oob.pdf" >}}

### Quiz

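The retitled chapter's central point, the out-of-bag error estimate, can be sketched as follows. Each bootstrap sample leaves out roughly (1 - 1/n)^n ≈ e^(-1) ≈ 36.8% of the observations; each observation is then predicted only by the models that did not see it in training. The base learner below (in-bag majority class) is a deliberate placeholder so the sketch stays self-contained; the lecture's actual demos are in R:

```python
import random
from collections import Counter

def oob_error(y, n_models=100, seed=0):
    # OOB error estimate with a placeholder base learner
    # that always predicts the in-bag majority class.
    rng = random.Random(seed)
    n = len(y)
    oob_votes = [[] for _ in range(n)]
    for _ in range(n_models):
        # In-bag: n indices drawn with replacement; out-of-bag: all others.
        in_bag = [rng.randrange(n) for _ in range(n)]
        oob = set(range(n)) - set(in_bag)
        majority = Counter(y[i] for i in in_bag).most_common(1)[0][0]
        for i in oob:
            oob_votes[i].append(majority)
    # Aggregate, per observation, only models that did NOT train on it,
    # which makes the OOB error behave like a cross-validated estimate.
    wrong = [Counter(votes).most_common(1)[0][0] != y[i]
             for i, votes in enumerate(oob_votes) if votes]
    return sum(wrong) / len(wrong)

err = oob_error(["a"] * 7 + ["b"] * 3)
```

Because every observation gets an honest prediction this way, no separate validation set is needed to estimate generalization error.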
3 changes: 2 additions & 1 deletion content/chapters/07_forests/07-04-featureimportance.md
@@ -3,13 +3,14 @@ title: "Chapter 07.04: Feature Importance"
weight: 7004
quizdown: true
---

In a complex machine learning model, the contributions of individual features to model performance are difficult to evaluate. The concept of feature importance makes it possible to quantify these effects for random forests.

<!--more-->

### Lecture video

{{< video id="cw4qG9ePZ9Y" >}}
{{< video id="8h3H0j2f24I" >}}

### Lecture slides

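The permutation variable importance discussed in this chapter (and quizzed later in the section) can be sketched generically: permute feature j to break its association with the target, then measure how much performance drops. The toy model and accuracy metric below are hypothetical illustrations, not the lecture's R implementation:

```python
import random

def permutation_importance(predict, X, y, j, metric, seed=0):
    # Baseline performance on the evaluation data (OOB data in a forest).
    baseline = metric([predict(row) for row in X], y)
    # Permute column j, breaking its association with the target.
    rng = random.Random(seed)
    col = [row[j] for row in X]
    rng.shuffle(col)
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
    permuted = metric([predict(row) for row in X_perm], y)
    # Importance = performance drop caused by permuting feature j.
    return baseline - permuted

def accuracy(preds, y):
    return sum(p == t for p, t in zip(preds, y)) / len(y)

# Toy model that depends only on feature 0.
predict = lambda row: "a" if row[0] < 0.5 else "b"
X = [[0.1, 9], [0.2, 1], [0.8, 5], [0.9, 3]]
y = ["a", "a", "b", "b"]
print(permutation_importance(predict, X, y, 1, accuracy))  # -> 0.0
```

Feature 1 never enters the model, so permuting it leaves every prediction unchanged and its importance is exactly zero, while permuting an informative feature would cost accuracy.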
28 changes: 27 additions & 1 deletion content/chapters/07_forests/07-05-proximities.md
@@ -3,14 +3,40 @@ title: "Chapter 07.05: Proximities"
weight: 7005
quizdown: true
---

The term *proximity* refers to the "closeness" between pairs of cases. Proximities are calculated for each pair of observations and can be derived directly from random forests.

<!--more-->

### Lecture video

{{< video id="RGa0Uc6ZbX4" >}}
{{< video id="8h3H0j2f24I" >}}

### Lecture slides

{{< pdfjs file="https://github.com/slds-lmu/lecture_i2ml/tree/master/slides-pdf/slides-forests-proximities.pdf" >}}

### Code demo

**Random Forests**

You can run the code snippets in the demos on your local machine. The corresponding Rmd version of this demo can be found [here](https://github.com/compstat-lmu/lecture_i2ml/blob/master/code-demos/code_demo_randforests.Rmd). If you want to render the Rmd files to PDF, you need the accompanying [style files](https://github.com/compstat-lmu/lecture_i2ml/tree/master/style).

{{< pdfjs file="https://github.com/slds-lmu/lecture_i2ml/tree/master/code-demos-pdf/code_demo_randforests.pdf" >}}

### Quiz

{{< quizdown >}}

---
shuffle_questions: false
---

## Which statements are true?

- [x] To compute permutation variable importance for feature $j$, we permute the feature and see how the performance changes (in OOB observations).
- [ ] The random forest is a bad out-of-the-box model and requires tuning of hyperparameters.
- [x] Random forests and trees can be used for high-dimensional data.
- [ ] Proximities are used in replacing missing data, but not in locating outliers.

{{< /quizdown >}}
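The proximity measure this chapter introduces has a simple concrete form: the proximity of observations i and j is the fraction of trees in which they end up in the same terminal node. Given per-tree leaf assignments (the toy assignments below are invented for illustration; the course demo is in R), the proximity matrix is:

```python
def proximity(leaf_ids):
    # leaf_ids[t][i] = terminal node of observation i in tree t.
    # Proximity of (i, j) = fraction of trees placing them in the same leaf.
    n_trees, n = len(leaf_ids), len(leaf_ids[0])
    prox = [[0.0] * n for _ in range(n)]
    for assignment in leaf_ids:
        for i in range(n):
            for j in range(n):
                if assignment[i] == assignment[j]:
                    prox[i][j] += 1.0 / n_trees
    return prox

# Three toy trees, each assigning four observations to leaves.
leaf_ids = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [2, 0, 1, 1],
]
prox = proximity(leaf_ids)
print(prox[0][1])  # obs 0 and 1 share a leaf in 2 of 3 trees -> 2/3
```

The resulting matrix can then serve downstream uses mentioned in the chapter, such as imputing missing data and locating outliers.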
41 changes: 0 additions & 41 deletions content/chapters/07_forests/07-06-discussion.md

This file was deleted.
