Merge pull request #308 from slds-lmu/forest-revisited
RF revisited
manuelhelmerichs committed Jun 10, 2024
2 parents c8c6965 + 25c1dfe commit d885bff
Showing 7 changed files with 51 additions and 64 deletions.
2 changes: 1 addition & 1 deletion content/chapters/07_forests/07-01-bagging.md
Original file line number Diff line number Diff line change
@@ -9,7 +9,7 @@ Bagging (bootstrap aggregation) is a method for combining many models into a met

### Lecture video

{{< video id="hRBeeFpfMZQ" >}}
{{< video id="S4Sa6YEXq7g" >}}

### Lecture slides

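The chapter's code demos are written in R; as a language-agnostic illustration of the bagging idea described above (train many models on bootstrap samples, aggregate by majority vote), here is a minimal Python sketch. The 1-nearest-neighbour base learner and the toy data are hypothetical stand-ins chosen for brevity, not part of the lecture material:

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    # Draw n observations with replacement from the training data.
    return [rng.choice(data) for _ in data]

def one_nn_predict(train, x):
    # Trivial base learner: 1-nearest neighbour on a single numeric feature.
    nearest = min(train, key=lambda obs: abs(obs[0] - x))
    return nearest[1]

def bagged_predict(data, x, n_models=25, seed=0):
    # Bagging: fit the base learner on many bootstrap samples,
    # then aggregate by majority vote (averaging for regression).
    rng = random.Random(seed)
    votes = [one_nn_predict(bootstrap_sample(data, rng), x)
             for _ in range(n_models)]
    return Counter(votes).most_common(1)[0][0]

train = [(0.1, "a"), (0.2, "a"), (0.3, "a"),
         (0.9, "b"), (1.0, "b"), (1.1, "b")]
print(bagged_predict(train, 0.15))  # -> a
```

Each bootstrap model sees a slightly different training set, so the ensemble vote smooths out the variance of the individual learners.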
16 changes: 16 additions & 0 deletions content/chapters/07_forests/07-02-basics.md
@@ -0,0 +1,16 @@
---
title: "Chapter 07.02: Basics"
weight: 7002
quizdown: true
---
In this section we investigate random forests, a modification of bagging for trees.

<!--more-->

### Lecture video

{{< video id="NY3Tux1Zt4g" >}}

### Lecture slides

{{< pdfjs file="https://github.com/slds-lmu/lecture_i2ml/tree/master/slides-pdf/slides-forests-basics.pdf" >}}
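The key modification of bagging that the new basics chapter covers is that random forests restrict each split to a random subset of mtry candidate features (a common default is mtry = floor(sqrt(p)) for classification), which decorrelates the trees. A hedged Python sketch of a single such split step, with Gini impurity and toy data invented for illustration:

```python
import math
import random

def gini(labels):
    # Gini impurity of a list of class labels.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y, feature_idx):
    # Best (feature, threshold) among the given candidate features,
    # scored by the weighted Gini impurity of the two child nodes.
    best = None
    for j in feature_idx:
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[j] <= t]
            right = [y[i] for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def random_forest_split(X, y, seed=0):
    # RF modification: consider only mtry = floor(sqrt(p)) randomly drawn
    # features at this split instead of all p features.
    p = len(X[0])
    mtry = max(1, int(math.sqrt(p)))
    rng = random.Random(seed)
    candidates = rng.sample(range(p), mtry)
    return best_split(X, y, candidates)

X = [[0, 1, 5, 2], [1, 3, 4, 0], [0, 2, 1, 1], [1, 0, 2, 3]]
y = ["a", "a", "b", "b"]
score, feature, threshold = random_forest_split(X, y)
```

A full forest repeats this step recursively in every tree, redrawing the candidate set at every node.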
16 changes: 0 additions & 16 deletions content/chapters/07_forests/07-02-intro.md

This file was deleted.

@@ -1,19 +1,20 @@
---
title: "Chapter 07.03: Benchmarking Trees, Forests, and Bagging k-NN"
title: "Chapter 07.03: Out-of-Bag Error Estimate"
weight: 7003
quizdown: true
---
We compare the performance of random forests vs. (bagged) CART and (bagged) \\(k\\)-NN.

We introduce the concepts of in-bag and out-of-bag observations and explain how to compute the out-of-bag error estimate.

<!--more-->

### Lecture video

{{< video id="uOamholBaZ0" >}}
{{< video id="gucPQxcqPcY" >}}

### Lecture slides

{{< pdfjs file="https://github.com/slds-lmu/lecture_i2ml/tree/master/slides-pdf/slides-forests-benchmark.pdf" >}}
{{< pdfjs file="https://github.com/slds-lmu/lecture_i2ml/tree/master/slides-pdf/slides-forests-oob.pdf" >}}

### Quiz

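The retitled chapter's central point, the out-of-bag error estimate, can be sketched as follows. Each bootstrap sample leaves out roughly (1 - 1/n)^n ≈ e^(-1) ≈ 36.8% of the observations; each observation is then predicted only by the models that did not see it in training. The base learner below (in-bag majority class) is a deliberate placeholder so the sketch stays self-contained; the lecture's actual demos are in R:

```python
import random
from collections import Counter

def oob_error(y, n_models=100, seed=0):
    # OOB error estimate with a placeholder base learner
    # that always predicts the in-bag majority class.
    rng = random.Random(seed)
    n = len(y)
    oob_votes = [[] for _ in range(n)]
    for _ in range(n_models):
        # In-bag: n indices drawn with replacement; out-of-bag: all others.
        in_bag = [rng.randrange(n) for _ in range(n)]
        oob = set(range(n)) - set(in_bag)
        majority = Counter(y[i] for i in in_bag).most_common(1)[0][0]
        for i in oob:
            oob_votes[i].append(majority)
    # Aggregate, per observation, only models that did NOT train on it,
    # which makes the OOB error behave like a cross-validated estimate.
    wrong = [Counter(votes).most_common(1)[0][0] != y[i]
             for i, votes in enumerate(oob_votes) if votes]
    return sum(wrong) / len(wrong)

err = oob_error(["a"] * 7 + ["b"] * 3)
```

Because every observation gets an honest prediction this way, no separate validation set is needed to estimate generalization error.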
3 changes: 2 additions & 1 deletion content/chapters/07_forests/07-04-featureimportance.md
@@ -3,13 +3,14 @@ title: "Chapter 07.04: Feature Importance"
weight: 7004
quizdown: true
---

In a complex machine learning model, the contributions of individual features to model performance are difficult to evaluate. The concept of feature importance makes it possible to quantify these effects for random forests.

<!--more-->

### Lecture video

{{< video id="cw4qG9ePZ9Y" >}}
{{< video id="8h3H0j2f24I" >}}

### Lecture slides

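The permutation variable importance discussed in this chapter (and quizzed later in the section) can be sketched generically: permute feature j to break its association with the target, then measure how much performance drops. The toy model and accuracy metric below are hypothetical illustrations, not the lecture's R implementation:

```python
import random

def permutation_importance(predict, X, y, j, metric, seed=0):
    # Baseline performance on the evaluation data (OOB data in a forest).
    baseline = metric([predict(row) for row in X], y)
    # Permute column j, breaking its association with the target.
    rng = random.Random(seed)
    col = [row[j] for row in X]
    rng.shuffle(col)
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
    permuted = metric([predict(row) for row in X_perm], y)
    # Importance = performance drop caused by permuting feature j.
    return baseline - permuted

def accuracy(preds, y):
    return sum(p == t for p, t in zip(preds, y)) / len(y)

# Toy model that depends only on feature 0.
predict = lambda row: "a" if row[0] < 0.5 else "b"
X = [[0.1, 9], [0.2, 1], [0.8, 5], [0.9, 3]]
y = ["a", "a", "b", "b"]
print(permutation_importance(predict, X, y, 1, accuracy))  # -> 0.0
```

Feature 1 never enters the model, so permuting it leaves every prediction unchanged and its importance is exactly zero, while permuting an informative feature would cost accuracy.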
28 changes: 27 additions & 1 deletion content/chapters/07_forests/07-05-proximities.md
@@ -3,14 +3,40 @@ title: "Chapter 07.05: Proximities"
weight: 7005
quizdown: true
---

The term *proximity* refers to the "closeness" between pairs of cases. Proximities are calculated for each pair of observations and can be derived directly from random forests.

<!--more-->

### Lecture video

{{< video id="RGa0Uc6ZbX4" >}}
{{< video id="8h3H0j2f24I" >}}

### Lecture slides

{{< pdfjs file="https://github.com/slds-lmu/lecture_i2ml/tree/master/slides-pdf/slides-forests-proximities.pdf" >}}

### Code demo

**Random Forests**

You can run the code snippets in the demos on your local machine. The corresponding Rmd version of this demo can be found [here](https://github.com/compstat-lmu/lecture_i2ml/blob/master/code-demos/code_demo_randforests.Rmd). If you want to render the Rmd files to PDF, you need the accompanying [style files](https://github.com/compstat-lmu/lecture_i2ml/tree/master/style).

{{< pdfjs file="https://github.com/slds-lmu/lecture_i2ml/tree/master/code-demos-pdf/code_demo_randforests.pdf" >}}

### Quiz

{{< quizdown >}}

---
shuffle_questions: false
---

## Which statements are true?

- [x] To compute permutation variable importance for feature $j$, we permute the feature and see how the performance changes (in OOB observations).
- [ ] The random forest is a bad out-of-the-box model and requires tuning of hyperparameters.
- [x] Random forests and trees can be used for high-dimensional data.
- [ ] Proximities are used in replacing missing data, but not in locating outliers.

{{< /quizdown >}}
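The proximity measure this chapter introduces has a simple concrete form: the proximity of observations i and j is the fraction of trees in which they end up in the same terminal node. Given per-tree leaf assignments (the toy assignments below are invented for illustration; the course demo is in R), the proximity matrix is:

```python
def proximity(leaf_ids):
    # leaf_ids[t][i] = terminal node of observation i in tree t.
    # Proximity of (i, j) = fraction of trees placing them in the same leaf.
    n_trees, n = len(leaf_ids), len(leaf_ids[0])
    prox = [[0.0] * n for _ in range(n)]
    for assignment in leaf_ids:
        for i in range(n):
            for j in range(n):
                if assignment[i] == assignment[j]:
                    prox[i][j] += 1.0 / n_trees
    return prox

# Three toy trees, each assigning four observations to leaves.
leaf_ids = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [2, 0, 1, 1],
]
prox = proximity(leaf_ids)
print(prox[0][1])  # obs 0 and 1 share a leaf in 2 of 3 trees -> 2/3
```

The resulting matrix can then serve downstream uses mentioned in the chapter, such as imputing missing data and locating outliers.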
41 changes: 0 additions & 41 deletions content/chapters/07_forests/07-06-discussion.md

This file was deleted.
