
Commit 25f1fa5

Author: Nathan Godey
Message: 2024 update
Parent: c2618c1

File tree

6 files changed (+22, -3 lines)


imgs/course4/bfloat.png (472 KB)
imgs/course4/flashattn_banner.png (186 KB)
imgs/course4/quantization.png (25 KB)
imgs/course4/tensor_parallel.png (70.1 KB)

markdown/course3_lm.md

Lines changed: 1 addition & 1 deletion
@@ -477,7 +477,7 @@ $$
 ---
 ### Decoders - Inference speed
 * For greedy decoding without prefix:
-  * $n$ passes with sequences of length $n$
+  * $n$ passes with sequences of length $1\leq t \leq n$
   * Each pass is $O(n^2)$
   * Complexity: $O(n^3)$
 * Other decoding are <ins>more costly</ins>
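To make the corrected complexity bullet concrete: without a KV cache, greedy decoding runs one forward pass per generated token, and pass $t$ attends over a sequence of length $t$, so the total cost is $\sum_{t=1}^{n} O(t^2) = O(n^3)$. A minimal sketch, assuming a causal LM callable as `model(ids)` that returns `[batch, t, vocab]` logits (the names here are illustrative, not part of the commit):

```python
import torch

def greedy_decode(model, bos_id: int, n: int) -> torch.Tensor:
    """Naive greedy decoding without prefix or KV cache.

    Pass t re-encodes a sequence of length t (1 <= t <= n); each pass costs
    O(t^2) in self-attention, so the total cost is O(n^3).
    """
    ids = torch.tensor([[bos_id]])              # length-1 sequence to start
    for _ in range(n - 1):                      # n - 1 further passes, lengths 2..n
        logits = model(ids)                     # assumed shape: [1, t, vocab]
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)  # the sequence grows by one token
    return ids
```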

markdown/course4_efficiency.md

Lines changed: 21 additions & 2 deletions
@@ -86,14 +86,24 @@ $$
 * `float16`: reduces memory usage, good with V100-gen GPUs
 * `bfloat16`: more stability, but only usable with A100-gen GPUs
 
+---
+### Training LMs - (b)float16
+<br>
+<center><img width="1000px" src="../imgs/course4/bfloat.png"></center>
+
+---
+### Training LMs - Efficient implementations
+- FlashAttention (Dao et al. 2022)
+<center><img width="1000px" src="../imgs/course4/flashattn_banner.jpeg"/></center>
+
 ---
 ### Training LMs - Efficient implementations
 - FlashAttention (Dao et al. 2022)
 <center><img width="1000px" src="../imgs/course4/flashattn_banner.jpeg"/></center>
 
 ---
 ### Training LMs - Efficient implementations
-- FlashAttention2 (Dao et al. 2023)
+- FlashAttention 2 & 3 (Dao et al. 2023)
 <center><img width="600px" src="../imgs/course4/flash2.png"/></center>
 
 ---
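As a side note to this hunk (not part of the commit): in PyTorch, `bfloat16` mixed precision is usually enabled with `torch.autocast`, and `torch.nn.functional.scaled_dot_product_attention` dispatches to a FlashAttention-style fused kernel when the hardware and dtypes allow it. A minimal sketch with illustrative `model`, `loss_fn`, `optimizer`, and `batch` names:

```python
import torch
import torch.nn.functional as F

def train_step(model, loss_fn, optimizer, batch):
    """One training step with bfloat16 autocast (parameters stay in fp32)."""
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        logits = model(batch["input_ids"])       # forward runs mostly in bf16
        loss = loss_fn(logits, batch["labels"])
    loss.backward()                              # gradients land in fp32, like the params
    optimizer.step()
    return loss.item()

# Fused attention: uses a FlashAttention-style kernel when one is available.
q = k = v = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.bfloat16)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```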
@@ -158,6 +168,10 @@ $$
 ### Training LMs - FSDP
 <center><img width="1000px" src="../imgs/course4/fsdp.png"/></center>
 
+---
+### Training LMs - FSDP
+<center><img width="1000px" src="../imgs/course4/tensor_parallel.png"/></center>
+
 ---
 ### Training LMs - DeepSpeed
 - Similar to FSDP:
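For context on the FSDP slides touched above (again, not part of the commit), a minimal sketch of wrapping a model with PyTorch FSDP so that parameters, gradients, and optimizer state are sharded across data-parallel ranks; it assumes the script is launched with `torchrun` so the distributed environment variables are set:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")                          # one process per GPU
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Any nn.Module works; a small TransformerEncoder keeps the example self-contained.
model = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=512, nhead=8), num_layers=6
).cuda()
model = FSDP(model)                                      # shards params, grads, optimizer state

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# Training then proceeds as usual: forward, loss.backward(), optimizer.step().
```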
@@ -210,6 +224,11 @@ $$ Q_{i_4}(0.3) \neq 0$$
 
 ---
 
+### Quantization
+<center><img width="800px" src="../imgs/course4/quantization.png"/></center>
+
+---
+
 ### LM quantization
 - GPTQ (Frantar et al. 2023)
 <center><img width="900px" src="../imgs/course4/gptq.png"/></center>
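GPTQ relies on second-order (Hessian-based) updates, but the basic mechanics of weight quantization can be illustrated with a much simpler round-to-nearest scheme. A minimal sketch (deliberately not GPTQ) of symmetric absmax int8 quantization of a weight matrix:

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric absmax quantization: map float weights to int8 with one scale."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4096, 4096)
q, scale = quantize_int8(w)
print((w - dequantize(q, scale)).abs().mean())   # rounding error introduced by quantization
```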
@@ -285,7 +304,7 @@ where $W$ is a weight matrix to quantize into $\hat{W}$, and $X$ are data points
 
 ---
 
-### Sheared Llama (Xia et al. 2023)
+### Pruning - Sheared Llama (Xia et al. 2023)
 * Remove weights that minimize loss increase <center><img width="1000px" src="../imgs/course4/sheared_llama.png"/></center>
 * Continue the pretraining of the obtained reduced model
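The renamed slide describes structured pruning: Sheared Llama removes components chosen to minimize the loss increase and then continues pretraining the smaller model. As a much simpler illustration of the pruning idea only (not the paper's method), here is a sketch of unstructured magnitude pruning, which zeroes the smallest-magnitude weights:

```python
import torch

def magnitude_prune(w: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(sparsity * w.numel())
    if k == 0:
        return w.clone()
    threshold = w.abs().flatten().kthvalue(k).values
    return torch.where(w.abs() > threshold, w, torch.zeros_like(w))

w = torch.randn(1024, 1024)
w_pruned = magnitude_prune(w, sparsity=0.5)
print((w_pruned == 0).float().mean())   # roughly half of the weights are now zero
```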
