|
* `float16`: halves memory use; well supported from V100-generation GPUs onward, but its narrow exponent range makes training prone to overflow/underflow
* `bfloat16`: same exponent range as `float32`, hence much more numerical stability, but only supported natively from A100-generation GPUs onward

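A quick way to see the trade-off (a NumPy sketch; bfloat16 is simulated by truncating the float32 mantissa, since NumPy has no native bfloat16 dtype):

```python
import numpy as np

def to_bfloat16(x: np.ndarray) -> np.ndarray:
    # Keep float32's 8-bit exponent, round the mantissa down to 7 bits:
    # that is exactly the bfloat16 layout (upper 16 bits of a float32).
    bits = x.astype(np.float32).view(np.uint32)
    rounded = bits + np.uint32(0x7FFF) + ((bits >> 16) & np.uint32(1))  # round-to-nearest-even
    return (rounded & np.uint32(0xFFFF0000)).view(np.float32)

x = np.array([3.0e38, 1.0 + 2**-10], dtype=np.float32)
print(x.astype(np.float16))  # 3e38 overflows float16 (max ~65504) -> inf
print(to_bfloat16(x))        # bfloat16 keeps float32's range, at lower precision
```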
|
---
### Training LMs - (b)float16
<br>
<center><img width="1000px" src="../imgs/course4/bfloat.png"></center>

---
### Training LMs - Efficient implementations
- FlashAttention (Dao et al. 2022)
<center><img width="1000px" src="../imgs/course4/flashattn_banner.jpeg"/></center>

---
### Training LMs - Efficient implementations
- FlashAttention 2 (Dao, 2023) & 3 (Shah et al., 2024)
<center><img width="600px" src="../imgs/course4/flash2.png"/></center>

---
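The core trick FlashAttention builds on can be sketched in a few lines: stream over key/value blocks with an online softmax, so the full (seq × seq) score matrix is never materialized. This is a NumPy illustration of the idea only; the real kernels fuse these steps into SRAM-resident GPU tiles:

```python
import numpy as np

def naive_attention(q, k, v):
    # Reference implementation: materializes the full n x n score matrix.
    s = q @ k.T / np.sqrt(q.shape[1])
    p = np.exp(s - s.max(axis=1, keepdims=True))
    return (p / p.sum(axis=1, keepdims=True)) @ v

def tiled_attention(q, k, v, block=2):
    # FlashAttention-style streaming: visit K/V in blocks, maintain a
    # running max and running softmax denominator, rescale the partial output.
    n, d = q.shape
    out = np.zeros((n, v.shape[1]))
    running_max = np.full(n, -np.inf)
    running_sum = np.zeros(n)
    for i in range(0, k.shape[0], block):
        kb, vb = k[i:i+block], v[i:i+block]
        s = q @ kb.T / np.sqrt(d)                  # scores for this block only
        new_max = np.maximum(running_max, s.max(axis=1))
        scale = np.exp(running_max - new_max)      # rescale previous statistics
        p = np.exp(s - new_max[:, None])
        running_sum = running_sum * scale + p.sum(axis=1)
        out = out * scale[:, None] + p @ vb
        running_max = new_max
    return out / running_sum[:, None]
```

Both functions return the same result; only the memory footprint differs.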
|
|
### Training LMs - FSDP
<center><img width="1000px" src="../imgs/course4/fsdp.png"/></center>

---
### Training LMs - FSDP
<center><img width="1000px" src="../imgs/course4/tensor_parallel.png"/></center>

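The idea behind FSDP/ZeRO-3 sharding can be sketched with plain NumPy: each rank stores only a shard of the parameters and the full tensor is materialized just-in-time via an all-gather. `shard_params` and `all_gather` here are illustrative stand-ins for the real collective ops, not the PyTorch API:

```python
import numpy as np

def shard_params(flat_params: np.ndarray, world_size: int):
    # Each rank stores ~1/world_size of the parameters (FSDP / ZeRO-3 idea).
    return np.array_split(flat_params, world_size)

def all_gather(shards):
    # Reassembled just-in-time before a layer's forward/backward pass,
    # then freed again, so persistent parameter memory per rank stays sharded.
    return np.concatenate(shards)

params = np.arange(10, dtype=np.float32)
shards = shard_params(params, world_size=4)
print([s.size for s in shards])  # per-rank storage: [3, 3, 2, 2]
print(np.array_equal(all_gather(shards), params))
```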
---
### Training LMs - DeepSpeed
- Similar to FSDP:
|
|

---

### Quantization
<center><img width="800px" src="../imgs/course4/quantization.png"/></center>

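Before the specific methods, the baseline recipe is round-to-nearest. A NumPy sketch of symmetric absmax int8 quantization (methods like GPTQ refine *which* weights to round and in what order; this is only the naive baseline):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    # Symmetric absmax quantization: map [-max|w|, max|w|] onto [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal(8).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(dequantize(q, scale) - w).max()
print(q.dtype, err)  # int8 storage; reconstruction error bounded by scale / 2
```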
---

### LM quantization
- GPTQ (Frantar et al. 2023)
<center><img width="900px" src="../imgs/course4/gptq.png"/></center>
|
|

---

### Pruning - Sheared Llama (Xia et al. 2023)
* Remove the weights whose removal least increases the loss <center><img width="1000px" src="../imgs/course4/sheared_llama.png"/></center>
* Continue pretraining the resulting smaller model to recover performance

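Sheared Llama learns structured pruning masks chosen to minimize the loss increase; as a much simpler point of comparison, classic magnitude pruning just drops the smallest weights. A NumPy sketch, illustrative only:

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    # Zero out the fraction `sparsity` of weights with smallest magnitude,
    # a crude proxy for "weights whose removal least increases the loss".
    k = int(w.size * sparsity)
    threshold = np.sort(np.abs(w).ravel())[k]
    return np.where(np.abs(w) >= threshold, w, 0.0)

w = np.random.default_rng(0).standard_normal((4, 4))
pruned = magnitude_prune(w, sparsity=0.5)
print((pruned == 0).mean())  # half of the weights removed
```

Unlike this sketch, Sheared Llama prunes whole structural units (heads, layers, hidden dimensions), so the reduced model actually runs faster on hardware.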
|
|