# Layers

The concept of a layer has actually been with us since logistic regression. There we first pass the data through a linear combination into a new space, then map that space to probability space with the sigmoid function. Splitting this into two parts, it is essentially a linear layer followed by a sigmoid layer.
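Viewed this way, logistic regression itself can be written as a stack of two layers. A minimal PyTorch sketch (the input dimension `d = 10` is an arbitrary choice for illustration):

```python
import torch.nn as nn

d = 10  # number of input features (illustrative)

logistic_regression = nn.Sequential(
    nn.Linear(d, 1),  # linear combination into a new (1-dimensional) space
    nn.Sigmoid(),     # map that space to probabilities in (0, 1)
)
```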
Modern deep neural networks achieve similar functionality by packaging different components as modules. For example, take a dataset whose elements are $M \times N$ matrices, which we want to feed into an MLP that classifies each one into 3 classes.
We can first flatten each matrix into an $M \times N$-dimensional vector and feed that into the MLP. This flattening operation is itself represented as a layer, typically `nn.Flatten`.

The MLP can then be viewed as a function: its input is the $M \times N$-dimensional vector, and its output is 3 scalars representing the weight of each class.

Finally, the result is passed through a Softmax layer to obtain the probability of each class.
```mermaid
graph LR
    D[Data] --M x N matrix--> F[Flatten]
    F --MN dimensional vector--> MLP[MLP]
    MLP --3 label weights--> SM[Softmax]
    SM --> O1[probability of label1]
    SM --> O2[probability of label2]
    SM --> O3[probability of label3]
```
In Torch code:
```python
import torch.nn as nn

M = 28
N = 28

model = nn.Sequential(
    # Flatten matrix to vector
    nn.Flatten(),

    # MLP
    nn.Linear(M * N, 128),
    nn.ReLU(),
    nn.Linear(128, 32),
    nn.ReLU(),
    nn.Linear(32, 3),

    # Softmax
    nn.Softmax(dim=1)
)
```
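As a quick sanity check of the shapes (a minimal sketch; the batch of 16 random matrices is purely illustrative):

```python
import torch

batch = torch.randn(16, M, N)  # 16 samples, each an M x N matrix
probs = model(batch)           # nn.Flatten turns (16, M, N) into (16, M * N)

print(probs.shape)        # torch.Size([16, 3])
print(probs.sum(dim=1))   # every row sums to 1, thanks to Softmax
```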
# Multi-class Classification and Softmax

## Motivation
When we face a multi-class problem, we usually want the probability of each label given a sample, i.e. $P(\text{label} \mid \mathbf{x})$, and we obtain the final label by comparing the probabilities of the different labels.

The binary case is easy to handle: we let the network output a single scalar, interpret it as the probability of the positive label, and decide the class from that.

For multi-class problems, however, this trick no longer works.
## Solution

We can instead have the network output one value $o_l$ for each label $l \in L$ and compare these values to find the most likely class:
$$
\hat{y} = \arg\max_{l \in L} o_l
$$
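For example (values chosen arbitrarily), if the outputs for three labels are $o = (0.3,\ 2.1,\ -0.7)$, the second value is the largest, so $\hat{y}$ is the second label.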
If we want to turn these values into probabilities, the clever reader will surely think of summing them up and taking each value's share of the total. Writing $o_{l'}$ for the output value of label $l' \in L$:
$$
P(l' \mid x) = \frac{o_{l'}}{\sum_{l \in L} o_l}
$$
The denominator is the sum of all labels' output values, and the numerator is the output value of the label in question.

This method looks great, but it runs into trouble in practice: the raw outputs are unbounded and may be negative, so the ratios need not be valid probabilities, and the scheme suffers from missing gradients, meaning we cannot use it to train the neural network.
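A quick numerical illustration (values chosen arbitrarily): with outputs $o = (2, -1, 0)$ the sum is $1$, so the formula assigns

$$
P(1 \mid x) = 2, \qquad P(2 \mid x) = -1, \qquad P(3 \mid x) = 0,
$$

which are not valid probabilities at all.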
## Softmax

To solve the problems above, we introduce the Softmax function, defined as follows:
$$
P(l' \mid x) = \frac{\exp(o_{l'})}{\sum_{l \in L} \exp(o_l)}
$$

where $\exp$ is the exponential function, $\exp(x) = e^x$. Because $\exp$ is always positive, every term in the ratio is positive, so the outputs are valid probabilities: each lies in $(0, 1)$ and together they sum to $1$.
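A minimal PyTorch sketch of this formula (the logits are arbitrary example values), comparing a hand-written version against the built-in `torch.softmax`:

```python
import torch

o = torch.tensor([2.0, -1.0, 0.0])  # raw outputs (logits), arbitrary values

manual = torch.exp(o) / torch.exp(o).sum()  # the formula above, verbatim
builtin = torch.softmax(o, dim=0)           # PyTorch's built-in Softmax

print(manual)                           # approx. tensor([0.8438, 0.0420, 0.1142])
print(torch.allclose(manual, builtin))  # True
```

Note that exponentiating large outputs can overflow; numerically stable implementations (including PyTorch's) subtract $\max_{l} o_l$ from every output before exponentiating, which leaves the result unchanged.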