Commit c5cd72f (1 parent: d7a035c). Showing 2 changed files with 193 additions and 4 deletions.
@@ -0,0 +1,188 @@
# 2.6 Support Vector Machine [To Be Completed]

A Support Vector Machine (SVM) is a binary classification model. Its basic form is the maximum-margin linear classifier defined on the feature space, and its learning strategy is margin maximization. Maximizing the margin makes the model more robust to noise, which improves its generalization ability.

## Defining the Margin

Consider the linear classifier

$$
\begin{equation}
h(\mathbf{x})=\left\{
\begin{aligned}
+1 \quad&\text{ if }\mathbf{w}^T\mathbf{x}+b>0 \\
-1 \quad&\text{ if }\mathbf{w}^T\mathbf{x}+b<0
\end{aligned}
\right.
\end{equation}
$$
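
As a minimal sketch of the classifier above, the following Python functions evaluate $h(\mathbf{x})$ on plain lists of numbers. The names `score` and `predict` and the example values are illustrative, and returning $-1$ when $\mathbf{w}^T\mathbf{x}+b=0$ is an assumption, since the definition leaves that case open.

```python
def score(w, b, x):
    """Return w^T x + b for a weight list w, bias b, and point x."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict(w, b, x):
    """Return +1 if w^T x + b > 0, otherwise -1.

    Assumption: a score of exactly 0 is mapped to -1, because the
    definition of h does not cover that case.
    """
    return 1 if score(w, b, x) > 0 else -1


# Hypothetical example: the boundary x1 + x2 - 2 = 0 in two dimensions.
w, b = [1.0, 1.0], -2.0
labels = [predict(w, b, x) for x in ([2.0, 2.0], [0.0, 0.0])]
```

Here `labels` holds the predicted class of one point on each side of the boundary.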

The distance from a point $\mathbf{x}_i$ to the decision boundary $\mathbf{w}^T\mathbf{x}+b=0$ of $h$ is

$$
\begin{align}
\text{dist}(h, \mathbf{x}_i) = \frac{\mid \mathbf{w}^T\mathbf{x}_i + b \mid}{||\mathbf{w}||}
\end{align}
$$

Because of the absolute value, this distance is always nonnegative. We define the margin of $\mathbf{x}_i$ as this distance:

$$
\begin{align}
\gamma_i &= \text{dist}(h, \mathbf{x}_i) \\
&=\frac{\mid \mathbf{w}^T\mathbf{x}_i + b \mid}{||\mathbf{w}||}
\end{align}
$$

Whether a point $(\mathbf{x}_i, y_i)$ is classified correctly depends on how its label $y_i$ compares with the sign of $\mathbf{w}^T\mathbf{x}_i + b$:

$$
\begin{equation}
y_i=\left\{
\begin{aligned}
+1 \quad&\text{ if }\mathbf{w}^T\mathbf{x}_i + b>0 \quad \text{correctly classified}\\
-1 \quad&\text{ if }\mathbf{w}^T\mathbf{x}_i + b<0 \quad \text{correctly classified}\\
+1 \quad&\text{ if }\mathbf{w}^T\mathbf{x}_i + b\leq0 \quad \text{misclassified} \\
-1 \quad&\text{ if }\mathbf{w}^T\mathbf{x}_i + b\geq0 \quad \text{misclassified}
\end{aligned}
\right.
\end{equation}
$$

That is, $y_i(\mathbf{w}^T\mathbf{x}_i+b)>0$ if the point is classified correctly, and $y_i(\mathbf{w}^T\mathbf{x}_i +b )\leq0$ if it is misclassified. Since $y_i \in \{ +1, -1\}$, multiplying by $y_i$ can only flip the sign of $\mathbf{w}^T\mathbf{x}_i + b$ and never changes its absolute value. Therefore, if all points are classified correctly, the margin can be rewritten as

$$
\begin{align}
{\mid \mathbf{w}^T\mathbf{x}_i + b\mid}
&=
y_i(\mathbf{w}^T\mathbf{x}_i+b) > 0\\
&\Downarrow\\
\gamma_i
&=\frac{\mid \mathbf{w}^T\mathbf{x}_i + b \mid}{||\mathbf{w}||} \\
&=\frac{y_i(\mathbf{w}^T\mathbf{x}_i+b)}{||\mathbf{w}||}
\end{align}
$$

$$
\begin{align}
s.t. \quad \forall (\mathbf{x}_i, y_i)\in \mathcal{D}.\quad y_i h(\mathbf{x}_i)>0
\end{align}
$$
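
The signed form $\gamma_i = y_i(\mathbf{w}^T\mathbf{x}_i+b)/||\mathbf{w}||$ is easy to check numerically. The sketch below is only an illustration: the toy data set, the values of `w` and `b`, and the helper name `geometric_margins` are assumptions, not part of the text.

```python
import math


def geometric_margins(w, b, X, y):
    """Return gamma_i = y_i (w^T x_i + b) / ||w|| for every sample."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    margins = []
    for xi, yi in zip(X, y):
        s = sum(wj * xj for wj, xj in zip(w, xi)) + b
        margins.append(yi * s / norm)
    return margins


# Hypothetical toy data set (two points per class).
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
y = [1, 1, -1, -1]
w, b = [1.0, 1.0], -2.0

gammas = geometric_margins(w, b, X, y)
all_correct = all(g > 0 for g in gammas)  # every point on the correct side
smallest = min(gammas)                    # margin of the closest point
```

A positive value for every entry of `gammas` corresponds to the condition $y_i h(\mathbf{x}_i)>0$ for all points, and `smallest` is the quantity that margin maximization tries to make as large as possible.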

## Defining the Primal Problem

Maximize the smallest margin over the training set:

$$
\begin{align}
& \argmax_{\mathbf{w}, b}\{\min_n \gamma_n\}\\
\text{where}
&\quad \gamma_n = \frac{y_n(\mathbf{w}^T\mathbf{x}_n+b)}{||\mathbf{w}||}\\
s.t.
&\quad \forall (\mathbf{x}_i, y_i)\in \mathcal{D}.\quad y_i h(\mathbf{x}_i)>0
\end{align}
$$
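
Rescaling $(\mathbf{w}, b)$ by a positive constant does not change the decision boundary, so we can fix the scale such that the points closest to the boundary, the support vectors, satisfy $y_i(\mathbf{w}^T\mathbf{x}_i + b) = 1$. For the support vectors we then have:

$$
\begin{align}
\gamma_i &= \frac{| \mathbf{w}^T\mathbf{x}_i + b |}{\|\mathbf{w}\|} \\
&\downarrow \\
\gamma_i &= \frac{1}{\|\mathbf{w}\|}
\end{align}
$$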
The problem can therefore be rewritten as
$$
\begin{align}
& \argmax_{\mathbf{w}, b}\left\{\frac{1}{||\mathbf{w}||}\right\}\\
s.t.
\quad& \forall (\mathbf{x}_i, y_i)\in \mathcal{D}.\quad y_i(\mathbf{w}^T\mathbf{x}_i+b)\geq1
\end{align}
$$
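
The normalization to a smallest functional margin of $1$ can be illustrated with a short sketch. The data set, the starting values of `w` and `b`, and the helper names `functional_margins` and `canonicalize` are hypothetical choices made for this example.

```python
import math


def functional_margins(w, b, X, y):
    """Return y_i (w^T x_i + b) for every sample."""
    return [yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            for xi, yi in zip(X, y)]


def canonicalize(w, b, X, y):
    """Rescale (w, b) so that the smallest functional margin equals 1.

    Assumes (w, b) classifies every point correctly.
    """
    m = min(functional_margins(w, b, X, y))
    if m <= 0:
        raise ValueError("(w, b) does not separate the data")
    return [wj / m for wj in w], b / m


# Hypothetical toy data set and separating hyperplane.
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
y = [1, 1, -1, -1]
w, b = [1.0, 1.0], -2.0

w_c, b_c = canonicalize(w, b, X, y)
norm_c = math.sqrt(sum(wj * wj for wj in w_c))
margin = 1.0 / norm_c  # geometric margin of the closest points
```

The rescaled pair `(w_c, b_c)` describes the same decision boundary as `(w, b)`, and `margin` equals the smallest geometric margin computed from the unscaled parameters.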

Maximizing $\frac{1}{||\mathbf{w}||}$ is equivalent to minimizing $\frac{1}{2}||\mathbf{w}||^2$. If we also apply a nonlinear transformation to $\mathbf{x}$, i.e. $\mathbf{x}\rightarrow \phi(\mathbf{x})$, we obtain

$$
\begin{align}
& \argmin_{\mathbf{w}, b}\left\{\frac{1}{2}|| \mathbf{w} ||^2\right\}\\
s.t. \quad&
\forall (\mathbf{x}_i, y_i)\in \mathcal{D}.\\
&y_i(\mathbf{w}^T\phi(\mathbf{x}_i)+b)\geq1
\end{align}
$$
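
To make the constrained problem concrete, the sketch below searches a coarse grid of candidate $(\mathbf{w}, b)$ values and keeps the feasible one with the smallest $\frac{1}{2}||\mathbf{w}||^2$. This brute-force search is only an illustration on a tiny, hypothetical data set. It assumes $\phi$ is the identity map, and it is not how SVMs are trained in practice, where a quadratic programming solver would normally be used.

```python
import itertools


def is_feasible(w, b, X, y):
    """Check y_i (w^T x_i + b) >= 1 for all samples (phi = identity)."""
    return all(
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 1
        for xi, yi in zip(X, y)
    )


def objective(w):
    """Return 0.5 * ||w||^2."""
    return 0.5 * sum(wj * wj for wj in w)


# Hypothetical toy data set.
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
y = [1, 1, -1, -1]

# Grid with step 0.25; these values are exactly representable as floats.
w_values = [k / 4 for k in range(-8, 9)]    # -2.0 ... 2.0
b_values = [k / 4 for k in range(-12, 13)]  # -3.0 ... 3.0

feasible = [
    ((w1, w2), b)
    for w1, w2, b in itertools.product(w_values, w_values, b_values)
    if is_feasible((w1, w2), b, X, y)
]
best_w, best_b = min(feasible, key=lambda cand: objective(cand[0]))
```

On this data the search selects `best_w = (0.5, 0.5)` and `best_b = -1.0`, the hyperplane halfway between the closest points $(2, 2)$ and $(0, 0)$.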

## Rewriting as the Dual Problem

Consider the primal problem

$$
\begin{align}
& \argmin_{\mathbf{w}, b}\left\{\frac{1}{2}|| \mathbf{w} ||^2\right\}\\
s.t. \quad&
\forall (\mathbf{x}_i, y_i)\in \mathcal{D}.\\
&y_i(\mathbf{w}^T\phi(\mathbf{x}_i)+b)\geq1
\end{align}
$$

We can define a penalty function $g(\mathbf{w}, b)$ that is large when the constraints are violated and small when they are satisfied. Minimizing the overall objective then also minimizes the penalty, which pushes the solution toward satisfying the constraints. The primal problem can thus be rewritten as:

$$
\begin{align}
& \argmin_{\mathbf{w}, b}\left\{\frac{1}{2}|| \mathbf{w} ||^2 + g(\mathbf{w}, b)\right\}
\end{align}
$$
This turns the primal problem into an unconstrained problem.

We first rewrite the constraint:

$$
\begin{align}
y_i(\mathbf{w}^T\phi(\mathbf{x}_i)+b)&\geq1\\
&\Downarrow\\
y_i(\mathbf{w}^T\phi(\mathbf{x}_i)+b)-1&\geq0\\
&\Downarrow\\
1-y_i(\mathbf{w}^T\phi(\mathbf{x}_i)+b)&\leq0
\end{align}
$$
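
This rewriting puts each constraint in the form "expression $\leq 0$", so a constraint is violated exactly when its left-hand side $1-y_i(\mathbf{w}^T\phi(\mathbf{x}_i)+b)$ is positive.
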
We can define the penalty function as

$$
\begin{align}
& g(\mathbf{w}, b) = \sum_{i=1}^N \alpha_i(1-y_i(\mathbf{w}^T\phi(\mathbf{x}_i)+b))\\
s.t. \quad& \alpha_i \geq 0
\end{align}
$$
We call the $\alpha_i$ Lagrange multipliers. They behave as follows:

1. When $\mathbf{x}_i$ is not a support vector, $\alpha_i = 0$, and the point has no influence on the final classifier.
2. When $\mathbf{x}_i$ is a support vector, $\alpha_i > 0$, and the point influences the final classifier.

In other words, the support vectors are the points with $\alpha_i > 0$. Note that even if $\mathbf{x}_i$ lies on the margin boundary, its $\alpha_i$ may still be $0$.
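
We want the penalty for any violated constraint to be as large as possible, so we maximize over the multipliers and redefine the penalty function as:

$$
\begin{align}
& g(\mathbf{w}, b) = \max_{\boldsymbol{\alpha}} \sum_{i=1}^N \alpha_i(1-y_i(\mathbf{w}^T\phi(\mathbf{x}_i)+b))\\
s.t. \quad& \alpha_i \geq 0
\end{align}
$$

When every constraint is satisfied, each term of the sum is nonpositive and the maximum is $0$. When some constraint is violated, its term can be made arbitrarily large by increasing the corresponding $\alpha_i$.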
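
The behavior of this penalty can be checked on a small example. The sketch below evaluates the sum for a few multiplier values. The data set, the two candidate hyperplanes, and the helper name `penalty` are hypothetical, and $\phi$ is assumed to be the identity map.

```python
def penalty(alpha, w, b, X, y):
    """Return sum_i alpha_i * (1 - y_i (w^T x_i + b)), with phi = identity."""
    total = 0.0
    for ai, xi, yi in zip(alpha, X, y):
        s = sum(wj * xj for wj, xj in zip(w, xi)) + b
        total += ai * (1 - yi * s)
    return total


# Hypothetical toy data set.
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
y = [1, 1, -1, -1]

# Feasible (w, b): every term is <= 0, so larger multipliers only lower
# the sum and the maximum over alpha >= 0 is reached at alpha = 0.
feasible_vals = [
    penalty([t] * 4, [1.0, 1.0], -2.0, X, y) for t in (0.0, 1.0, 10.0)
]

# Infeasible (w, b): the first constraint is violated, so the sum grows
# without bound as the first multiplier grows.
infeasible_vals = [
    penalty([t, 0.0, 0.0, 0.0], [0.25, 0.25], -1.0, X, y)
    for t in (0.0, 1.0, 10.0)
]
```

For the feasible hyperplane the values never exceed zero, while for the infeasible one they increase in proportion to the first multiplier, which is the behavior the maximization over $\alpha_i \geq 0$ relies on.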

## The Margin Maximization Problem