Check out Tianrui Qi's Optimization Wiki page for more details about the augmented Lagrangian method (ALM), the alternating direction method of multipliers (ADMM), and other optimization algorithms.
Given $N$ training data points with $\mathbf{x}_i \in \mathbb{R}^p$ and the label $\mathbf{y}_i \in \left\{ +1, -1 \right\}$ for each $i$, the linear support vector machine (SVM) aims at finding a hyperplane $\mathbf{x}\mathbf{w}+b = 0$ to separate the two classes of data points. Suppose these points can be strictly separated; namely, there exists $\left( \mathbf{w}, b \right)$ such that $\mathbf{x}_i\mathbf{w} +b \geq 1$ if $\mathbf{y}_i = +1$ and $\mathbf{x}_i\mathbf{w} +b \leq -1$ if $\mathbf{y}_i = -1$, or equivalently $\mathbf{y}_i \left( \mathbf{x}_i \mathbf{w} +b \right)\geq 1$ for all $i$. Since the distance between the two supporting hyperplanes $\mathbf{x}\mathbf{w}+b = \pm 1$ is $2/\left\Vert \mathbf{w} \right\Vert$, finding a hyperplane with the maximum margin between the two classes amounts to solving the following problem:

$$\min_{\mathbf{w},\, b}\ \frac{1}{2}\left\Vert \mathbf{w} \right\Vert^{2} \quad \text{s.t.} \quad \mathbf{y}_i \left( \mathbf{x}_i \mathbf{w} + b \right) \geq 1,\ i = 1, \dots, N,$$

where $\left\Vert\cdot\right\Vert$ denotes the Euclidean norm (or 2-norm).
While the feasibility of the above problem depends on the separability of the data, a soft-margin SVM is employed when the data points cannot be strictly separated, which instead solves the problem:

$$\min_{\mathbf{w},\, b}\ \sum_{i=1}^{N} \max\left\{ 0,\ 1 - \mathbf{y}_i \left( \mathbf{x}_i \mathbf{w} + b \right) \right\} + \frac{\lambda}{2}\left\Vert \mathbf{w} \right\Vert^{2},$$

where $\lambda > 0$ balances the regularization term against the hinge loss.
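As a quick sanity check, this objective is easy to evaluate directly. A minimal MATLAB sketch, assuming `X` is the $N \times p$ matrix whose rows are the $\mathbf{x}_i$, `y` is the $N \times 1$ label vector, and `w`, `b`, `lambda` are given (these variable names are ours, not from the released scripts):

```matlab
% Soft-margin SVM objective: hinge loss plus ridge penalty on w.
hinge = max(0, 1 - y .* (X * w + b));           % per-sample hinge loss
obj   = sum(hinge) + (lambda / 2) * norm(w)^2;  % full objective value
```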
To put this in a form that ADMM can handle, let $\mathbf{W} = [\mathbf{w}; b] \in \mathbb{R}^{p+1}$, let $\mathbf{X} \in \mathbb{R}^{N \times (p+1)}$ be the matrix whose $i$-th row is $\mathbf{y}_i [\mathbf{x}_i, 1]$, and let $\mathbf{Q} = \mathrm{diag}\left( \mathbf{I}_p, 0 \right)$ so that $\mathbf{W}^{\top}\mathbf{Q}\mathbf{W} = \left\Vert \mathbf{w} \right\Vert^{2}$. Introducing the splitting variable $\mathbf{T} \in \mathbb{R}^{N}$, the problem becomes

$$\min_{\mathbf{W},\, \mathbf{T}}\ \sum_{i=1}^{N} \max\left\{ 0,\ 1 - \mathbf{T}_i \right\} + \frac{\lambda}{2}\mathbf{W}^{\top}\mathbf{Q}\mathbf{W} \quad \text{s.t.} \quad \mathbf{T} = \mathbf{X}\mathbf{W}.$$

Note that this is the standard two-block problem that can be solved by ADMM. The problem can also be solved by the augmented Lagrangian method (ALM), but for this derivation we use ADMM.
Let $\mathbf{u}\in \mathbb{R}^{N}$ be the Lagrange multiplier and $\beta > 0$ the penalty parameter; the augmented Lagrangian function is

$$\mathcal{L}_{\beta}\left( \mathbf{W}, \mathbf{T}, \mathbf{u} \right) = \sum_{i=1}^{N} \max\left\{ 0,\ 1 - \mathbf{T}_i \right\} + \frac{\lambda}{2}\mathbf{W}^{\top}\mathbf{Q}\mathbf{W} + \mathbf{u}^{\top}\left( \mathbf{T} - \mathbf{X}\mathbf{W} \right) + \frac{\beta}{2}\left\Vert \mathbf{T} - \mathbf{X}\mathbf{W} \right\Vert^{2}.$$
Minimizing $\mathcal{L}_{\beta}$ over $\mathbf{W}$ with $\mathbf{T}^{\left( k\right) }$ and $\mathbf{u}^{\left( k\right) }$ fixed, we get $\mathbf{W}^{\left( k+1\right) }$ by solving the linear system

$$\left( \frac{\lambda }{\beta } \mathbf{Q} +\mathbf{X}^{\top } \mathbf{X} \right) \mathbf{W}^{\left( k+1\right) } = \mathbf{X}^{\top }\left( \mathbf{T}^{\left( k\right) } + \frac{1}{\beta }\mathbf{u}^{\left( k\right) } \right).$$

Note that $\left( \frac{\lambda }{\beta } \mathbf{Q} +\mathbf{X}^{\top } \mathbf{X} \right)$ is a constant, so we can compute (and factorize) it before the iterations start. For $\mathbf{T}$, the subproblem separates over coordinates; setting $\mathbf{v} = \mathbf{X}\mathbf{W}^{\left( k+1\right) } - \frac{1}{\beta }\mathbf{u}^{\left( k\right) }$, the proximal step on the hinge function gives the closed form

$$\mathbf{T}_i^{\left( k+1\right) } = \begin{cases} \mathbf{v}_i, & \mathbf{v}_i > 1, \\ 1, & 1 - \frac{1}{\beta } \leq \mathbf{v}_i \leq 1, \\ \mathbf{v}_i + \frac{1}{\beta }, & \mathbf{v}_i < 1 - \frac{1}{\beta }, \end{cases}$$

followed by the multiplier update $\mathbf{u}^{\left( k+1\right) } = \mathbf{u}^{\left( k\right) } + \beta \left( \mathbf{T}^{\left( k+1\right) } - \mathbf{X}\mathbf{W}^{\left( k+1\right) } \right)$.
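A minimal MATLAB sketch of one iteration under the assumptions above (`Xh` stands for the label-scaled matrix $\mathbf{X}$; `T`, `u`, `lambda`, and `beta` are assumed initialized; all variable names are ours):

```matlab
% Precompute and factorize the constant coefficient matrix (once, before the loop).
p = size(Xh, 2) - 1;                       % feature count; last column of Xh is the scaled 1
Q = blkdiag(eye(p), 0);                    % penalizes w but not the bias b
R = chol((lambda / beta) * Q + Xh' * Xh);  % upper-triangular Cholesky factor

% --- one ADMM iteration ---
W = R \ (R' \ (Xh' * (T + u / beta)));     % W-update: solve the linear system
v = Xh * W - u / beta;                     % T-update: coordinate-wise prox of the hinge
T = v;                                     % case v_i > 1
T(v <= 1) = 1;                             % case 1 - 1/beta <= v_i <= 1
idx = v < 1 - 1/beta;
T(idx) = v(idx) + 1/beta;                  % case v_i < 1 - 1/beta
u = u + beta * (T - Xh * W);               % multiplier update
```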
We stop the algorithm when the maximum of the primal and dual residuals is less than `tol`. Since we do not have any sub-iterations, we set `maxit = 5000` and also stop the algorithm when the iteration count exceeds `maxit`.
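Concretely, the stopping test can be sketched as follows, using the standard ADMM residuals for the constraint $\mathbf{T} = \mathbf{X}\mathbf{W}$ (again with our own variable names; `T_old` holds the previous iterate of $\mathbf{T}$):

```matlab
pres = norm(T - Xh * W);                   % primal residual: constraint violation
dres = beta * norm(Xh' * (T - T_old));     % dual residual for the two-block splitting
if max(pres, dres) < tol || k >= maxit
    break;                                 % leave the outer ADMM loop
end
```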
Results
We have three training datasets: `rho02`, `rho08`, and `spam`. Implementing the ADMM-based algorithm according to the derivation above, we test our algorithm on these training datasets with different $\lambda$ by running `test_model_SVM_rho02.m`, `test_model_SVM_rho08.m`, and `test_model_SVM_spam.m`.
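Each test script follows the same pattern; a hypothetical sketch of such a driver (the solver name `ADMM_SVM`, its signature, and the data file layout are assumptions, not the actual script contents):

```matlab
% Hypothetical driver mirroring the test scripts at a high level.
load('rho02.mat');                         % assumed to provide X, y (train) and Xt, yt (test)
lambda = 0.1;  beta = 1;  tol = 1e-4;  maxit = 5000;
[w, b] = ADMM_SVM(X, y, lambda, beta, tol, maxit);  % assumed solver interface
acc = mean(sign(Xt * w + b) == yt);        % final classification accuracy
```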
As a baseline, we also run the solver `ALM_SVM_p.p` provided by Prof. Yangyang Xu. Note that Prof. Xu uses the augmented Lagrangian method (ALM) instead of ADMM.
The results are shown below, including the violation of primal and dual feasibility at each outer iteration, the time needed, and the final accuracy. The accuracy increases as $\lambda$ increases.