From 456260a38d169decf5fb711b7f2fa70750f6f85a Mon Sep 17 00:00:00 2001
From: Alexandr Katrutsa
Date: Mon, 2 Jul 2018 10:53:51 +0300
Subject: [PATCH] Add en version of seminar 21, #15

---
 21-Penalty/Seminar21en.ipynb | 501 +++++++++++++++++++++++++++++++++++
 1 file changed, 501 insertions(+)
 create mode 100644 21-Penalty/Seminar21en.ipynb

diff --git a/21-Penalty/Seminar21en.ipynb b/21-Penalty/Seminar21en.ipynb
new file mode 100644
index 0000000..20a95bc
--- /dev/null
+++ b/21-Penalty/Seminar21en.ipynb
@@ -0,0 +1,501 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "# Seminar 21\n",
+    "# Penalty method and augmented Lagrangian method"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "## Reminder\n",
+    "\n",
+    "- Interior point method\n",
+    "- Barrier method with logarithmic barrier\n",
+    "- Primal-dual method"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "## Penalty method\n",
+    "\n",
+    "**Idea:** barriers prevent the iterates from leaving the feasible set, whereas penalty functions significantly increase the objective function whenever an iterate lies outside the feasible set. In contrast to the barrier approach, leaving the feasible set is therefore possible."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true,
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "## Equality constrained problem\n",
+    "\n",
+    "\\begin{equation*}\n",
+    "\\begin{split}\n",
+    "& \\min f(x)\\\\\n",
+    "\\text{s.t. } & g_i(x) = 0, \\; i=1,\\ldots,m\n",
+    "\\end{split}\n",
+    "\\end{equation*}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true,
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "### Quadratic penalty method\n",
+    "The original problem can be reformulated as the unconstrained optimization problem\n",
+    "\n",
+    "$$\n",
+    "\\min_x Q(x, \\mu),\n",
+    "$$\n",
+    "\n",
+    "where\n",
+    "\n",
+    "$$\n",
+    "Q(x, \\mu) = f(x) + \\frac{\\mu}{2}\\sum\\limits_{i=1}^mg^2_i(x), \\quad \\mu > 0\n",
+    "$$\n",
+    "\n",
+    "- If the constraints are not satisfied, the objective function increases proportionally to $\\mu$\n",
+    "- As $\\mu$ increases, the solution of the unconstrained problem satisfies the constraints more and more accurately"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "### General scheme\n",
+    "\n",
+    "```python\n",
+    "def QuadraticPenaltyEquality(Q, gradQ, x0, get_mu, get_tau, **kwargs):\n",
+    "    while True:\n",
+    "        # Stop when the norm of the gradient of Q is less than the current tau\n",
+    "        x = MinimizeQ(Q, gradQ, x0, get_tau)\n",
+    "        if global_convergence(x, Q, **kwargs):\n",
+    "            break\n",
+    "        mu = get_mu()\n",
+    "        Q, gradQ = UpdateQ(mu)\n",
+    "        x0 = UpdateStartPoint(x, Q)\n",
+    "    return x\n",
+    "```\n",
+    "\n",
+    "- The parameter $\\mu$ has to be updated depending on how hard the subproblem is to solve: if the subproblem takes a long time to solve, $\\mu$ should be increased more gently, for example $\\mu_{k+1} = 2\\mu_k$. \n",
+    "If the subproblem is solved quickly, a more aggressive update of $\\mu$ is acceptable, e.g. $\\mu_{k+1} = 15\\mu_k$.\n",
+    "- As $\\mu$ increases, solving the subproblem becomes more and more difficult due to poor conditioning of the Hessian. More details below."
+   ]
+  },
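+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "### Example\n",
+    "\n",
+    "A minimal runnable sketch of this scheme for a toy problem $\\min x_1 + x_2$ s.t. $x_1^2 + x_2^2 = 1$. The toy problem, the schedule $\\mu_{k+1} = 10\\mu_k$, and the tolerance are illustrative assumptions; each subproblem is solved by `scipy.optimize.minimize`.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "from scipy.optimize import minimize\n",
+    "\n",
+    "# Toy problem (illustrative): min x1 + x2  s.t.  x1^2 + x2^2 - 1 = 0\n",
+    "f = lambda x: x[0] + x[1]\n",
+    "g = lambda x: x[0]**2 + x[1]**2 - 1\n",
+    "\n",
+    "def quadratic_penalty(f, g, x0, mu=1.0, rho=10.0, tol=1e-6, max_iter=30):\n",
+    "    x = np.asarray(x0, dtype=float)\n",
+    "    for _ in range(max_iter):\n",
+    "        Q = lambda x, mu=mu: f(x) + 0.5 * mu * g(x)**2\n",
+    "        # Warm start each subproblem from the previous solution\n",
+    "        x = minimize(Q, x, method=\"BFGS\").x\n",
+    "        if abs(g(x)) < tol:  # constraint is (almost) satisfied\n",
+    "            break\n",
+    "        mu *= rho  # increase the penalty parameter\n",
+    "    return x\n",
+    "\n",
+    "print(quadratic_penalty(f, g, x0=[2.0, 2.0]))  # ~ [-0.7071, -0.7071]\n",
+    "```"
+   ]
+  },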
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true,
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "### Convergence\n",
+    "\n",
+    "**Theorem 1.** Assume that for every $\\mu$ the unconstrained optimization problem has a finite global solution. Then the sequence of subproblem solutions converges to a global solution of the original problem as $\\mu \\to \\infty$.\n",
+    "\n",
+    "- Issue: the global optimum of some subproblems can be unattainable!\n",
+    "\n",
+    "**Theorem 2.** Assume $\\tau_k \\to 0$ and $\\mu_k \\to \\infty$ and $\\| Q'(x^*_k, \\mu_k) \\| \\leq \\tau_k$. Then\n",
+    "- if $x^*_k \\to x^*$ and $x^*$ is infeasible, then $x^*$ is a stationary point of the function $\\| g(x) \\|^2_2$;\n",
+    "- if $x^*$ is feasible and the gradients of the constraints at this point are linearly independent, then the KKT conditions are satisfied at $x^*$.\n",
+    "\n",
+    "For any subsequence $x^*_k \\to x^*, \\; k \\in \\mathcal{C}$ the following holds\n",
+    "\n",
+    "$$\n",
+    "\\lim_{k \\in \\mathcal{C}} \\mu_k g_i(x^*_k) = \\lambda_i^*\n",
+    "$$\n",
+    "\n",
+    "for all $i = 1,\\ldots,m$, where $\\lambda_i^*$ are the Lagrange multipliers that satisfy the KKT conditions."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true,
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "### Singularity of the Hessian\n",
+    "\n",
+    "**Example.**\n",
+    "\\begin{equation*}\n",
+    "\\begin{split}\n",
+    "& \\min -5x_1^2 + x_2^2\\\\\n",
+    "\\text{s.t. }& x_1 = 1\n",
+    "\\end{split}\n",
+    "\\end{equation*}\n",
+    "Penalty function\n",
+    "$$\n",
+    "Q(x, \\mu) = -5x_1^2 + x_2^2 + \\frac{\\mu}{2}(x_1 - 1)^2\n",
+    "$$\n",
+    "**Note:** while $\\mu < 10$, the function $Q(x, \\mu)$ is unbounded below in $x$, and the subproblem does not have a finite solution."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true,
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "### General form of the Hessian\n",
+    "\n",
+    "$$\n",
+    "Q''(x, \\mu) = f''(x) + \\mu\\sum_{i=1}^m g_i(x) g''_i(x) + \\mu(g'(x))^{\\top} g'(x),\n",
+    "$$\n",
+    "\n",
+    "where $g'(x)$ is the Jacobian of the vector function representing the equality constraints.\n",
+    "\n",
+    "By Theorem 2, the following approximation holds near the optimal point\n",
+    "\n",
+    "$$\n",
+    "Q''(x, \\mu) \\approx L''(x, \\lambda^*) + \\mu(g'(x))^{\\top} g'(x)\n",
+    "$$\n",
+    "\n",
+    "- the Hessian of the Lagrangian does not depend on $\\mu$\n",
+    "- the product $(g'(x))^{\\top} g'(x)$ has rank $m \\ll n$\n",
+    "\n",
+    "**Summary:** some of the eigenvalues of the Hessian $Q''(x, \\mu)$ are of order $\\mu$, so increasing $\\mu$ makes the Hessian nearly singular\n",
+    "\n",
+    "**Corollary:** the search direction in the Newton method is computed very inaccurately!"
+   ]
+  },
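+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "### Example\n",
+    "\n",
+    "A quick numerical illustration (a sketch for the example above, where $Q''(x, \\mu) = \\mathrm{diag}(\\mu - 10, \\; 2)$): the condition number of the Hessian grows linearly with $\\mu$.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "# Hessian of Q(x, mu) = -5 x1^2 + x2^2 + mu/2 (x1 - 1)^2 is diag(mu - 10, 2)\n",
+    "for mu in [1e2, 1e4, 1e6, 1e8]:\n",
+    "    H = np.diag([mu - 10.0, 2.0])\n",
+    "    print(f\"mu = {mu:.0e}, cond(Q'') = {np.linalg.cond(H):.1e}\")\n",
+    "```"
+   ]
+  },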
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true,
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "### How to find the direction in the Newton method?\n",
+    "\n",
+    "$$\n",
+    "Q''(x_k, \\mu)p = -Q'(x_k, \\mu)\n",
+    "$$\n",
+    "\n",
+    "Let us introduce the new variable $\\xi = \\mu g'(x) p$ and rewrite the system in the form\n",
+    "\n",
+    "$$\n",
+    "\\begin{bmatrix}\n",
+    "f''(x) + \\mu \\sum\\limits_{i=1}^m g_i(x)g_i''(x) & (g'(x))^{\\top} \\\\\n",
+    "g'(x) & -\\frac{1}{\\mu} I\n",
+    "\\end{bmatrix}\n",
+    "\\begin{bmatrix}\n",
+    "p\\\\\n",
+    "\\xi\n",
+    "\\end{bmatrix}\n",
+    "=\n",
+    "\\begin{bmatrix}\n",
+    "-Q'(x, \\mu)\\\\\n",
+    "0\n",
+    "\\end{bmatrix}\n",
+    "$$\n",
+    "\n",
+    "- The condition number of this system no longer grows as $\\mu$ increases\n",
+    "- The dimension is increased by $m$\n",
+    "- The quadratic approximation can still be inadequate"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true,
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "## Equality and inequality constrained problem\n",
+    "\n",
+    "\\begin{equation*}\n",
+    "\\begin{split}\n",
+    "& \\min_x f(x)\\\\\n",
+    "\\text{s.t. } & g_i(x) = 0, \\; i=1,\\ldots,m \\\\\n",
+    "& h_j(x) \\leq 0, \\; j = 1,\\ldots,p\n",
+    "\\end{split}\n",
+    "\\end{equation*}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true,
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "### Penalty function\n",
+    "\n",
+    "$$\n",
+    "\\min_x f(x) + \\frac{\\mu}{2}\\sum\\limits_{i=1}^mg^2_i(x) + \\frac{\\mu}{2}\\sum\\limits_{j=1}^p (\\max(0, h_j(x)))^2,\n",
+    "$$\n",
+    "where $\\mu > 0$\n",
+    "- The difference from the equality constrained problem: the second derivative of the penalty function is discontinuous, so the penalty function is no longer twice continuously differentiable (see the sketch below)"
+   ]
+  },
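+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "### Example\n",
+    "\n",
+    "A minimal sketch of this penalty for a toy problem $\\min (x_1 - 2)^2 + (x_2 - 2)^2$ s.t. $x_1 + x_2 \\leq 1$; the toy problem and the schedule for $\\mu$ are illustrative assumptions.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "from scipy.optimize import minimize\n",
+    "\n",
+    "# Toy problem (illustrative): min (x1 - 2)^2 + (x2 - 2)^2  s.t.  x1 + x2 <= 1\n",
+    "f = lambda x: (x[0] - 2)**2 + (x[1] - 2)**2\n",
+    "h = lambda x: x[0] + x[1] - 1\n",
+    "\n",
+    "x, mu = np.zeros(2), 1.0\n",
+    "for _ in range(15):\n",
+    "    # Penalize only the violated part of the constraint: max(0, h(x))^2\n",
+    "    Q = lambda x, mu=mu: f(x) + 0.5 * mu * max(0.0, h(x))**2\n",
+    "    x = minimize(Q, x, method=\"BFGS\").x\n",
+    "    mu *= 10.0\n",
+    "print(x)  # ~ [0.5, 0.5]\n",
+    "```"
+   ]
+  },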
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "## Pro & Contra\n",
+    "\n",
+    "Pro\n",
+    "\n",
+    "- an automatic way to convert any *constrained* optimization problem into an *unconstrained* one\n",
+    "- no need to find a feasible initial guess\n",
+    "- sometimes small violations of the constraints are acceptable\n",
+    "- the general scheme is simple to implement\n",
+    "\n",
+    "Contra\n",
+    "\n",
+    "- the solution of the unconstrained optimization problem is not always a solution of the original constrained problem\n",
+    "- if the objective function of the original problem is defined only on the feasible set, the penalty method is inapplicable\n",
+    "- the quadratic approximation of $Q(x, \\mu)$ may be inadequate"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true,
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "## Augmented Lagrangian method\n",
+    "**Motivation:** in the penalty method, the solutions of the subproblems can violate the constraints, and all that is known is\n",
+    "\n",
+    "$$\n",
+    "g_i(x^*_k) \\approx \\frac{\\lambda^*_i}{\\mu_k} \\to 0, \\quad \\mu_k \\to \\infty\n",
+    "$$\n",
+    "\n",
+    "Is it possible to modify $Q(x, \\mu)$ so as to prevent constraint violation?\n",
+    "\n",
+    "**Idea:** add the penalty not to the objective function but to the Lagrangian. This is an analogue of the primal-dual method, because each iteration updates both the primal and the dual variables."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true,
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "## Equality constrained optimization problem\n",
+    "\n",
+    "\\begin{equation*}\n",
+    "\\begin{split}\n",
+    "& \\min f(x)\\\\\n",
+    "\\text{s.t. } & g_i(x) = 0, \\; i=1,\\ldots,m\n",
+    "\\end{split}\n",
+    "\\end{equation*}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true,
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "### Augmented Lagrangian\n",
+    "$$\n",
+    "M(x, \\lambda, \\mu) = f(x) + \\sum\\limits_{i=1}^m\\lambda_i g_i(x) + \\frac{\\mu}{2}\\sum\\limits_{i=1}^mg^2_i(x)\n",
+    "$$\n",
+    "\n",
+    "The necessary optimality condition at an approximate minimizer $x_k$ of $M(x, \\lambda^k, \\mu_k)$\n",
+    "$$\n",
+    "f'(x_k) + \\sum\\limits_{i=1}^m (\\lambda^k_i + \\mu_k g_i(x_k) ) g'_i(x_k) \\approx 0\n",
+    "$$\n",
+    "Comparing this with the KKT stationarity condition $f'(x^*) + \\sum\\limits_{i=1}^m \\lambda^*_i g'_i(x^*) = 0$ yields the update for $\\lambda^{k+1}$\n",
+    "$$\n",
+    "\\lambda^{k+1}_i = \\lambda^k_i + \\mu_k g_i(x_k)\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true,
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "### Constraints violation\n",
+    "\n",
+    "$$\n",
+    "g_i(x_k) \\approx \\frac{\\lambda^*_i - \\lambda^k_i}{\\mu_k} \\to 0\n",
+    "$$\n",
+    "\n",
+    "- In the penalty method, the equality constraints converge to zero at a rate of order $1/\\mu_k$\n",
+    "- In the augmented Lagrangian method, the convergence to zero is faster, because not only does the denominator grow, but the numerator also converges to zero (a sketch follows below)"
+   ]
+  },
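+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "### Example\n",
+    "\n",
+    "A minimal sketch of the method for the same toy problem as before ($\\min x_1 + x_2$ s.t. $x_1^2 + x_2^2 = 1$); the parameter values are illustrative assumptions. Note that, in contrast to the penalty method, $\\mu$ stays fixed and the dual update drives the constraint violation to zero.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "from scipy.optimize import minimize\n",
+    "\n",
+    "# Toy problem (illustrative): min x1 + x2  s.t.  x1^2 + x2^2 - 1 = 0\n",
+    "f = lambda x: x[0] + x[1]\n",
+    "g = lambda x: x[0]**2 + x[1]**2 - 1\n",
+    "\n",
+    "x, lam, mu = np.array([2.0, 2.0]), 0.0, 10.0  # mu is kept fixed\n",
+    "for _ in range(20):\n",
+    "    # Minimize the augmented Lagrangian in x for the current multiplier\n",
+    "    M = lambda x, lam=lam: f(x) + lam * g(x) + 0.5 * mu * g(x)**2\n",
+    "    x = minimize(M, x, method=\"BFGS\").x\n",
+    "    lam += mu * g(x)  # dual update: lambda^{k+1} = lambda^k + mu_k g(x_k)\n",
+    "print(x, lam)  # ~ [-0.7071, -0.7071], lam ~ 0.7071\n",
+    "```"
+   ]
+  },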
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "## Equality and inequality constrained problem\n",
+    "\\begin{equation*}\n",
+    "\\begin{split}\n",
+    "& \\min_x f(x)\\\\\n",
+    "\\text{s.t. } & g_i(x) = 0, \\; i=1,\\ldots,m \\\\\n",
+    "& h_j(x) \\leq 0, \\; j = 1,\\ldots,p\n",
+    "\\end{split}\n",
+    "\\end{equation*}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true,
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "### Augmented Lagrangian\n",
+    "\n",
+    "$$\n",
+    "M(x, \\mu, \\lambda, \\nu) = f(x) + \\sum\\limits_{i=1}^m \\left[\\lambda_ig_i(x) + \\frac{\\mu}{2}g^2_i(x)\\right] + \\frac{1}{2\\mu}\\sum\\limits_{j=1}^p\\left[ (\\nu_j + \\mu h_j(x))_+^2 - \\nu^2_j \\right],\n",
+    "$$\n",
+    "where $\\lambda$ is the vector of dual variables corresponding to the **equality** constraints and $\\nu$ is the vector of dual variables corresponding to the **inequality** constraints\n",
+    "- See the derivation of $M(x, \\mu, \\lambda, \\nu)$ in the lecture notes."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "### Update of the dual variables corresponding to the inequality constraints\n",
+    "$$\n",
+    "\\nu^{k+1}_j = (\\nu^k_j + \\mu_k h_j(x_k))_+\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "### Convergence\n",
+    "\n",
+    "- Convergence is local (the initial guess must be close to the optimum in both the primal and the dual variables)\n",
+    "- For a sufficiently large $\\mu$, the solutions of the subproblems converge to the solution of the original problem; therefore, $\\mu$ does not have to be increased indefinitely\n",
+    "- Solving the subproblems is easier than in the penalty method because $\\mu$ is bounded above"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "## Practical remarks\n",
+    "\n",
+    "- The augmented Lagrangian method is usually preferable: solving its subproblems is about as difficult as in the penalty method, but its convergence is faster\n",
+    "- The penalty method is preferable as a regularization of the sequential quadratic programming approach ([SQP](https://en.wikipedia.org/wiki/Sequential_quadratic_programming))\n",
+    "- The packages [LANCELOT](http://www.numerical.rl.ac.uk/lancelot/blurb.html) and [MINOS](https://en.wikipedia.org/wiki/MINOS_(optimization_software)) use different implementations of the augmented Lagrangian method\n",
+    "- This [paper](https://www.dropbox.com/s/rdqcn0fppumreto/MinosvsLancelot.pdf?dl=0) presents a comparison of these packages; note, however, that both were developed in the early 1990s in Fortran"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "## Recap\n",
+    "\n",
+    "- Penalty method\n",
+    "- Augmented Lagrangian method"
+   ]
+  }
+ ],
+ "metadata": {
+  "celltoolbar": "Slideshow",
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}