Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: constrained optimization #50

Closed
wants to merge 29 commits into from
Closed

WIP: constrained optimization #50

wants to merge 29 commits into from

Conversation

timholy
Copy link
Contributor

@timholy timholy commented Mar 11, 2014

This is not done yet, but I thought I should put it out there so people can comment sooner rather than later, if it looks like this is going in the wrong direction.

When finished, this should implement all the standard inequality constraints; equality constraints may come, but likely will be a little later (I have a pressing need for the inequalities, but not the equalities). The only algorithm I've implemented is a central-path-following interior-point solver---but I'd venture to say that if you could only pick one, interior-point is the way to go.

Using this, a version of nnls (just two lines, objective = linlsq(A, b); result = interior(objective, initial_x, bounds, method=:newton)) is 100x faster on a 50x50 problem compared to the existing implementation of nnls in Optim.

Left to do:

  • Add support for constraints to other linesearch algorithms
  • Implement value, gradient, and hessian for barrier functions of linear and nonlinear constraints
  • Add more tests

In addition to adding support for constrained optimization, this makes several relevant improvements to hz_linesearch, in particular ones that are smarter about some of the odd behaviors of floating-point numbers, and others that will make constrained optimization perform significantly better. Thanks to @JeffBezanson for his quick work on JuliaLang/julia#6097, making it feasible to debug some of the darker corners of hz_linesearch.

One final point: note that this changes how the problems directory is handled. I changed it because it wasn't obvious to me that we wanted this to be a regular part of loading Optim. Perhaps we should go one step further, and only load it when running tests (and not even have testpaths in Optim.jl).

@mlubin
Copy link
Contributor

mlubin commented Mar 11, 2014

Nice! How difficult would it be to support sparse Hessians and constraint Jacobians? We could then pretty easily use this as the first pure-Julia backend for JuMP.

@timholy
Copy link
Contributor Author

timholy commented Mar 11, 2014

As you might notice, I began a process of switching declarations over to AbstractMatrix and AbstractVector. So it should not be hard. The actual fill-in will of course be up to the user, just as the user currently has to write the objective function.

For the constraints, my thinking is that the user writes the functions so they add to an existing gradient/Hessian (one computed by the objective function), rather than having the algorithm provide a blank vector/matrix and then copy the results into the combined gradient/Hessian. (See the beginning of interior.jl.) That's precisely so we can better support sparsity (all that copying would destroy any advantage of sparsity).

@mlubin
Copy link
Contributor

mlubin commented Mar 11, 2014

The typical interface for these sorts of solvers it to provide a callback to compute the Hessian of the Lagrangian directly, where you provide the weights to the users. Check out http://ipoptjl.readthedocs.org/en/latest/ipopt.html.

We're soon going to be developing a standardized interface for this in MathProgBase, as we have for linear programming.

@johnmyleswhite
Copy link
Contributor

This seems like a great start. I don't see anything that troubles me in a quick review. Will do a more detailed pass through tomorrow morning.

@timholy
Copy link
Contributor Author

timholy commented Mar 11, 2014

I can see passing Lagrange multipliers/coefficients to the user---that's indeed probably a better design than my attempt to hide it from the user. But there are some other things I don't understand (or don't like):

  • Why do you need to store the gradient of each constraint separately? (I.e., why do you even need the full Jacobian?) At least for interior-point problems without equality constraints, the only thing you need is the summed gradient of both the objective and the constraints. But perhaps other solvers benefit from having the full Jacobian? In which case we'd better plan for this.

  • Why should the user have to write a single function that incorporates all the constraints together with the objective? It seems to encourage better design to have the solver make the calls that accumulate each term separately, i.e.:

    objective_gradient!(x, g)                # This initializes g and fills it with values
    constraintA_gradient!(x, g, lambda1)     # Adds to g, doesn't replace g
    constraintB_gradient!(x, g, lambda2)
    ...
    

    The key advantage is that once I implement the functions for constraintA, I can re-use it again without modification for other problems, simply by saying something like

    add_nlineq!(constraints, DifferentiableFunction(constraintA_objective, constraintA_gradient!))
    

    In many fields there are a small handful of standard penalty functions or constraints, which can be used in conjunction with many different objective functions.

  • It seems inconsistent to provide the weights for computing the Hessian but not the Jacobian

In your case, I recognize that you simply have to live with the design choices make by Ipopt, but I'm not sure we should mimic it entirely.

@mlubin
Copy link
Contributor

mlubin commented Mar 11, 2014

Different interior-point approaches do require the full Jacobian. If there are equality constraints, the KKT system is typically formulated as

[ H A^T ]
[ A  0  ]

where A is the Jacobian of the equality constraints (might have a transpose incorrect here). For sparse problems it's reasonable to formulate the KKT conditions using the explicit inequalities instead of eliminating them from the system. I'm mostly familiar with primal-dual interior-point methods, so things might be slightly different with primal methods. Anyway it doesn't make too much sense to have a "summing" interface to provide inequality constraints and another interface for equality constraints. The interface should also try to be uniform over different algorithms.

It's reasonable to have an interface where the terms are summed separately. This is more difficult if you have a sparse hessian, since you can't easily add to a sparse matrix (but the trick here is that solvers let you provide the matrix in triplet format, and duplicate terms are summed together). I don't have any objections to providing a convenient interface like this, as long as there's a way to write a small wrapper to transform Ipopt-style input to use this code.

Note that almost all high-performance nonlinear solvers out there (like KNITRO, MOSEK) have an interface that looks like Ipopt's. They're certainly not user friendly, though they're meant to be used from modeling systems like AMPL, GAMS, and now JuMP that can automatically provide sparse derivatives.

@timholy
Copy link
Contributor Author

timholy commented Mar 11, 2014

I did specify in my (edited) comment problems without equality constraints, which as far as I could tell from a brief skim is the only kind of problem Ipopt handles. Suppose you add linear equality constraints; for such constraints, there's no need to ask the user to supply anything other than the matrix and rhs---all the derivatives can all be handled internally, so again I don't see the need to trouble the user for the full Jacobian. Where one has to think about it is for nonlinear equality constraints, but my suspicion is that there one would be going with an augmented Lagrangian anyway, making it a little different problem. I'll think about it more, however.

The solver here "looks" primal but it exploits duality---eps_gap is the duality gap. One advantage of this formulation (which is straight out of Hindi's tutorial, based in turn on Boyd & Vandenberghe) is that you can use it with methods that don't require the Hessian (cg, etc). That's already implemented, in fact. I was quite surprised that even for a nnls problem, where you have the whole matrix, cg was only ~10x slower than Newton's method.

@@ -138,6 +138,18 @@ function bfgs{T}(d::Union(DifferentiableFunction,
f_x_previous, f_x = f_x, d.fg!(x, gr)
f_calls, g_calls = f_calls + 1, g_calls + 1

x_converged,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does moving the convergence check here change the inputs? If so, were the previous inputs not correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue was that there were other exit points from the function (https://github.com/JuliaOpt/Optim.jl/blob/master/src/bfgs.jl#L149). So it was indicating it hadn't converged, but that test often gets triggered because it did converge.

@johnmyleswhite
Copy link
Contributor

Having read this more carefully, one question I have is why we should have a linlsq function in Optim. What's the use case for it?

@tkelman
Copy link

tkelman commented Mar 11, 2014

problems without equality constraints, which as far as I could tell from a brief skim is the only kind of problem Ipopt handles

Most of the serious large-scale solvers use a row-wise interface with a lower and an upper bound for each constraint, one of which can be +/- inf. Equality constraints have lb[i] = ub[i]. From https://projects.coin-or.org/Ipopt/wiki

   min     f(x)
x in R^n

s.t.       g_L <= g(x) <= g_U
           x_L <=  x   <= x_U

Or pick your favorite canonical form, I'm a fan of having just equality constraints and variable bounds, introducing slack variables for inequalities as needed.

The KKT system has the Jacobians of both equality and inequality constraints in it, in the sparse non-convex case you don't want to eliminate anything ahead of time, the sparse symmetric indefinite linear solver will do a better job at reducing fill-in. The zero block in the lower right isn't usually actually zero, an inertia perturbation is required to guarantee convergence properties.

When some constraint rows are linear, you could technically leave those nonzeros in the Jacobian untouched between interations and save a few copies, but the savings there are minimal when compared to the KKT factorization cost.

@timholy
Copy link
Contributor Author

timholy commented Mar 11, 2014

one question I have is why we should have a linlsq function in Optim

This creates an objective function for least squares, but then you can supplement this with constraints. Using this, nonnegative least-squares becomes a few lines:

objective = linlsq(A, b)
bounds = ConstraintsBox(zeros(eltype(x), length(x)), nothing)
results = interior(objective, initial_x, bounds, method=:newton)

As I mentioned above, I tested this on a particular 50x50 problem and found this version to be roughly 100x faster than the nnls I wrote initially and is currently in Optim.

@mlubin
Copy link
Contributor

mlubin commented Mar 11, 2014

linlsq seems a bit restrictive. What if you want to do L1 or L2 regularized least squares? It's okay to have as an internal helper, but should it be exported?

@johnmyleswhite
Copy link
Contributor

Building nnls in three lines of code is a really great example for Optim. But I guess I think nnls belongs in a separate package, since it's more of a use case for Optim than part of Optim per se.

@timholy
Copy link
Contributor Author

timholy commented Mar 12, 2014

Or pick your favorite canonical form, I'm a fan of having just equality constraints and variable bounds, introducing slack variables for inequalities as needed.

Got it, I hadn't thought through the implications of having two bounds on each (non)linear constraint carefully enough. That will work well for a Newton method, but I have to think through the implications for implementation with CG (which for my problems is the much more important algorithm)---it seems like unifying inequalities and equalities could result in troublesome behavior with respect to roundoff errors.

linlsq seems a bit restrictive. What if you want to do L1 or L2 regularized least squares? It's okay to have as an internal helper, but should it be exported?

But I guess I think nnls belongs in a separate package

OK, these can go.

Since this PR may take some redesign and I have deadlines, this may sit for a while.

@johnmyleswhite
Copy link
Contributor

Ok. I'll be excited to see where this heads whenever you pick it up again.

@timholy
Copy link
Contributor Author

timholy commented Oct 21, 2015

I'm back to being interested in this branch again. Reference: https://groups.google.com/forum/#!topic/julia-opt/TVmuXFWfeBM. In contrast, here I see no overhead from the solver, and the progress towards the minimum per iteration also seems better. I haven't yet measured it per function evaluation; that's likely to look a little less favorable, because I noticed that our linesearch not infrequently uses more evaluations than Ipopt. (There's some evidence that might be a good thing (p. 2164), but it seems likely to reduce the speed of optimization.)

I kinda wanted to get out of the business of writing my own optimization code, and maybe I still will if someone comes up with another solution. But I thought I'd at least ping this issue to say it may not be dead yet. Another thing about Optim that I like is that I can use Float32s (my 100GB disk file would be a 200GB disk file had I used Float64s), and I worry a little about the minimizer detecting "edges" among adjacent Float32 values if it's using Float64s as coordinates for the optimization.

Since I've just been playing with this, here are a couple of random API thoughts:

  • With barrier methods, you have a combined objective. With show_trace=true, it's nice to show just the part that corresponds to the user's objective. The architecture here doesn't yet make that easy, because what the "outer" optimizer sees is the total function penalty (user-objective + barrier terms).
  • The API here isn't yet as flexible as MathProgBase. However, I note that it takes substantially fewer "user" lines of code. (Making it more flexible might reduce that advantage, of course.)

@mlubin
Copy link
Contributor

mlubin commented Oct 22, 2015

@timholy, I'm open to suggestions like JuliaOpt/MathProgBase.jl#87 on how to make the MPB API easier to both use and implement on the solver side.

This should cover cases where dphi < 0 "by a hair," i.e., where the search direction is almost orthogonal to the gradient. It's an extra layer of security beyond commit 50b9d60 to increase the likelihood that when we terminate, we really have reached a minimum.
@timholy
Copy link
Contributor Author

timholy commented Oct 22, 2015

Other than what I suggested already, I can't think of any other ways to improve on what you've done---at the end of the day you have to be able to interact with C code, and the API you've already designed is quite good for that. I just wanted to throw in a tweak or two.

But of course new ideas may crop up in the course of implementation.

@@ -89,7 +89,7 @@ macro cgtrace()
dt["g(x)"] = copy(gr)
dt["Current step size"] = alpha
end
grnorm = norm(gr, Inf)
grnorm = norm(gr[:], Inf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use vecnorm(gr, Inf) here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.

Thanks for the comment!

@timholy
Copy link
Contributor Author

timholy commented Jan 26, 2016

I am under the illusion that I'll soon have a few free days to work on Optim, and might want to put this into shape for merger. Any API thoughts about this PR would be welcome; I think I'll need to make quite a few changes.

@mlubin
Copy link
Contributor

mlubin commented Jan 26, 2016

@timholy, I'd say choose the API that seems most natural to you, so long as it's possible to write a lightweight wrapper to take in MathProgBase input as well. There's no need for the native API to the solver to be exactly MathProgBase.

@timholy
Copy link
Contributor Author

timholy commented Jan 26, 2016

Aside from the integration into MathProgBase, what I wonder most about is the proper API for specifying constraints. Ipopt, for example, does not appear to have a special category for linear constraints, but I assume that would be a useful thing to specialize on. Likewise, #50 (comment) suggests two possible APIs for specifying equality or inequality constraints. As I think about it, I lean towards making everything an equality constraint and then specifying bounds on the slack variables, but any comments would be welcome.

From the user's perspective, the good news is that no matter how crappy an API I design, it will still look pretty when accessed via the remarkable JuMP 😄. But that's probably not something to shoot for.

@tkelman
Copy link

tkelman commented Jan 27, 2016

If you want the API to be appropriate for problems that may have general nonlinear equality constraints, then my personal favorite canonical form (equalities and bounds) hasn't changed over the last 2 years. Conversions between canonical forms are simple so the user-facing API can use something more general like the row-wise form even if the algorithm internally uses something different. For other classes of problems you may want the internal implementation to use a different form (and/or API) though.

In the Ipopt algorithm, linear constraints can at best save you a subset of Jacobian row evaluations, and those are typically not the bottleneck relative to the solution of the KKT system for the Newton step. You can set an option flag if all constraints happen to be linear then the Jacobian will only be evaluated once. Whether it's worth the API complication of allowing situations in between, either via a linearity bitmask or separate inputs for linear vs nonlinear constraints (Matlab fmincon style), depends mostly on whether you expect Jacobian evaluations to be expensive.

…ot eps_gap

eps_gap introduced an absolute scale for convergence, which seemed problematic. This should be more robust, as it monitors the characteristics of the solution. It should also avoid making changes to the barrier that no longer have a consequence for the solution; previous approaches introduced numerical stability problems when the barrier penalty became extreme.
Also cleans up a bit of the tracing & convergence-testing
@timholy
Copy link
Contributor Author

timholy commented Nov 8, 2016

Don't delete this branch just yet, though. People may be using it.

@pkofod
Copy link
Member

pkofod commented Nov 8, 2016

Well, we could let it stay forever. It could be used as a "masterclass in git-fu"... I know I've tried to rebase this quite a few times, only to give up halfway through!

@timholy
Copy link
Contributor Author

timholy commented Nov 9, 2016

Probably a good idea. Certainly people in my lab are using it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants