-
Notifications
You must be signed in to change notification settings - Fork 22
fix the derivative of the projection on exponential cones #59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hey @AxelBreuer, thanks for making this PR. It's great this makes the tests pass. Can you point to the math that motivates this change? It's been a while since I've looked at the math (and this code). But I think I'm looking at pages 23-24 of https://arxiv.org/pdf/1705.00772.pdf. With the caveat that I haven't looked at this code or math in a long time, I believe If that's correct, then I think (one) of the fixes is to just set
For all of my points, I might be totally wrong. But it would be great if someone could double check ... |
Hi Akshay, thanks for your quick reply ! To be honest I haven't read https://arxiv.org/pdf/1705.00772.pdf as I worked with the formulas found in https://arxiv.org/pdf/1811.02157.pdf. In fact, thanks to your reference, I understand better now the naming convention [r,s,t] found in in the C++ code (the naming convention in my source article was [x,y,z]). It is funny that you mention the, more elegant, solution of using The most confusing part in the existing C++ code, is that vector Apologies, but I do not understand your notation (S.4) and (S.6) hence I do not know what you are referring to (an equation in an article ? an equality in the C++ code ?). Could you give me some more explanations ? |
Oh, interesting! Yes, you're right, the two should be similar (mathematically, they should be the same). I wonder if increasing EXP_CONE_MAX_ITERS in https://github.com/cvxgrp/diffcp/blob/master/cpp/src/cones.cpp would change anything ...
Sorry about that. Actually I was totally mistaken in my additional question, there is no sign error, the coefficients are fine, so never mind that comment. |
No, I did not, until you asked :-) Below are the values for the test case #57: x_pert-x = [-2.93045271e-07 -2.07199327e-07 5.00255416e-07] rs[2] - x_i[2] = 0.707107 | t = 1 -> So, in this particular test case,
Below the same test case as above, but this time with EXP_CONE_MAX_ITERS = 2000 (instead of 200 previously): x_pert-x = [-2.93045271e-07 -2.07199327e-07 5.00255416e-07] rs[2] - x_i[2] = 0.707107 | t = 1 -> So, in this particular test case, EXP_CONE_MAX_ITERS does not improve things. |
Regarding the two last observations:
and
maybe this indicates that there is still a living bug in Do you have an article explaining the details of the 1D-Newton-based algorithm behind Never the less, would it be possible for you to validate my pull request for now ? or is this potential existence of a bug in the 1D-Newton based implementation blocking ? |
A few remarks concerning the technical details of the algorithm behind I found some interesting details in https://github.com/matbesancon/MathOptSetDistances.jl/blob/master/src/projections.jl:
Furthermore, I read on the recent release note of scs https://github.com/cvxgrp/scs/releases/tag/3.2.3
I wanted to share these sources, as they could be useful for an eventual re-coding of |
It looks like |
@bodono thanks for your helpful feedback ! By any chance, do you remember why the returned value of (Btw, I have noticed that |
By any chance, do you remember why the returned value of rho, a.k.a t, is
not the lagrange multipier ? is rho supposed to be the lagrange multiplier
at all ?
Which function do you mean? I'm pretty sure it is finding *a* dual
variable, but the question is which dual variable.
(Btw, I have noticed that _proj_exp_cone() in original-scs has one more
input argument w than _proj_exp_cone() in diffcp: the original-scs code was
copied and modified, probably by setting w=0 : maybe that is the crux)
Probably it was just forked from an older version without `w`. I think `w`
is just a warm-start (guess) of the solution, which shouldn't affect
correctness (if you're talking about `exp_newton_one_d`, I don't see a `w`
in `_proj_exp_cone`).
By the way, if you want to test if the old projection is matching the new
one (which is totally different), you can simply update this [test](
https://github.com/cvxgrp/scs/blob/master/test/problems/test_exp_cone.h#L51)
to add new test points and see if the outputs match the output of diffcp.
To run the test simply `To test, run 'make test' and then
'$(OUT)/run_tests_direct'`.
…On Thu, May 4, 2023 at 4:05 PM Axel Breuer ***@***.***> wrote:
@bodono <https://github.com/bodono> thanks for your helpful feedback !
By any chance, do you remember why the returned value of t is not the
lagrange multipier ? is t supposed to be the lagrange multiplier at all ?
I am curious, because I did some extra printing:
- First, the iterations in exp_get_rho_ub() for the initial lower and
upper boundary:
exp_calc_grad(v, x, *lb) = 1.92585| exp_calc_grad(v, x, *ub) = 1.44511
exp_calc_grad(v, x, *lb) = 1.44511| exp_calc_grad(v, x, *ub) = 0.817791
exp_calc_grad(v, x, *lb) = 0.817791| exp_calc_grad(v, x, *ub) = 2.62944e-07
exp_calc_grad(v, x, *lb) = 2.62944e-07| exp_calc_grad(v, x, *ub) = -1.24797
As we start with lb=0 and ub = 0.125, at the end of the 4th iteration, we
have lb=1 and lu=2.
- Second, the iterations on i in _proj_exp_cone():
lb = 1 | ub = 2 | rho =1.5
lb = 1 | ub = 1.5 | rho =1.25
lb = 1 | ub = 1.25 | rho =1.125
lb = 1 | ub = 1.125 | rho =1.0625
lb = 1 | ub = 1.0625 | rho =1.03125
lb = 1 | ub = 1.03125 | rho =1.01562
lb = 1 | ub = 1.01562 | rho =1.00781
lb = 1 | ub = 1.00781 | rho =1.00391
lb = 1 | ub = 1.00391 | rho =1.00195
lb = 1 | ub = 1.00195 | rho =1.00098
lb = 1 | ub = 1.00098 | rho =1.00049
lb = 1 | ub = 1.00049 | rho =1.00024
lb = 1 | ub = 1.00024 | rho =1.00012
lb = 1 | ub = 1.00012 | rho =1.00006
lb = 1 | ub = 1.00006 | rho =1.00003
lb = 1 | ub = 1.00003 | rho =1.00002
lb = 1 | ub = 1.00002 | rho =1.00001
lb = 1 | ub = 1.00001 | rho =1
lb = 1 | ub = 1 | rho =1
lb = 1 | ub = 1 | rho =1
lb = 1 | ub = 1 | rho =1
lb = 1 | ub = 1 | rho =1
lb = 1 | ub = 1 | rho =1
lb = 1 | ub = 1 | rho =1
lb = 1 | ub = 1 | rho =1
lb = 1 | ub = 1 | rho =1
lb = 1 | ub = 1 | rho =1
lb = 1 | ub = 1 | rho =1
which shows that lb remains equal to 1 but ub converges to 1: hence
rho=(lb+ub)/2 converges to 1.
This numerical behavior seems a bit odd, as we know that the gradient for
rho=1 is worth 2.62944e-07 (see the first print out when the algo
initialized ul and ub) but not 0.0. I would hence have expected rho = 1 -
epsilon.
It is precisely for that expectation, that I was wondering if there is a
bug in _proj_exp_cone()
—
Reply to this email directly, view it on GitHub
<#59 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAEHQP3J5KGFPOIRVEOCZBTXEPAR3ANCNFSM6AAAAAAXURQDNE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
Brendan O'Donoghue
http://bodono.github.io/
|
Excellent question ! In https://web.stanford.edu/~boyd/papers/pdf/cone_prog_refine.pdf equation (13), the projection on the exponential cone is expressed as a least squares problem with 1 equality constraint. The associated lagrangian formulation is written just above equation (25). The lagrangian multiplier I am talking about since the beginning is that mu^star. I thought that rho returned by But I might be completly wrong ! (which would be an excellent news for my pr, and a bad news for my rusty math skills) |
Yes, I think that's what Shane and I thought at the time ... we could have been wrong. @AxelBreuer , I suspect your change is correct ... Thanks for looking into this so deeply. Would it be possible for you to add a test to https://github.com/cvxgrp/diffcp/blob/master/tests/test_cone_prog_diff.py that exercises your change? That is, a test that fails without your change, and passes with it. |
Great ! I will add the test tomorrow: here in Paris it's the end of day. |
Couldn't wait: I have added the test |
@bodono Out of curiosity, by any chance, do you know which constraint exactly |
The calculation of the gradient here shows that the cone has been reformulated to the relative-entropy cone in order to do the projection: In other words, rather than the constraint in the projection being |
Enligthening answer Brendan ! Thank you very much. |
Hi,
I found our discussion on the exponential cone projection so interesting that I had to write a note which sums up all details of the numerical implementaion.
It is a quick note I made for myself but I sought you could find it useful (who knows, maybe another user is intrigued by your projection algorithm and you could just forward my note).
Best,
[projection.pdf](https://github.com/cvxgrp/diffcp/files/11408670/projection.pdf)
Axel
|
Yes, it's been great to follow along! I can't see the attachment here on github, can you try uploading it again? |
just did (I re-edited my previous message), I hope it worked this time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks so much @AxelBreuer
@AxelBreuer and @bodono , thanks so much for helping us fix this bug. It's been open for far too long. The change looks good to me! |
Oh nice, thanks for writing this up! Yes, that's basically the algorithm I implemented. The only thing I would add is that the the dual variable is bisection on the gradient of a concave function, the solution is guaranteed to exist and bisection is guaranteed to work once we have upper and lower bounds. I should say that I tried a lot of different approaches, many different equivalent reformulations of the cone and many, many different Newton methods (2d, 3d, 4d etc). This approach was the most robust by far, even if it was slower than the Newton methods when they worked. I only recently updated it to the Friberg algorithm, which appears to be faster and at least as robust. |
this fix corrects the wrong differentials reported in
#57
#58
cvxgrp/cvxpylayers#135