Apparently wrong derivatives in simple softmax-as-optimization formulation #58
Comments
One quick other idea for debugging: you can use the |
Bumping: the issue you are facing, currymj, is probably related to the one I mentioned 5 months ago in #57 (comment). As a side remark, I am currently investigating the issue with the help of an intern. Pointers such as the one given by bamos are very valuable to us; any other hints are welcome!
That's great! Please keep us updated
It's somewhat hidden, but if the issue is related to the exponential cone, the derivative for the exponential cone projection is also here and should match these papers:
There are also tests for the exponential cone derivatives here, along with other tests for the derivatives/adjoint derivatives. Once this issue is fixed, it would also be nice to include these examples there.
@AxelBreuer, thanks for looking into this. There were some issues with the exponential cone derivative a few years ago. At the time we thought we fixed them. But we may have missed something. Look at the revision history for |
Thanks a lot for all this useful extra information! Concerning the C++ code: in https://github.com/cvxgrp/diffcp/blob/master/cpp/src/cones.cpp, at line 214 of LinearOperator _dprojection_exp(const Vector &x, bool dual), we see:
However, in equations (26) of "Solution refinement at regular points of conic problems" the lines are ordered differently:
Did you apply a "circular shift" of the equations on purpose?
I think we found the bug behind the inaccurate/wrong differentials for problems involving exponential cones: a small typo in the C++ code, unrelated to the "circular shift" mentioned above, which turned out to be harmless. We made a pull request which solves issues #58, #57, cvxgrp/cvxpylayers#135 (and probably cvxgrp/cvxpylayers#145 as well).
It does appear to be fixed; see cvxgrp/cvxpylayers#145 (comment).
I was trying to use cvxpylayers to get derivatives of the problem below (essentially softmax-as-constrained optimization) and ran into a bug.
where $v$ and $b$ are the parameters. The optimal solution to this problem is $x = \text{softmax}(v/\alpha, b/\alpha)$.
I also reported the bug at cvxgrp/cvxpylayers#145, but it was suggested I post here too, since the bug may be in diffcp.
Here's my best attempt at a MWE that only uses diffcp. I ran the cvxpylayers code in a debugger and just pulled out the problem parameters and hardcoded them (please let me know if I'm using the library wrong!). The derivative of x1 with respect to b at the point v=0.6, b=0.58 should be -10.4994, but it comes out as 0.01412.
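For anyone wanting to sanity-check the expected value independently of diffcp, here is a minimal sketch using the closed form of the two-element softmax, $\partial x_1 / \partial b = -x_1 (1 - x_1)/\alpha$, plus a central finite difference. The value $\alpha = 0.01$ is an assumption: it is not stated in the text above, but it is the value that reproduces -10.4994 at v=0.6, b=0.58. The function names are made up for illustration and are not part of diffcp or cvxpylayers.

```python
import math

def softmax_x1(v, b, alpha):
    """First component of softmax([v/alpha, b/alpha])."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(v, b) / alpha
    ev = math.exp(v / alpha - m)
    eb = math.exp(b / alpha - m)
    return ev / (ev + eb)

def dx1_db_analytic(v, b, alpha):
    """Closed-form derivative of x1 with respect to b."""
    x1 = softmax_x1(v, b, alpha)
    return -x1 * (1.0 - x1) / alpha

def dx1_db_finite_diff(v, b, alpha, h=1e-6):
    """Central finite-difference estimate of the same derivative."""
    return (softmax_x1(v, b + h, alpha) - softmax_x1(v, b - h, alpha)) / (2.0 * h)

# alpha = 0.01 is an assumption (not given in the text above).
v, b, alpha = 0.6, 0.58, 0.01
print("analytic   :", dx1_db_analytic(v, b, alpha))
print("finite diff:", dx1_db_finite_diff(v, b, alpha))
```

Both numbers should agree with each other and with the expected derivative, which makes them a convenient reference when comparing against whatever the solver's differentiation returns.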