Skip to content

Apparently wrong derivatives in simple softmax-as-optimization formulation #58

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
currymj opened this issue Apr 27, 2023 · 7 comments
Closed

Comments

@currymj
Copy link

currymj commented Apr 27, 2023

I was trying to use cvxpylayers to get derivatives of the problem below (essentially softmax-as-constrained optimization) and ran into a bug.

$$ \begin{aligned} \max_x & \text{ } v x_1 + b x_2 - \alpha \sum_{i=1}^2 x_i \log x_i \\ & \text{ s. t. } \sum_{i=1}^2 x_i = 1 \end{aligned} $$

where $v$ and $b$ are the parameters. The optimal solution to this problem is $x = \text{softmax}(v/\alpha, b/\alpha)$.

I reported the bug at cvxgrp/cvxpylayers#145 as well but it was suggested I post here as well, as it seems the bug may be in diffcp.

Here's my best attempt at a MWE that only uses diffcp. I ran the cvxpylayers code in a debugger and just pulled out the problem parameters and hardcoded them (please let me know if I'm using the library wrong!). The derivative of x1 at point v=0.6, b=0.58 wrt b should be -10.4994 but ends up being 0.01412.

import numpy as np
import diffcp
import scipy

# program is maximize_x v*x1 + b*x2 - smooth_coeff*sum(x_i log x_i) s.t. sum(x) == 1
# we care about derivative of solution wrt b
# the analytic solution is just x = softmax(v/smooth_coeff, b/smooth_coeff)

a = scipy.sparse.csc_matrix(np.array([[ 1.,  1.,  0.,  0.],
        [-1.,  0.,  0.,  0.],
        [ 0., -1.,  0.,  0.],
        [ 0.,  0., -1.,  0.],
        [-1.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0., -1.],
        [ 0., -1.,  0.,  0.],
        [ 0.,  0.,  0.,  0.]]))

bb = np.array([1., 0., 0., 0., 0., 1., 0., 0., 1.])


# second coordinate here corresponds to negative of parameter b
# last two coordinates correspond to smoothing param of 0.01
c = np.array([-0.6 ,  -0.58  , -0.01, -0.01])

cone_dict = {'l': 2, 'q': [], 'ep': 2, 's': [], 'p': [], 'z': 1}

kwargs = {'verbose': False, 'eps_abs': 1e-05, 'eps_rel': 1e-05}

solve_method = 'SCS'

x, y, s, D, DT = diffcp.solve_and_derivative(a, bb, c, cone_dict, **kwargs)

zeros = np.zeros_like(bb)
dx = np.array([1.0,0.0,0.0,0.0])
dA, db, dc = DT(dx, zeros, zeros)

# derivative wrt parameter b_val should be -dc[1]
print('calculated derivative wrt b parameter from problem (= -dc[1]) is', -dc[1])
print('deriv of softmax(v/smooth,b/smooth) wrt b parameter from problem is approx -10.4994')
@bamos
Copy link
Collaborator

bamos commented May 1, 2023

One quick other idea for debugging: you can use the dense backward mode and obtain the matrix M from https://github.com/cvxgrp/diffcp/blob/master/diffcp/cone_program.py#L410, and then make sure the solve in https://github.com/cvxgrp/diffcp/blob/master/diffcp/cone_program.py#L469 is correctly solving the system, or check if there is some other instability/issue with M or that system

@AxelBreuer
Copy link
Contributor

AxelBreuer commented May 2, 2023

Bumping: the issue you are facing currymj is probably related to the one I mentionend 5 months ago in #57 (comment).
It seems that something is not working correctly for exponential cones.

As a side remark, I am currently investigating the issue with the help of an intern. Indications such as the one given by bamos are indeed very valuable to us: any other hints are welcome !
One blocking point in our investigations is that the source code for _diffcp is not available (specially the part for _diffcp.M_dense()). Could this code be open-sourced as well ?

@bamos
Copy link
Collaborator

bamos commented May 2, 2023

As a side remark, I am currently investigating the issue with the help of an intern.

That's great! Please keep us updated

One blocking point in our investigations is that the source code for _diffcp is not available (specially the part for _diffcp.M_dense()). Could this code be open-sourced as well ?

It's somewhat hidden, but _diffcp is just the C++ source code in this repository. It's connected with this pybind wrapper in setup.py. M_dense is defined here (below the linearoperator definition of M).

If the issue is related to the exponential cone, the derivative for the exponential cone projection is also here and should match these papers:

There are also tests for the exponential cone derivatives here, along with other tests for the derivatives/adjoint derivatives. When this issue is fixed it could also be nice to include these examples in there too.

@akshayka
Copy link
Member

akshayka commented May 2, 2023

@AxelBreuer, thanks for looking into this. There were some issues with the exponential cone derivative a few years ago. At the time we thought we fixed them. But we may have missed something.

Look at the revision history for cones.py to see what fixes we made: https://github.com/cvxgrp/diffcp/commits/master/diffcp/cones.py

@AxelBreuer
Copy link
Contributor

Thanks a lot for all these useful extra informations !

Concerning the C++ code, when looking at https://github.com/cvxgrp/diffcp/blob/master/cpp/src/cones.cpp, we see at line 214 of LinearOperator _dprojection_exp(const Vector &x, bool dual):

  J_inv << alpha, (-r + s) / s * alpha, -1, 0,
           1 + l / s * alpha, -beta, 0, alpha,
           -beta, 1 + beta * r / s, 0, (1 - r / s) * alpha,
           0, 0, 1, -1;

However, in equations (26) of "Solution refinement at regular points of conic problems" the lines are ordered differently:

  J_inv << 1 + l / s * alpha, -beta, 0, alpha,
           -beta, 1 + beta * r / s, 0, (1 - r / s) * alpha,
           0, 0, 1, -1,
           alpha, (-r + s) / s * alpha, -1, 0;

Did you you apply a "circular shift" of the equations on purpose ?

@AxelBreuer
Copy link
Contributor

AxelBreuer commented May 3, 2023

I think we found the bug at the origin of inaccurate/wrong differentials for problems involving exponential cones (a small typo in the C++ code, unrelated to the "circular shift" mentionned above which turned out to be harmless).

We made a pull request which solves issues #58, #57, cvxgrp/cvxpylayers#135 (and probably cvxgrp/cvxpylayers#145 as well)

@currymj
Copy link
Author

currymj commented Jun 6, 2023

It appears to indeed be fixed, see cvxgrp/cvxpylayers#145 (comment)

@currymj currymj closed this as completed Jun 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants