Apparently wrong derivatives in simple softmax-as-optimization formulation #58

Closed
@currymj

I was trying to use cvxpylayers to get derivatives of the solution to the problem below (essentially softmax written as a constrained optimization problem) and ran into what looks like a bug.

$$ \begin{aligned} \max_x & \text{ } v x_1 + b x_2 - \alpha \sum_{i=1}^2 x_i \log x_i \\ & \text{ s. t. } \sum_{i=1}^2 x_i = 1 \end{aligned} $$

where $v$ and $b$ are the parameters and $\alpha > 0$ is a fixed smoothing coefficient. The optimal solution to this problem is $x = \text{softmax}(v/\alpha, b/\alpha)$.
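
For context, here is a short sketch of why softmax is the optimum and what derivative it implies. It assumes $\alpha > 0$ and writes $\lambda$ for the multiplier on the equality constraint (a symbol introduced only for this derivation):

$$ \begin{aligned} \mathcal{L}(x, \lambda) &= v x_1 + b x_2 - \alpha \sum_{i=1}^2 x_i \log x_i - \lambda \Big( \sum_{i=1}^2 x_i - 1 \Big) \\ \frac{\partial \mathcal{L}}{\partial x_1} &= v - \alpha (\log x_1 + 1) - \lambda = 0, \qquad \frac{\partial \mathcal{L}}{\partial x_2} = b - \alpha (\log x_2 + 1) - \lambda = 0 \\ x_1 &= \frac{e^{v/\alpha}}{e^{v/\alpha} + e^{b/\alpha}}, \qquad x_2 = \frac{e^{b/\alpha}}{e^{v/\alpha} + e^{b/\alpha}} \\ \frac{\partial x_1}{\partial b} &= -\frac{x_1 x_2}{\alpha} \end{aligned} $$

With $v = 0.6$, $b = 0.58$ and $\alpha = 0.01$ (the smoothing value hardcoded in the MWE), this gives $x_1 \approx 0.8808$, $x_2 \approx 0.1192$, and $\partial x_1 / \partial b \approx -10.4994$.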

I also reported this at cvxgrp/cvxpylayers#145, where it was suggested I post here too, since the bug may be in diffcp.

Here's my best attempt at an MWE that only uses diffcp. I ran the cvxpylayers code in a debugger, pulled out the problem data, and hardcoded it (please let me know if I'm using the library wrong!). The derivative of $x_1$ with respect to $b$ at $v = 0.6$, $b = 0.58$ (with $\alpha = 0.01$) should be approximately -10.4994, but diffcp returns 0.01412.

import numpy as np
import diffcp
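import scipy.sparse  # import the submodule explicitly; a bare `import scipy` does not expose scipy.sparse on older SciPy versions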

# program is maximize_x v*x1 + b*x2 - smooth_coeff*sum(x_i log x_i) s.t. sum(x) == 1
# we care about derivative of solution wrt b
# the analytic solution is just x = softmax(v/smooth_coeff, b/smooth_coeff)

a = scipy.sparse.csc_matrix(np.array([[ 1.,  1.,  0.,  0.],
        [-1.,  0.,  0.,  0.],
        [ 0., -1.,  0.,  0.],
        [ 0.,  0., -1.,  0.],
        [-1.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0., -1.],
        [ 0., -1.,  0.,  0.],
        [ 0.,  0.,  0.,  0.]]))

bb = np.array([1., 0., 0., 0., 0., 1., 0., 0., 1.])


# second coordinate here corresponds to negative of parameter b
# last two coordinates correspond to smoothing param of 0.01
c = np.array([-0.6 ,  -0.58  , -0.01, -0.01])

cone_dict = {'l': 2, 'q': [], 'ep': 2, 's': [], 'p': [], 'z': 1}

kwargs = {'verbose': False, 'eps_abs': 1e-05, 'eps_rel': 1e-05}

solve_method = 'SCS'  # note: defined here but not passed to diffcp in the call below

x, y, s, D, DT = diffcp.solve_and_derivative(a, bb, c, cone_dict, **kwargs)

zeros = np.zeros_like(bb)
dx = np.array([1.0,0.0,0.0,0.0])
dA, db, dc = DT(dx, zeros, zeros)

# derivative wrt parameter b_val should be -dc[1]
print('calculated derivative wrt b parameter from problem (= -dc[1]) is', -dc[1])
print('deriv of softmax(v/smooth,b/smooth) wrt b parameter from problem is approx -10.4994')
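
For reference, the expected value quoted above can be reproduced without diffcp. The snippet below is a minimal sketch using only NumPy: it evaluates the closed-form softmax solution, computes the analytic derivative of $x_1$ with respect to $b$, and cross-checks it with a central finite difference. The helper names (`softmax`, `x1_of_b`) and the step size `h` are my own choices for illustration, not part of diffcp or cvxpylayers.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax of a 1-D array."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # shift by the max to avoid overflow
    return e / e.sum()


def x1_of_b(b, v=0.6, alpha=0.01):
    """First coordinate of the closed-form solution softmax(v/alpha, b/alpha)."""
    return softmax(np.array([v, b]) / alpha)[0]


v, b, alpha = 0.6, 0.58, 0.01

# Closed-form solution of the entropy-regularized problem.
x = softmax(np.array([v, b]) / alpha)

# Analytic derivative of x1 with respect to b: -x1 * x2 / alpha.
analytic = -x[0] * x[1] / alpha

# Central finite difference as an independent check (h is an arbitrary small step).
h = 1e-6
finite_diff = (x1_of_b(b + h, v, alpha) - x1_of_b(b - h, v, alpha)) / (2 * h)

print("analytic d x1 / d b   :", analytic)
print("finite-diff d x1 / d b:", finite_diff)
```

Both numbers should agree with the -10.4994 figure, which is what `-dc[1]` from the diffcp script would be expected to match if the adjoint derivative were correct.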
