
Inaccurate gradient estimation for log penalization #57


Closed
AxelBreuer opened this issue Nov 20, 2022 · 1 comment

@AxelBreuer
Contributor

AxelBreuer commented Nov 20, 2022

Hi,

I reported an equivalent issue on cvxpylayers (cvxgrp/cvxpylayers#135), but since cvxpylayers relies on diffcp, I wanted to report it here as well.

I have built an even simpler example that exhibits inaccurate gradient estimation.

Suppose we want to maximize $J(x) = x + \lambda (\log(1-x) + \log(1+x))$, where $\lambda \in \mathbb{R}^+$.
The solution is $x^* = -\lambda + \sqrt{\lambda^2+1}$,
and $\frac{dx^*}{d\lambda} = -1 + \frac{\lambda}{\sqrt{\lambda^2+1}}$.
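As a quick sanity check (a standalone sketch, independent of diffcp), the claimed maximizer satisfies the first-order condition $J'(x) = 1 + \lambda\left(\frac{1}{1+x} - \frac{1}{1-x}\right) = 0$:

```python
import numpy as np

lam = 1.0
# closed-form maximizer claimed above
x_star = -lam + np.sqrt(lam**2 + 1.0)

# J'(x) = 1 + lam*(1/(1+x) - 1/(1-x)) should vanish at x*
grad = 1.0 + lam * (1.0 / (1.0 + x_star) - 1.0 / (1.0 - x_star))
print(grad)  # ~0, up to floating-point rounding
```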
Now, let's compute the sensitivity of $x^{*}$ with respect to $\lambda$, at $\lambda=1$ with $\delta\lambda=10^{-6}$, using diffcp:

import numpy as np
import cvxpy as cp
import diffcp.cone_program as cone_prog
import diffcp.utils as utils

# variables
x_ = cp.Variable()

# parameters
_lam = cp.Parameter(1, nonneg=True)
_lam.value = np.ones(1)

# objective
objective = cp.Maximize(x_ + _lam*cp.log(1+x_) + _lam*cp.log(1-x_))

problem = cp.Problem(objective)

A, b, c, cone_dims = utils.scs_data_from_cvxpy_problem(problem)

x, y, s, D, DT = cone_prog.solve_and_derivative(A,
                                                b,
                                                c,
                                                cone_dims,
                                                solve_method="ECOS",
                                                mode="dense",
                                                feastol=1e-10,
                                                abstol=1e-10,
                                                reltol=1e-10)

dlam = 1e-6
dA = utils.get_random_like(A, lambda n: np.zeros(n))
db = np.zeros(b.size)
dc = np.zeros(c.size)
dc[1] = dc[1] - dlam  # the minus sign stems from the fact that  c =  [-1., -1., -1.]
dc[2] = dc[2] - dlam  # the minus sign stems from the fact that  c =  [-1., -1., -1.]

dx, dy, ds = D(dA, db, dc)

x_pert, y_pert, s_pert, _, _ = cone_prog.solve_and_derivative(A + dA,
                                                              b + db,
                                                              c + dc,
                                                              cone_dims,
                                                              solve_method="ECOS",
                                                              mode="dense",
                                                              feastol=1e-10,
                                                              abstol=1e-10,
                                                              reltol=1e-10)

print(f"x_pert-x   = {x_pert-x}")
print(f"      dx   = {dx}")
analytical = -1+_lam.value/np.sqrt(_lam.value**2+1)
print(f"analytical = {analytical*dlam}")

and you get

x_pert-x   = [-2.93091918e-07 -2.07319449e-07  5.00378592e-07]
      dx   = [-2.61203874e-07 -1.84699034e-07  4.45902905e-07]
analytical = [-2.92893219e-07]
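For reference, differentiating the closed form directly (a minimal sketch, no diffcp involved) reproduces the analytical value above and agrees with a central finite difference of $x^*(\lambda)$:

```python
import numpy as np

def x_star(lam):
    # closed-form maximizer of J(x) = x + lam*(log(1-x) + log(1+x))
    return -lam + np.sqrt(lam**2 + 1.0)

lam, dlam = 1.0, 1e-6
# analytical sensitivity dx*/dlam
analytical = -1.0 + lam / np.sqrt(lam**2 + 1.0)
# central finite difference, O(dlam^2) accurate
central = (x_star(lam + dlam) - x_star(lam - dlam)) / (2 * dlam)
print(analytical, central)  # both ~ -0.29289
```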

Here the finite-difference approximation $x_{pert} - x$ (first entry: -2.931e-7) is much closer to the analytical sensitivity $\frac{dx^*}{d\lambda}\,\delta\lambda$ (-2.929e-7) than the automatic-differentiation estimate $dx$ (-2.612e-7), which is very surprising, especially for such a simple (unconstrained) problem.
Do you have an idea why AD performs so poorly on this example?
