I have a question about how the parameter updates take place. As described in
https://kfac-jax.readthedocs.io/en/latest/overview.html#optimizer
and
https://kfac-jax.readthedocs.io/en/latest/overview.html#automatic-selection-of-update-coefficients
with reference to the KFAC paper, the parameters $\alpha$ and $\beta$ are computed from a local quadratic model.
If I call the damped curvature matrix $\hat{C} = C + (\lambda + \eta)I$, then I find that with
$$\begin{aligned}
g&= \nabla L \\
\delta &= \alpha \hat{C}^{-1} g + \beta v
\end{aligned}$$
the partial derivatives of the quadratic model are
$$\begin{aligned}
\partial_\alpha q(\delta) &= (1+\alpha) g^T \hat{C}^{-1} g +\beta g^T v \\
\partial_\beta q(\delta) &= (1+\alpha) g^T v + \beta v^T \hat{C}v
\end{aligned}$$
which are set to zero by
$$(\alpha, \beta)=(-1,0)$$
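As a sketch of where these expressions come from, assume the quadratic model has the form $q(\delta) = \tfrac{1}{2}\delta^T \hat{C} \delta + g^T \delta + L$ with $\hat{C}$ symmetric. This form is an assumption, inferred from the derivatives above rather than quoted from the documentation, and $v$ is taken to be the previous update. The chain rule then gives
$$\begin{aligned}
\nabla_\delta q &= \hat{C}\delta + g = (1+\alpha)\, g + \beta\, \hat{C} v \\
\partial_\alpha q &= (\hat{C}^{-1} g)^T \nabla_\delta q = (1+\alpha)\, g^T \hat{C}^{-1} g + \beta\, g^T v \\
\partial_\beta q &= v^T \nabla_\delta q = (1+\alpha)\, g^T v + \beta\, v^T \hat{C} v
\end{aligned}$$
With $(\alpha, \beta) = (-1, 0)$ the gradient $\nabla_\delta q$ itself vanishes, so both partial derivatives are zero.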
Unless I am mistaken, this is just the statement that Newton's method solves a quadratic model exactly in one step. Momentum is then unnecessary, since it cannot improve on the Newton step for the quadratic model.
I see the comment that $C$ may be the exact or the approximate Fisher matrix, which would alter the calculation. But is it correct that, in principle, when the same $C$ is used for both $\delta$ and $q$, the quadratic model is solved trivially as above?
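The claim can be checked numerically with a small NumPy sketch. It is not kfac-jax code: the helper `optimal_coefficients`, the random test matrices, and the diagonal "approximate" preconditioner are all hypothetical, and the quadratic model is assumed to be $q(\delta) = \tfrac{1}{2}\delta^T Q \delta + g^T \delta$. It solves the 2x2 linear system obtained by setting both partial derivatives to zero, once with the same matrix in $\delta$ and $q$, and once with different matrices.

```python
import numpy as np


def optimal_coefficients(q_matrix, precond_matrix, g, v):
    """Minimise q(delta) = 0.5 * delta^T Q delta + g^T delta over (alpha, beta),
    where delta = alpha * P^{-1} g + beta * v.

    Hypothetical helper for illustration only; not part of kfac-jax.
    """
    d = np.linalg.solve(precond_matrix, g)  # preconditioned gradient P^{-1} g
    basis = np.stack([d, v], axis=1)        # columns span the 2-D search subspace
    # Setting both partial derivatives to zero gives a 2x2 linear system.
    lhs = basis.T @ q_matrix @ basis
    rhs = -basis.T @ g
    return np.linalg.solve(lhs, rhs)


rng = np.random.default_rng(0)
n = 5
a = rng.normal(size=(n, n))
C = a @ a.T                      # random symmetric positive semi-definite "curvature"
damping = 0.1                    # stands in for lambda + eta
C_hat = C + damping * np.eye(n)  # damped curvature matrix
g = rng.normal(size=n)           # stands in for the gradient
v = rng.normal(size=n)           # stands in for the previous update

# Same matrix used for the preconditioned step and for the quadratic model.
same = optimal_coefficients(C_hat, C_hat, g, v)

# Different matrices: the model uses C_hat, the step uses a diagonal approximation.
C_approx = np.diag(np.diag(C)) + damping * np.eye(n)
mixed = optimal_coefficients(C_hat, C_approx, g, v)

print("same matrix:      ", same)
print("different matrix: ", mixed)
```

In the first case the solution is $(\alpha, \beta) = (-1, 0)$ up to rounding, matching the derivation. In the second case the coefficients are generally different, which is consistent with the remark that using an approximate matrix for the step and another matrix in the model alters the calculation.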