parameter update #274

nickhalmagyi · 2024-10-05T12:27:34Z

I have a question about how the parameter updates take place. As described in

https://kfac-jax.readthedocs.io/en/latest/overview.html#optimizer

and

https://kfac-jax.readthedocs.io/en/latest/overview.html#automatic-selection-of-update-coefficients

with reference to the KFAC paper, the parameters $\alpha$ and $\beta$ are computed from a local quadratic model.

If I write the damped curvature matrix as $\hat{C} = C + (\lambda + \eta)I$, then I find that with

$$\begin{aligned} g&= \nabla L \\ \delta &= \alpha \hat{C}^{-1} g + \beta v \end{aligned}$$

the partial derivatives of the quadratic model are

$$\begin{aligned} \partial_\alpha q(\delta) &= (1+\alpha) g^T \hat{C}^{-1} g +\beta g^T v \\ \partial_\beta q(\delta) &= (1+\alpha) g^T v + \beta v^T \hat{C}v \end{aligned}$$

which are set to zero by

$$(\alpha, \beta)=(-1,0)$$
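To sanity-check this numerically, here is a minimal plain-Python sketch on a made-up 2x2 problem. It assumes the quadratic model has the form $q(\delta) = g^T \delta + \tfrac{1}{2} \delta^T \hat{C} \delta$ (constant term dropped), which is what the partial derivatives above correspond to. The function name `optimal_coefficients` and all numeric values are hypothetical and are not taken from kfac-jax.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def inverse_2x2(A):
    """Exact inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def optimal_coefficients(C_hat, g, p, v):
    """Minimise q(alpha * p + beta * v) over (alpha, beta).

    Assumed model: q(delta) = g^T delta + 0.5 * delta^T C_hat delta.
    Setting both partial derivatives to zero gives the 2x2 system
        [p^T C p, p^T C v] [alpha]   [-g^T p]
        [p^T C v, v^T C v] [beta ] = [-g^T v]
    which is solved here with Cramer's rule.
    """
    Cp, Cv = matvec(C_hat, p), matvec(C_hat, v)
    m11, m12, m22 = dot(p, Cp), dot(p, Cv), dot(v, Cv)
    r1, r2 = -dot(g, p), -dot(g, v)
    det = m11 * m22 - m12 * m12  # nonzero when p and v are not parallel
    alpha = (r1 * m22 - m12 * r2) / det
    beta = (m11 * r2 - m12 * r1) / det
    return alpha, beta


# Hypothetical values: a symmetric positive definite C_hat, a gradient g,
# and a previous update v that is not parallel to C_hat^{-1} g.
C_hat = [[3.0, 1.0], [1.0, 2.0]]
g = [1.0, -2.0]
v = [0.5, 1.5]

# Preconditioned gradient using the exact inverse of the same C_hat.
p = matvec(inverse_2x2(C_hat), g)

alpha, beta = optimal_coefficients(C_hat, g, p, v)
print(alpha, beta)  # approximately -1 and 0, up to floating-point rounding
```

The function takes the preconditioned gradient `p` as a separate argument, so the same sketch can be rerun with a `p` built from a different (approximate) inverse than the `C_hat` used in the model. In that case the coefficients are generally no longer $(-1, 0)$.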

Unless I am mistaken, this is just a Newton step on the quadratic model, which reaches the model's minimum exactly in one step. Momentum is then unnecessary, since it cannot improve on the Newton step for the quadratic model.

I see the comment that $C$ may be the exact or the approximate Fisher matrix, which would alter the calculation. But is it correct that, in principle, when the same $C$ is used for both $\delta$ and $q$, the quadratic model is solved trivially as above?
