In the AdaBelief paper, there is only one epsilon = 1e-8, which is used both to damp the second moment estimate and as the constant in the denominator. Optax instead has two: eps = 1e-16 and root_eps = 1e-16. Initially, I only set eps = 1e-8, hoping to match the paper, but I just noticed that I also need to set root_eps = 1e-8. A few ideas for how this might be improved:
- Add a note to the documentation.
- Use the defaults eps = 1e-8 and root_eps = None, then set root_eps = eps whenever root_eps is None.
- At the very least, default to eps = 1e-8 and root_eps = 1e-8.
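A minimal sketch of the second suggestion, written in plain Python on scalars rather than Optax/JAX pytrees so it runs standalone. The function name adabelief_step and its signature are hypothetical and are not Optax's API. The update follows the single-epsilon form the paper describes: epsilon is added to the second moment estimate and used again in the denominator. Passing root_eps=None makes the first use fall back to eps.

```python
import math


def adabelief_step(param, grad, m, s, t, lr=1e-3, b1=0.9, b2=0.999,
                   eps=1e-8, root_eps=None):
    """One scalar AdaBelief step (hypothetical helper, not Optax's API).

    If root_eps is None it falls back to eps, so setting eps alone
    reproduces the paper's single-epsilon behaviour.
    """
    if root_eps is None:
        root_eps = eps  # proposed fallback: one epsilon, as in the paper
    m = b1 * m + (1 - b1) * grad  # EMA of the gradient
    # EMA of the squared deviation of the gradient from its prediction m,
    # damped by root_eps
    s = b2 * s + (1 - b2) * (grad - m) ** 2 + root_eps
    m_hat = m / (1 - b1 ** t)  # bias corrections (t starts at 1)
    s_hat = s / (1 - b2 ** t)
    param = param - lr * m_hat / (math.sqrt(s_hat) + eps)  # eps in the denominator
    return param, m, s


# Usage: only eps is set, so root_eps follows it.
p, m, s = 1.0, 0.0, 0.0
for t, g in enumerate([0.5, 0.4, 0.3], start=1):
    p, m, s = adabelief_step(p, g, m, s, t)
```

With a None default, users who copy the paper's epsilon into eps get the paper's behaviour automatically, while anyone who wants separate values can still pass root_eps explicitly.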
Is there a particular reason the implementation uses different default hparams?