Replies: 1 comment
-
What do you mean by "more stable gradient descent"? I would use gradient accumulation in two scenarios:
-
Does increasing the batch size guarantee more stable gradient descent?
In which scenarios should gradient accumulation be used?
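
For context, here is a minimal sketch of what gradient accumulation computes. It assumes a toy one-parameter model `y = w * x` with a mean squared error loss; the data and the function names `mse_grad` and `accumulated_grad` are made up for illustration and are not from this thread. It shows that averaging the gradients of equal-sized micro-batches gives the same gradient as one large batch, which is why accumulation is often used to emulate a batch that does not fit in memory.

```python
def mse_grad(w, xs, ys):
    """Gradient of the mean squared error for the model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Average the gradients of equal-sized micro-batches."""
    assert len(xs) % micro_batch_size == 0, "sketch assumes equal-sized micro-batches"
    num_steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(num_steps):
        lo = i * micro_batch_size
        hi = lo + micro_batch_size
        # Scale each micro-batch gradient by 1/num_steps so the sum is a mean.
        total += mse_grad(w, xs[lo:hi], ys[lo:hi]) / num_steps
    return total


# Hypothetical toy data, roughly following y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.3, 13.9, 16.2]
w = 0.5

full = mse_grad(w, xs, ys)                               # one batch of 8
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)  # 4 micro-batches of 2

# The two gradients agree up to floating-point rounding.
print(abs(full - accum) < 1e-9)
```

In a deep learning framework the same effect is usually obtained by dividing each micro-batch loss by the number of accumulation steps and calling the optimizer step only once per cycle. Note that the equivalence holds for the gradient itself; layers that depend on batch statistics, such as batch normalization, still see the smaller micro-batches.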