[QUESTION] Adding new parameters in ColumnParallelLinear/RowParallelLinear raises AssertionError (Communication call has not been issued for this bucket) when using overlap-grad-reduce #1150
Comments
Did you solve this problem? I am running into the same issue. |
Not yet, but several colleagues of mine have hit the same problem. |
same issue when using |
Also getting this error with
|
I am having exactly the same issue. Using the distributed optimizer with EP, and enabling |
Hi all, my issue may be related. If I'm right, you can fix this by defining your custom layers in the same order in which they are called in the forward pass.
Hi, I am trying to add some new learnable parameters inside ColumnParallelLinear/RowParallelLinear, and the following is an example code snippet:
However, this gives me the following error during training.
This happens when the following arguments are passed for training:
It seems the newly added parameter is never added to self.params_with_grad. However, training proceeds normally when I follow the same procedure elsewhere, e.g., in the init function of ParallelAttention or ParallelMLP, with no such errors.
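To make the suspected failure mode concrete, here is a minimal pure-Python sketch of a gradient bucket that only issues its communication call once every parameter it owns has reported a gradient. This is a toy model, not Megatron-LM's actual code: the class ToyBucket, the helper run_step, and the parameter names are hypothetical, and only the attribute name params_with_grad and the assertion message are taken from this issue.

```python
class ToyBucket:
    """Toy stand-in for a gradient bucket (hypothetical, for illustration only)."""

    def __init__(self, params):
        self.params = set(params)          # every parameter assigned to this bucket
        self.params_with_grad = set()      # parameters whose gradient is ready
        self.communication_issued = False  # stands in for the async reduce call

    def register_grad_ready(self, param):
        # Called once per parameter when its gradient has been computed.
        assert param in self.params, "param does not belong to this bucket"
        self.params_with_grad.add(param)
        # Communication starts only when *all* parameters have reported.
        if len(self.params_with_grad) == len(self.params):
            self.communication_issued = True

    def finish_grad_sync(self):
        # Called at the end of the backward pass.
        assert self.communication_issued, (
            "Communication call has not been issued for this bucket "
            f"({len(self.params_with_grad)}/{len(self.params)} params have grad available)"
        )
        # Reset state for the next step.
        self.params_with_grad.clear()
        self.communication_issued = False


def run_step(bucket, params_that_receive_grad):
    """Simulate one backward pass in which only some parameters report a gradient."""
    for name in params_that_receive_grad:
        bucket.register_grad_ready(name)
    bucket.finish_grad_sync()
    return True


if __name__ == "__main__":
    bucket = ToyBucket(["weight", "bias", "new_param"])
    run_step(bucket, ["weight", "bias", "new_param"])  # all params report: passes
    try:
        run_step(bucket, ["weight", "bias"])           # new_param never reports
    except AssertionError as err:
        print(err)
```

In this sketch, a parameter that belongs to the bucket but never reports a gradient during the backward pass is enough to trigger the assertion. Whether that is what actually happens with the parameter added inside ColumnParallelLinear/RowParallelLinear is an assumption that would need to be checked against the real bucket code.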