fix "Cannot copy out of meta tensor; no data!" issue for BartForConditionalGeneration model #36572
base: main
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the "Ready for review" button.
hi @yao-matrix, this causes several other BART tests to fail! You can see the failures in the CI at the bottom of this PR. Some of the failures are just code style - these are simple and can be fixed with `make fixup`.
I don't think we want to change the modeling code to fix one specific checkpoint, even though I know it is one of the most used ones. Other checkpoints may have the shared module instead. Instead, maybe we can update `_tie_weights` to tie `self.shared` to `self.decoder.embed_tokens` when `self.decoder.embed_tokens` is not on the meta device?
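For reference, a minimal sketch of that direction, assuming the current `BartModel._tie_weights` structure; the device check and the tie direction shown here are illustrative, not the final patch:

```python
# Hypothetical sketch (not the merged fix): if `shared` was never materialized
# from the checkpoint but the decoder embedding was, use the decoder embedding
# as the tying source so no meta tensor survives into `model.to(...)`.
def _tie_weights(self):
    if self.config.tie_word_embeddings:
        if (self.shared.weight.device.type == "meta"
                and self.decoder.embed_tokens.weight.device.type != "meta"):
            # Tie `shared` and the encoder embedding to the decoder embedding,
            # which holds the real checkpoint weights.
            self._tie_or_clone_weights(self.shared, self.decoder.embed_tokens)
            self._tie_or_clone_weights(self.encoder.embed_tokens, self.decoder.embed_tokens)
        else:
            # Default behaviour: tie both embeddings to the shared embedding.
            self._tie_or_clone_weights(self.encoder.embed_tokens, self.shared)
            self._tie_or_clone_weights(self.decoder.embed_tokens, self.shared)
```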
@Rocketknight1 @SunMarc, Marc's method is better; I used it to fix the issue. Please help review and comment, thanks.
Signed-off-by: Yao, Matrix <[email protected]>
Thanks for fixing this! Really appreciate it!
@ArthurZucker, please take a look, thanks.
Fixes issue #36247.

Cause of this issue: at `load_state_dict` time, the `bart-large-cnn` checkpoint only contains `model.decoder.embed_tokens.weight`, but in the `BartModel` implementation both `model.decoder.embed_tokens.weight` and `model.encoder.embed_tokens.weight` are tied to `shared.weight`, which has no corresponding real weights in the checkpoint. So when `tie_weights` is called, the embeddings are tied back to an empty `shared.weight` that is still on the `meta` device, which then leads to the `model.to()` issue.

@SunMarc @Rocketknight1, I don't know whether this PR meets your standards since it changes modeling code; please let me know your thoughts, thanks.
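For context, a minimal sketch of the failure mode described above. The `low_cpu_mem_usage=True` loading path and the explicit `.to()` call are assumptions (the exact reproduction is in issue #36247); they are just a typical way the meta-device initialization gets triggered:

```python
from transformers import BartForConditionalGeneration

# Load via the meta-device initialization path, so tensors missing from the
# checkpoint (here: `model.shared.weight`) can be left on the meta device.
model = BartForConditionalGeneration.from_pretrained(
    "facebook/bart-large-cnn",
    low_cpu_mem_usage=True,
)

# `tie_weights` points both embed_tokens modules back at the empty
# `shared.weight`, so moving the model trips over the meta tensor:
# NotImplementedError: Cannot copy out of meta tensor; no data!
model.to("cpu")
```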