I am sorry to bother you here with a problem about XLNet pretraining.
I saw your comment on the XLNet issues; you had the same error: Error recorded from outfeed: Bad hardware status: 0x1, on a Colab TPU. I am currently trying to pretrain XLNet on a Colab TPU and am running into the same problem. I have also tried a minimal batch_size of 16, but still get the error. So I want to ask whether you have solved the problem, and whether you can pretrain XLNet on a Colab TPU now?
Thanks!
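For context, the usual first sanity check in a Colab notebook is that the TPU is reachable at all before launching pretraining. Here is a minimal sketch, assuming a TF 1.x runtime (the version the XLNet pretraining code targets); it only lists the TPU devices and is not part of the XLNet code:

```python
# Minimal sketch (TF 1.x): verify the Colab TPU is reachable before
# launching pretraining. COLAB_TPU_ADDR is set by the Colab runtime
# when a TPU accelerator is attached to the notebook.
import os
import tensorflow as tf

tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
resolver = tf.contrib.cluster_resolver.TPUClusterResolver(tpu=tpu_address)

with tf.Session(resolver.master()) as sess:
    # A Colab TPU is a v2-8, so this should list eight TPU cores.
    for device in sess.list_devices():
        print(device.name)
```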
No, sorry, I haven't been able to solve the problem. But what I found is that it is not a problem with the TensorFlow version, because the same OOM errors occur with TensorFlow "1.14.1.dev20190518" on a TPUv2 on Google Cloud. I have tried reducing the sequence length to an absurd minimum and the batch size to 8, but to no avail.
I think the problem is caused by some code in the custom TPU estimator, but I'm not sure what exactly. The OOM errors indicate that the padded allocation has blown up, but I can't say why that is the case.
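For what it's worth, one way to narrow this down would be to check whether the stock contrib TPUEstimator shows the same outfeed/OOM behaviour as the estimator bundled with the repo. Below is a minimal sketch of the stock setup, not the XLNet code itself; the model_fn is a trivial placeholder, and the GCS path, loop size, and batch size are assumptions:

```python
# Minimal sketch (TF 1.x): stock tf.contrib.tpu.TPUEstimator setup, useful
# for checking whether the error also occurs outside the repo's bundled
# estimator. The model_fn below is a placeholder, not the XLNet graph.
import os
import tensorflow as tf

def model_fn(features, labels, mode, params):
    # Placeholder graph: a single trainable scalar, just enough to exercise
    # the TPU train loop. A real run would build the XLNet model here.
    w = tf.get_variable('w', shape=[], initializer=tf.zeros_initializer())
    loss = tf.square(w - 1.0)
    optimizer = tf.contrib.tpu.CrossShardOptimizer(
        tf.train.GradientDescentOptimizer(0.1))
    train_op = optimizer.minimize(
        loss, global_step=tf.train.get_or_create_global_step())
    return tf.contrib.tpu.TPUEstimatorSpec(mode=mode, loss=loss,
                                           train_op=train_op)

resolver = tf.contrib.cluster_resolver.TPUClusterResolver(
    tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])

run_config = tf.contrib.tpu.RunConfig(
    cluster=resolver,
    model_dir='gs://my-bucket/xlnet-debug',   # hypothetical GCS path
    tpu_config=tf.contrib.tpu.TPUConfig(
        iterations_per_loop=100,              # steps per host-to-TPU loop
        num_shards=8))                        # a Colab TPU is a v2-8

estimator = tf.contrib.tpu.TPUEstimator(
    model_fn=model_fn,
    config=run_config,
    use_tpu=True,
    train_batch_size=16)                      # global batch size across cores
```

A run would then be started with estimator.train(input_fn=..., max_steps=...), with an input_fn that yields a dataset of feature tensors. If even this trivial setup fails with the same outfeed error, the custom estimator code is probably not the culprit.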