Benchmarks for different pretrained models #3
With the xlm-mlm-17-1280 model, a GTX 1080 8GB could not load the model, even with batch_size = 2. If you want to run this model, using a TPU is recommended!
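For context (an added sketch, not from the thread), counting the checkpoint's parameters shows why 8 GB is tight; this assumes the HuggingFace transformers API:

```python
from transformers import XLMWithLMHeadModel

# load the checkpoint on CPU just to count parameters
model = XLMWithLMHeadModel.from_pretrained("xlm-mlm-17-1280")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")
# fp32 weights take 4 bytes each; Adam training also keeps gradients and
# two moment buffers, roughly 16 bytes per parameter before activations
print(f"~{n_params * 4 / 1e9:.1f} GB weights, ~{n_params * 16 / 1e9:.1f} GB with Adam state")
```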
Two other XLM models return the error "F-score is ill-defined and being set to 0.0"; they assign everything to ONE class. The reason might be that the language was not defined yet => we have to set the language to Vietnamese before training!
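A minimal sketch of setting the language before running an XLM model, assuming a recent transformers version and its XLM interface; the checkpoint name and example sentence are placeholders:

```python
import torch
from transformers import XLMTokenizer, XLMForSequenceClassification

name = "xlm-mlm-xnli15-1024"  # a multilingual XLM checkpoint that covers Vietnamese
tokenizer = XLMTokenizer.from_pretrained(name)
model = XLMForSequenceClassification.from_pretrained(name)

lang_id = tokenizer.lang2id["vi"]  # Vietnamese
inputs = tokenizer("xin chào", return_tensors="pt")
# pass one language id per token so the model uses Vietnamese language embeddings
inputs["langs"] = torch.full_like(inputs["input_ids"], lang_id)
outputs = model(**inputs)
```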
It does not seem the TPU can load it, either.
BERT overfits very well (?), but the F1 on the dev set does not improve over time. Here is the log of 69 epochs:
The "average loss" of each checkpoint is actually the total loss (I did the arithmetic wrong there), but if you divide it by the number of global steps, it can be seen descending steadily (see the illustration below).
Except for the last one?
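To make the total-versus-average distinction above concrete, here is a hypothetical illustration; the step counts and loss values are invented, not taken from the actual log:

```python
# (global_step, reported "average" loss) pairs where the reported value
# is really a running total; dividing by the step count recovers the
# true per-step average, which descends steadily
logs = [(500, 310.0), (1000, 590.0), (1500, 840.0), (2000, 1060.0)]
for step, total_loss in logs:
    print(f"step {step}: avg loss = {total_loss / step:.3f}")
# step 500: avg loss = 0.620
# step 1000: avg loss = 0.590
# step 1500: avg loss = 0.560
# step 2000: avg loss = 0.530
```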
With Adam ε = 1e-4 and a learning rate of 5e-6, after 4 training epochs the BERT model gives an F1 of approximately 0.8.
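As a sketch, those hyperparameters map to the following optimizer setup; the checkpoint name is an assumption, since the thread does not say which BERT variant was fine-tuned:

```python
import torch
from transformers import BertForSequenceClassification

# assumed checkpoint; substitute the BERT variant actually used
model = BertForSequenceClassification.from_pretrained("bert-base-multilingual-cased")
optimizer = torch.optim.Adam(model.parameters(), lr=5e-6, eps=1e-4)
```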
XLM seems to be a few times faster.
Please comment with the benchmarks (F1, accuracy, loss, etc.) here, along with instructions to reproduce the results. Training time would also be helpful (one way to compute these metrics is sketched below).
Models to try out:
I believe that to achieve remarkably better accuracy, though, we'll need to tweak or extend the current training data.
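So the reported numbers are comparable, here is one hedged way to compute the requested metrics with scikit-learn (the "F-score is ill-defined" warning quoted above comes from this library); y_true and y_pred are hypothetical placeholders:

```python
from sklearn.metrics import accuracy_score, f1_score

# hypothetical dev-set labels and predictions; substitute real model outputs
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

print(f"accuracy = {accuracy_score(y_true, y_pred):.4f}")
print(f"macro-F1 = {f1_score(y_true, y_pred, average='macro'):.4f}")
```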