-
Question in short: I referred to the Transformer paper to find the major setup differences between Transformer-BIG and Transformer-BASE. Tensorflow's tensor2tensor seems to have an option to change the filter dimension: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py#L1837
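For reference, here is a sketch of the two configurations as reported in the Transformer paper (Vaswani et al., 2017). The dictionary keys are illustrative names, not actual tensor2tensor or GluonNLP flags:

```python
# Hyperparameters of Transformer-base vs Transformer-big,
# as reported in "Attention Is All You Need" (Vaswani et al., 2017).
TRANSFORMER_BASE = {
    "num_layers": 6,   # encoder/decoder layers
    "d_model": 512,    # model (hidden) dimension
    "d_ff": 2048,      # feed-forward (filter) dimension
    "num_heads": 8,    # attention heads
    "dropout": 0.1,
}

TRANSFORMER_BIG = {
    "num_layers": 6,
    "d_model": 1024,   # doubled hidden size
    "d_ff": 4096,      # doubled filter size
    "num_heads": 16,   # doubled heads
    "dropout": 0.3,    # higher dropout for the larger model
}

# The big model doubles the width-related dimensions but keeps depth fixed.
for key in ("d_model", "d_ff", "num_heads"):
    assert TRANSFORMER_BIG[key] == 2 * TRANSFORMER_BASE[key]
```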
Replies: 5 comments
-
In our script, it is --hidden_size. You may also need to change dropout and epsilon accordingly.
-
@ymjiang I am interested in your transformer-big result. Feel free to share if you are able to reproduce it, or if you hit any blocker. BTW, I'm adding fp16 support to the transformer encoder block in #505, which may benefit your work too.
-
Hi @eric-haibin-lin, unfortunately I haven't succeeded in training Transformer-big to a reasonable BLEU so far (my current setup only converges to BLEU=3.2 on WMT2016BPE en-de). Default Transformer-base param: (this can converge to BLEU=25). My Transformer-big param: (this only converges to BLEU=3.2). I tried tuning the learning rate (setting it reasonably smaller) but still get a similar result, so I suspect something else is wrong rather than the learning rate.
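One thing worth checking when tuning: the Noam learning-rate schedule from the Transformer paper already scales inversely with sqrt(d_model), so doubling the hidden size halves the effective learning rate by a factor of sqrt(2) on its own. A minimal sketch of that schedule (the function name is illustrative, not the training script's actual API):

```python
def noam_lr(step, d_model, warmup_steps=4000, factor=1.0):
    """Learning-rate schedule from 'Attention Is All You Need':
    lr = factor * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    step = max(step, 1)  # avoid division by zero at step 0
    return factor * d_model ** -0.5 * min(step ** -0.5,
                                          step * warmup_steps ** -1.5)

# At the end of warmup, the big model (d_model=1024) gets a learning
# rate sqrt(2) times smaller than the base model (d_model=512).
base_lr = noam_lr(4000, d_model=512)
big_lr = noam_lr(4000, d_model=1024)
assert abs(base_lr / big_lr - 2 ** 0.5) < 1e-9
```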
-
@ymjiang how does the log look? We might be able to offer some help.
-
@szha Sorry, I am currently engaged with another task. Will get back here once I am free.