-
Question in short: I referred to the Transformer paper to find the major setup differences between Transformer-BIG and Transformer-BASE. Tensorflow's tensor2tensor seems to have an option to change the filter dimension: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py#L1837
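For reference, here is a sketch of the two configurations as reported in the Transformer paper (Vaswani et al., 2017). The dictionary keys are illustrative names, not actual tensor2tensor or GluonNLP flags:

```python
# Hyperparameters of Transformer-base vs Transformer-big,
# as reported in "Attention Is All You Need" (Vaswani et al., 2017).
TRANSFORMER_BASE = {
    "num_layers": 6,   # encoder/decoder layers
    "d_model": 512,    # model (hidden) dimension
    "d_ff": 2048,      # feed-forward (filter) dimension
    "num_heads": 8,    # attention heads
    "dropout": 0.1,
}

TRANSFORMER_BIG = {
    "num_layers": 6,
    "d_model": 1024,   # doubled hidden size
    "d_ff": 4096,      # doubled filter size
    "num_heads": 16,   # doubled heads
    "dropout": 0.3,    # higher dropout for the larger model
}

# The big model doubles the width-related dimensions but keeps depth fixed.
for key in ("d_model", "d_ff", "num_heads"):
    assert TRANSFORMER_BIG[key] == 2 * TRANSFORMER_BASE[key]
```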
Replies: 5 comments
-
In our script, it is --hidden_size. You may also need to change dropout and epsilon accordingly.
-
@ymjiang I am interested in your transformer-big result. Feel free to share if you are able to reproduce it, or if you hit any blocker. BTW, I'm adding fp16 support to the transformer encoder block in #505, which may benefit your work too.
-
Hi @eric-haibin-lin, unfortunately I haven't succeeded in training Transformer-big to a reasonable BLEU so far (my current setup only converges to BLEU=3.2 on WMT2016BPE en-de). Default Transformer-base param: (this can converge to BLEU=25). My Transformer-big param: (this only converges to BLEU=3.2). I tried tuning the learning rate (setting it reasonably smaller) but still get a similar result, so I suspect something else is wrong rather than the learning rate.
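One thing worth checking when tuning: the Noam learning-rate schedule from the Transformer paper already scales inversely with sqrt(d_model), so doubling the hidden size halves the effective learning rate by a factor of sqrt(2) on its own. A minimal sketch of that schedule (the function name is illustrative, not the training script's actual API):

```python
def noam_lr(step, d_model, warmup_steps=4000, factor=1.0):
    """Learning-rate schedule from 'Attention Is All You Need':
    lr = factor * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    step = max(step, 1)  # avoid division by zero at step 0
    return factor * d_model ** -0.5 * min(step ** -0.5,
                                          step * warmup_steps ** -1.5)

# At the end of warmup, the big model (d_model=1024) gets a learning
# rate sqrt(2) times smaller than the base model (d_model=512).
base_lr = noam_lr(4000, d_model=512)
big_lr = noam_lr(4000, d_model=1024)
assert abs(base_lr / big_lr - 2 ** 0.5) < 1e-9
```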
-
@ymjiang how does the log look? We might be able to offer some help.
-
@szha Sorry, I am currently engaged with another task. Will get back here once I am free.