This repository has been archived by the owner on Mar 19, 2021. It is now read-only.

why not use tanh in encoder while use it in decoder ? #11

Open
chuanchuan12138 opened this issue Jun 25, 2019 · 2 comments

Comments

@chuanchuan12138

Firstly, thanks for your code; it really helped me understand the paper.
But when I debugged the code, I found that in modules.py seanny uses tanh in the decoder but omits it in the encoder, whereas in the paper, formulas 8 and 12 both use tanh to calculate part of the attention weight.
I don't know why. Can anybody offer some help? Thanks in advance!
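For readers following along, the difference being asked about can be sketched in plain Python. This is only a minimal illustration of an additive attention score computed with and without the tanh, not the code from modules.py; the function name `attention_weights`, the parameter names `W`, `U`, `v`, and all toy values are assumptions made for the example.

```python
import math


def attention_weights(hidden, cell, series, W, U, v, use_tanh=True):
    """Softmax-normalized additive attention over candidate vectors.

    hidden, cell: previous hidden and cell state (lists of floats)
    series: list of candidate vectors to attend over
    W, U: weight matrices (lists of rows); v: projection vector
    All names are hypothetical and chosen for this sketch only.
    """
    state = hidden + cell  # concatenate [h; s]
    scores = []
    for x in series:
        # Pre-activation: W [h; s] + U x, computed row by row
        pre = [
            sum(w * s for w, s in zip(w_row, state))
            + sum(u * xi for u, xi in zip(u_row, x))
            for w_row, u_row in zip(W, U)
        ]
        if use_tanh:
            pre = [math.tanh(p) for p in pre]  # the step in question
        scores.append(sum(vi * p for vi, p in zip(v, pre)))
    # Numerically stable softmax over the scores
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy inputs (made up for illustration)
hidden, cell = [0.5], [-0.2]
series = [[1.0, 2.0], [3.0, -1.0], [0.0, 0.5]]
W = [[1.0, 0.5], [-0.3, 0.8]]
U = [[0.2, 0.1], [0.4, -0.6]]
v = [1.0, -1.0]

with_tanh = attention_weights(hidden, cell, series, W, U, v, use_tanh=True)
without_tanh = attention_weights(hidden, cell, series, W, U, v, use_tanh=False)
print("with tanh:   ", with_tanh)
print("without tanh:", without_tanh)
```

Both variants produce valid attention weights that sum to 1. Without the tanh, the score is a purely linear function of the inputs, so the two variants generally assign different weights to the same candidates.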

@ljtruong

ljtruong commented Aug 19, 2020

Here's my experiment with and without tanh in the encoder. Note that I set my model to eval + no_grad before predicting, and used no_grad during validation. This differs from what this repo does, and I believe it should have been implemented here.

Without tanh in encoder: [image: without tanh]

With tanh in encoder: [image: with tanh]

In addition, during training the validation loss decreases faster with tanh (10 epochs).

[images: with tanh; without tanh]

With tanh in encoder: [image: with tanh]

Note: I've trained, validated and predicted over the whole dataset for testing purposes. My assumption was I should get near 99%+ accuracy if the underlying equations are working properly.

@chuanchuan12138
Author

Hi worulz, thanks for your careful experiment; it really clears up my confusion.
As for your no_grad operation, I think main.py isn't meant to include a validation or prediction step; it just trains the model. In my opinion, the predict function only aims to show the loss for that training epoch, so you may consider it part of the training process.
I don't know whether this is correct, but I think no_grad is meant for the validation or test process, so it's necessary if you want to evaluate the model, but not in this place; perhaps in another function.
Thank you again for your clear pictures for comparison.
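The eval + no_grad setup discussed in this thread could look roughly like the following in PyTorch. This is a sketch under assumptions, not the repo's code: the `evaluate` helper and the small stand-in model are hypothetical, and only `model.eval()`, `model.train()`, and `torch.no_grad()` are actual PyTorch calls.

```python
import torch
from torch import nn


def evaluate(model, inputs, targets, loss_fn):
    """Return the loss on held-out data without tracking gradients.

    Hypothetical helper for illustration; not part of this repo.
    """
    was_training = model.training
    model.eval()  # disable dropout, use running batch-norm statistics
    with torch.no_grad():  # do not build the autograd graph
        preds = model(inputs)
        loss = loss_fn(preds, targets).item()
    model.train(was_training)  # restore the previous mode
    return loss


# Stand-in model and random data (assumptions, not the repo's encoder/decoder)
model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(0.5), nn.Linear(8, 1))
x, y = torch.randn(16, 4), torch.randn(16, 1)
loss_fn = nn.MSELoss()

val_loss = evaluate(model, x, y, loss_fn)
print(f"validation loss: {val_loss:.4f}")
```

Because dropout is disabled in eval mode, calling `evaluate` twice on the same data returns the same loss, and no gradients are accumulated on the parameters.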
