
cost:nan #6

Open
zhixiaochuan12 opened this issue Mar 30, 2018 · 4 comments
@zhixiaochuan12

I used my own data to train, and a nan cost occurred. I checked the data, clipped the gradients, and reduced the learning rate, but the nan still occurred at the same 'batch_size*batch' location. Is there anything else I should check or change to make it run normally? Thanks for any suggestions.

The nan error looks like this:
[batch 1044] cost: 2.06923
[batch 1045] cost: 1.79236
[batch 1046] cost: 1.9501
[batch 1047] cost: 1.86483
[batch 1048] cost: nan
[batch 1049] cost: nan

@antct

antct commented Nov 16, 2018

I also encountered this problem when using ABCNN-1 and ABCNN-3.
But my problem seems to be different from the one in this issue: the nan occurred at the very beginning of training.
I finally found that the problem may be caused by tf.sqrt() when calculating the attention matrix. Specifically, its input can contain values such as -3.0*10^-6 that should be exactly zero.
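For context, below is a minimal sketch of where this can happen. It follows the match score from the ABCNN paper, 1 / (1 + euclidean_distance); the function name and tensor shapes here are assumptions, and the exact code in this repo may differ.

import tensorflow as tf  # written against the TF 1.x API of that era

# Hypothetical sketch of the ABCNN attention matrix:
# A[i, j] = 1 / (1 + euclidean(x1[:, :, i], x2[:, :, j])).
# x1, x2: feature maps of assumed shape [batch, d, s, 1].
def make_attention_mat(x1, x2):
    # Broadcasting x1 against the transposed x2 gives [batch, d, s, s];
    # summing the squared differences over axis 1 gives [batch, s, s].
    # The sum should be >= 0, but float error can leave tiny negatives
    # (e.g. -3e-6), and tf.sqrt of a negative is nan. Even at exactly 0,
    # the gradient of sqrt is 1/(2*sqrt(x)) -> inf, so the backward pass
    # can also produce nan.
    sq = tf.reduce_sum(tf.square(x1 - tf.matrix_transpose(x2)), axis=1)
    return 1.0 / (1.0 + tf.sqrt(sq))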

@JesseNLP

JesseNLP commented Jul 2, 2019

I also used my own dataset. After replacing the Euclidean distance with the L1 distance, no more nan values were produced.
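A sketch of that swap, under the same assumed shapes as in the earlier sketch; tf.abs involves no square root, so both the forward value and the gradient stay finite:

import tensorflow as tf

# L1 (Manhattan) variant of the assumed attention matrix above.
# tf.abs is finite everywhere and has a usable (sub)gradient at 0,
# so no nan can come from this op.
def make_attention_mat_l1(x1, x2):
    # x1, x2: [batch, d, s, 1] -> distance matrix of shape [batch, s, s].
    manhattan = tf.reduce_sum(tf.abs(x1 - tf.matrix_transpose(x2)), axis=1)
    return 1.0 / (1.0 + manhattan)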

@weibobo2015

Same here. The nan occurred at the beginning of my training when using ABCNN-1 and ABCNN-3.
Using the Manhattan distance solves this problem.
However, in that case, how can we still use the Euclidean distance?

@weibobo2015

Adding 1e-6 inside tf.sqrt() makes it run well.
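A sketch of that fix applied to the assumed Euclidean version from above: a small epsilon keeps the argument of tf.sqrt strictly positive, which avoids both sqrt of a tiny negative and the infinite gradient at zero, while keeping the Euclidean distance:

import tensorflow as tf

# Euclidean attention matrix with the 1e-6 fix from this comment.
def make_attention_mat_eps(x1, x2, eps=1e-6):
    sq = tf.reduce_sum(tf.square(x1 - tf.matrix_transpose(x2)), axis=1)
    # eps keeps the argument strictly positive: no sqrt of a tiny
    # negative, and the gradient 1/(2*sqrt(sq + eps)) stays finite.
    return 1.0 / (1.0 + tf.sqrt(sq + eps))

Note that clamping with tf.maximum(sq, 0.0) before the sqrt would still leave an infinite gradient at zero, so adding eps inside the sqrt is the safer choice here.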
