
cost:nan #6

Open
zhixiaochuan12 opened this issue Mar 30, 2018 · 4 comments
@zhixiaochuan12

I used my own data to train, and a nan cost occurred. I checked the data, clipped the gradients, and reduced the learning rate, but the nan still occurred at the same 'batch_size*batch' location. Is there anything else I should check or change to make it run normally? Thanks for any suggestions.

The nan error looks like this:
[batch 1044] cost: 2.06923
[batch 1045] cost: 1.79236
[batch 1046] cost: 1.9501
[batch 1047] cost: 1.86483
[batch 1048] cost: nan
[batch 1049] cost: nan

@antct

antct commented Nov 16, 2018

I also encountered this problem when using ABCNN-1 and ABCNN-3.
But my problem seems to be different from the one in this issue: the nan occurred at the very beginning of training.
I finally found that the problem may be caused by tf.sqrt() when calculating the attention matrix. Specifically, its input can contain values such as -3.0*10^-6 that should be exactly zero.
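For context, below is a minimal sketch of where this can happen. It follows the match score from the ABCNN paper, 1 / (1 + euclidean_distance); the function name and tensor shapes here are assumptions, and the exact code in this repo may differ.

import tensorflow as tf  # written against the TF 1.x API of that era

# Hypothetical sketch of the ABCNN attention matrix:
# A[i, j] = 1 / (1 + euclidean(x1[:, :, i], x2[:, :, j])).
# x1, x2: feature maps of assumed shape [batch, d, s, 1].
def make_attention_mat(x1, x2):
    # Broadcasting x1 against the transposed x2 gives [batch, d, s, s];
    # summing the squared differences over axis 1 gives [batch, s, s].
    # The sum should be >= 0, but float error can leave tiny negatives
    # (e.g. -3e-6), and tf.sqrt of a negative is nan. Even at exactly 0,
    # the gradient of sqrt is 1/(2*sqrt(x)) -> inf, so the backward pass
    # can also produce nan.
    sq = tf.reduce_sum(tf.square(x1 - tf.matrix_transpose(x2)), axis=1)
    return 1.0 / (1.0 + tf.sqrt(sq))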

@JesseNLP

JesseNLP commented Jul 2, 2019

I also used my own dataset. After replacing the Euclidean distance with the L1 distance, no more nan values were produced.
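A sketch of that swap, under the same assumed shapes as in the earlier sketch; tf.abs involves no square root, so both the forward value and the gradient stay finite:

import tensorflow as tf

# L1 (Manhattan) variant of the assumed attention matrix above.
# tf.abs is finite everywhere and has a usable (sub)gradient at 0,
# so no nan can come from this op.
def make_attention_mat_l1(x1, x2):
    # x1, x2: [batch, d, s, 1] -> distance matrix of shape [batch, s, s].
    manhattan = tf.reduce_sum(tf.abs(x1 - tf.matrix_transpose(x2)), axis=1)
    return 1.0 / (1.0 + manhattan)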

@weibobo2015

Same here. The nan occurred at the beginning of my training when using ABCNN-1 and ABCNN-3.
Using the Manhattan distance solves this problem.
However, in that case, how can we still use the Euclidean distance?

@weibobo2015

Adding 1e-6 inside tf.sqrt() makes it run well.
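A sketch of that fix applied to the assumed Euclidean version from above: a small epsilon keeps the argument of tf.sqrt strictly positive, which avoids both sqrt of a tiny negative and the infinite gradient at zero, while keeping the Euclidean distance:

import tensorflow as tf

# Euclidean attention matrix with the 1e-6 fix from this comment.
def make_attention_mat_eps(x1, x2, eps=1e-6):
    sq = tf.reduce_sum(tf.square(x1 - tf.matrix_transpose(x2)), axis=1)
    # eps keeps the argument strictly positive: no sqrt of a tiny
    # negative, and the gradient 1/(2*sqrt(sq + eps)) stays finite.
    return 1.0 / (1.0 + tf.sqrt(sq + eps))

Note that clamping with tf.maximum(sq, 0.0) before the sqrt would still leave an infinite gradient at zero, so adding eps inside the sqrt is the safer choice here.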
