FloatingPointError: Loss is infinite or NaN #73

LvXiangzhu · 2024-02-26T08:08:23Z

Thank you for your nice work!
However, I ran into a problem when trying to run it. I have not been able to solve this error for a long time, so I am asking for help.

When execution enters the "TextEncoder" of "CustomCLIP" for the second time, it raises this error: "FloatingPointError: Loss is infinite or NaN!"

I debugged this error and found that the problem is in the TextEncoder's transformer network: before the input enters the first LayerNorm, it contains no NaN, but the output of that LayerNorm contains NaN.

I have searched for solutions to this error.
Some people say it may be because float16 is not precise enough, causing overflow, and that the input needs to be converted to float32. But your code already looks like this:

orig_type = x.dtype
ret = super().forward(x.type(torch.float32))
return ret.type(orig_type)

So the input has already been converted to float32 before LayerNorm.

In addition, the input values are not large. They are all on the order of 1e-2.
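One thing that could still be checked here is whether the LayerNorm's own parameters are finite. This is only a hypothesis and is not confirmed anywhere in this thread: since the error appears on the second pass, an earlier optimizer step might have written NaN into the weights, in which case a finite float32 input would still give a NaN output. Below is a minimal sketch using a toy `nn.LayerNorm` as a stand-in for the real layer; the helper name `find_nonfinite_params` is made up for illustration and is not part of the repository.

```python
from typing import List

import torch
import torch.nn as nn


def find_nonfinite_params(model: nn.Module) -> List[str]:
    """Return the names of parameters that contain NaN or Inf (hypothetical helper)."""
    return [
        name
        for name, p in model.named_parameters()
        if not torch.isfinite(p).all()
    ]


# Toy stand-in for the first LayerNorm in the text transformer (an assumption,
# not the actual CustomCLIP module).
ln = nn.LayerNorm(4)

# Small, finite float32 input on the order of 1e-2, as described above.
x = torch.full((2, 4), 1e-2) + torch.arange(4) * 1e-3

clean_params = find_nonfinite_params(ln)        # parameters are finite at init
clean_out_has_nan = torch.isnan(ln(x)).any().item()

# Simulate a bad update that wrote NaN into one weight entry.
with torch.no_grad():
    ln.weight[0] = float("nan")

bad = find_nonfinite_params(ln)                 # now reports the weight
out_has_nan = torch.isnan(ln(x)).any().item()   # finite input, NaN output

print(clean_params, clean_out_has_nan)
print(bad, out_has_nan)
```

Running the same helper on the real model right before the second forward pass would show whether the parameters, rather than the input, are the source of the NaN. If they are, `torch.autograd.set_detect_anomaly(True)` can help locate the backward operation that first produced a non-finite gradient.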

So does anyone know why this error occurs?


Lilzhuzixi · 2024-05-22T13:53:45Z

Hello, friend! Did you manage to resolve this error? Could you give me some advice?
