I was trying to fine-tune the model on a French corpus when I realized the loss kept turning into NaN, which ruined the model's parameters.
After some investigation I found a culprit: the code that normalises the audio features (specifically `allosaurus.pm.utils.feature_cmvn()`) has several problems that can turn the features into NaN, which then propagates through the rest of training.
First, on line 10, the computation of `spk_std` is numerically unstable and can produce a negative variance (for instance, -0.156 where `numpy.var()` finds 4.50e-07); taking its square root then returns NaN.
This can be fixed by replacing this line with `spk_std = np.std(feature, axis=0)` (line 9 can then be removed).
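The instability can be reproduced in isolation. The sketch below assumes that line 10 uses the one-pass formula (mean of squares minus squared mean), which is what a negative variance suggests. The original line is not quoted in this issue, so `one_pass_var` is a hypothetical reconstruction and not the library's code, and the synthetic `feature` array is made up for illustration.

```python
import numpy as np

def one_pass_var(feature):
    """Hypothetical reconstruction: variance as E[x^2] - E[x]^2."""
    frame_cnt = feature.shape[0]
    mean = np.sum(feature, axis=0) / frame_cnt
    mean_sq = np.sum(feature * feature, axis=0) / frame_cnt
    # Two large, nearly equal numbers are subtracted here, so rounding
    # errors can dominate the result and even make it negative.
    return mean_sq - mean * mean

def two_pass_var(feature):
    """np.var subtracts the mean before squaring, so it is never negative."""
    return np.var(feature, axis=0)

# Synthetic float32 features: large offset, very small spread.
rng = np.random.default_rng(0)
feature = (1000.0 + 1e-3 * rng.standard_normal((500, 4))).astype(np.float32)

print("one-pass:", one_pass_var(feature))
print("two-pass:", two_pass_var(feature))
```

The one-pass values may be far from the true variance or negative, depending on the data, while the two-pass values are always safe to pass to a square root.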
Second, on line 12, there is a division by the standard deviation, but nothing guarantees that it is non-zero. As a result, features whose variance is zero are turned into NaN.
This can be fixed by adding the line `spk_std += (spk_std == 0.)` before the division, which replaces the zeros with ones.
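Putting both suggestions together, the normalisation could look roughly like this. It is a minimal sketch, not the actual patch: the function name `feature_cmvn_safe` is hypothetical, and computing the mean with `np.mean` is an assumption about what the rest of the function does.

```python
import numpy as np

def feature_cmvn_safe(feature):
    """Sketch of mean/variance normalisation with both fixes applied.

    `feature` is assumed to be a (frames, dims) array.
    """
    spk_mean = np.mean(feature, axis=0)
    spk_std = np.std(feature, axis=0)   # stable, never NaN for finite input
    spk_std += (spk_std == 0.)          # constant dimensions: divide by 1, not 0
    return (feature - spk_mean) / spk_std

# The second dimension is constant, so its standard deviation is 0.
feature = np.array([[1.0, 5.0],
                    [3.0, 5.0]])
normalized = feature_cmvn_safe(feature)
print(normalized)
```

With this guard a constant dimension is mapped to zeros instead of NaN, which keeps the downstream loss finite.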
Here is a file for which these problems occur, taken from the Mozilla CommonVoice dataset. FNH4QW-sample-0.wav.zip
Hi, thanks for the detailed comments and suggestions!
I did not know that computing the std could be numerically unstable, thanks for debugging this.
I will prepare a fix to update those.