Hi, thanks for the great work! However, I found that the model is highly sensitive to subtle changes in the frequency content of the input audio, which greatly affects its performance. For example, when I use funasr to load the model and process 22050 Hz audio, funasr resamples it with torchaudio by default. The predictions obtained this way differ significantly from those obtained when I first resample the audio with SoX and then run the model. Comparing the spectrograms of the two resampled signals, I found that the high frequencies present in the torchaudio output are missing from the SoX output. After removing those high-frequency components with Adobe Audition, the predictions were correct. However, when I removed all content above 4 kHz, the predictions became inaccurate again.
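To put a number on the spectral difference described above instead of comparing spectrograms by eye, something like the following could be used. It is a minimal NumPy sketch and assumes both resampled waveforms are available as mono float arrays at the same sample rate. The 16 kHz rate in the usage example, the 4 kHz cutoff, and the function names `high_band_energy_ratio` and `brickwall_lowpass` are illustrative assumptions. They are not part of funasr, torchaudio, or SoX. The brick-wall FFT low-pass only approximates the kind of high-frequency removal done in Audition and is not a replacement for a proper filter design.

```python
import numpy as np


def high_band_energy_ratio(signal: np.ndarray, sample_rate: int, cutoff_hz: float) -> float:
    """Fraction of spectral energy strictly above cutoff_hz (one-sided FFT)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    power = np.abs(spectrum) ** 2
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[freqs > cutoff_hz].sum() / total)


def brickwall_lowpass(signal: np.ndarray, sample_rate: int, cutoff_hz: float) -> np.ndarray:
    """Zero every FFT bin above cutoff_hz and transform back (hypothetical helper)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0
    return np.fft.irfft(spectrum, n=len(signal))


if __name__ == "__main__":
    # Synthetic 1-second test signal at an assumed 16 kHz rate:
    # a 1 kHz tone (kept) plus a 7 kHz tone (above the 4 kHz cutoff).
    sr = 16000
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 7000 * t)
    print(high_band_energy_ratio(x, sr, 4000))  # about half the energy is above 4 kHz
    y = brickwall_lowpass(x, sr, 4000)
    print(high_band_energy_ratio(y, sr, 4000))  # essentially zero after filtering
```

With real data, calling `high_band_energy_ratio` on the torchaudio-resampled and SoX-resampled versions of the same clip would show how much energy each keeps near the top of the band.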
It's possible that the model was not trained with much frequency-domain data augmentation, leaving it overly sensitive to irrelevant spectral details, which greatly limits its practical use. Is there a newer version that addresses this issue?