Model is very sensitive to tiny changes in the spectrum #61

mayqinxu · 2024-12-19T09:00:44Z

Hi, thanks for the great work! I found that the model is highly sensitive to subtle changes in the frequency content of the input, and this greatly affects its performance. For example, when funasr loads the model and processes 22050 Hz audio, it resamples with torchaudio by default. However, the predictions obtained this way differ significantly from those obtained when the audio is resampled with Sox before being passed to the model. I compared the spectrograms from the two resampling methods and found that the high frequencies present in the torchaudio-resampled audio were missing from the Sox-resampled audio. After I used Audition to remove the high-frequency parts, the predictions were correct. Additionally, I tried deleting all content above 4 kHz and found that the predictions became inaccurate again.
It's possible that the model did not see much frequency-domain data augmentation during training, which would make it over-sensitive to irrelevant details and greatly limit its practical use. Is there a new version that addresses this issue?
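
For anyone who wants to put a number on the spectrogram difference described above, here is a minimal sketch that measures how much of a signal's energy lies at or above a cutoff frequency. It is not part of funasr, torchaudio, or Sox. The function names `high_band_energy_ratio` and `tone`, the 6 kHz cutoff, and the synthetic test tones are all assumptions made for illustration. It uses a naive DFT so that it runs with only the standard library, and for real recordings an FFT routine would be the practical choice.

```python
import cmath
import math


def high_band_energy_ratio(samples, sample_rate, cutoff_hz):
    """Return the fraction of spectral energy at or above cutoff_hz.

    Hypothetical helper: uses a naive O(N^2) DFT, so keep the input short.
    """
    n = len(samples)
    total = 0.0
    high = 0.0
    # A real-valued signal is fully described by its non-negative frequency bins.
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)
        )
        power = abs(coeff) ** 2
        total += power
        if freq >= cutoff_hz:
            high += power
    return high / total if total > 0 else 0.0


def tone(freq_hz, sample_rate, n):
    """Generate n samples of a unit-amplitude sine wave (illustrative input)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    sr, n = 16000, 320  # 50 Hz bin spacing, so both tones fall on exact bins
    low_only = tone(1000, sr, n)
    with_highs = [a + 0.5 * b for a, b in zip(low_only, tone(7000, sr, n))]
    print("low only:  ", high_band_energy_ratio(low_only, sr, 6000))
    print("with highs:", high_band_energy_ratio(with_highs, sr, 6000))
```

Running the same measurement on a short excerpt of the torchaudio-resampled and Sox-resampled files should show whether the two differ mainly in the band near the Nyquist frequency, assuming the samples are first loaded as a list of floats.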

mayqinxu · 2024-12-19T09:17:26Z

Here's an example of resampling audio to 16 kHz using torchaudio and Sox:

and the prediction results are:

torchaudio:
rtf_avg: 0.073: 100%|█████████████████████████████████| 1/1 [00:00<00:00, 3.72it/s]
[{'key': 'happy_16k', 'labels': ['生气/angry', '厌恶/disgusted', '恐惧/fearful', '开心/happy', '中立/neutral', '其他/other', '难过/sad', '吃惊/surprised', ''], 'scores': [0.00010870184632949531, 5.81611811867333e-06, 3.5695826227311045e-05, 0.29204338788986206, 2.16168409679085e-05, 1.1871205407576468e-11, 4.007801544503309e-05, 0.7077447175979614, 2.7539997192460586e-12]}]

Sox:
rtf_avg: 0.078: 100%|█████████████████████████████████| 1/1 [00:00<00:00, 3.51it/s]
[{'key': 'asta-happy_ref', 'labels': ['生气/angry', '厌恶/disgusted', '恐惧/fearful', '开心/happy', '中立/neutral', '其他/other', '难过/sad', '吃惊/surprised', ''], 'scores': [0.00015378545504063368, 5.935596163908485e-06, 3.130449113086797e-05, 0.7070081233978271, 1.4909569472365547e-05, 3.165902956459021e-11, 2.4581770048826e-05, 0.2927614450454712, 2.931226442126622e-12]}]
