Model is very sensitive to the tiny change of Spectrum #61

Open
mayqinxu opened this issue Dec 19, 2024 · 1 comment

Comments


mayqinxu commented Dec 19, 2024

Hi, thanks for the great work! However, I found that the model is highly sensitive to subtle changes in the frequency content of its input, and these changes strongly affect its predictions. For example, when funasr loads the model and processes 22050 Hz audio, it resamples the audio with torchaudio by default. The predictions obtained this way differ significantly from those obtained when the audio is first resampled with Sox and then passed to the model. When I compared the spectrograms of the two resampled versions, I found that the Sox output lacks high-frequency content that is present in the torchaudio output. After I removed those high-frequency components in Adobe Audition, the predictions were correct. However, when I removed all content above 4 kHz, the predictions became inaccurate again.
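The spectrogram comparison and the filtering tests above can also be done numerically, which makes them easier to repeat across files. This is a minimal sketch, assuming both resampled clips are already loaded as mono NumPy float arrays at the same sample rate (loading is left out). The function names and the 6 kHz measurement cutoff are hypothetical choices for illustration. The brick-wall FFT filter is a crude stand-in for the Audition edit, not what Audition itself does.

```python
import numpy as np


def high_band_energy_ratio(x: np.ndarray, sr: int, cutoff_hz: float) -> float:
    """Fraction of the signal's spectral energy at or above cutoff_hz."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)  # bin centre frequencies in Hz
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[freqs >= cutoff_hz].sum() / total)


def lowpass_fft(x: np.ndarray, sr: int, cutoff_hz: float) -> np.ndarray:
    """Brick-wall low-pass: zero every FFT bin above cutoff_hz.

    Crude (a hard cut can ring in the time domain), but adequate for an
    ablation such as "remove everything above 4 kHz".
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr  # one second of samples
    # Synthetic stand-in for a clip: a 1 kHz tone plus weak 7.5 kHz content.
    x = np.sin(2 * np.pi * 1000 * t) + 0.1 * np.sin(2 * np.pi * 7500 * t)
    print(f"{high_band_energy_ratio(x, sr, 6000):.4f}")  # 0.0099 (= 0.01 / 1.01)
    y = lowpass_fft(x, sr, 4000)  # the "remove everything above 4 kHz" test
    print(f"{high_band_energy_ratio(y, sr, 6000):.4f}")  # 0.0000
```

Applied to the two real files, a clearly larger ratio for the torchaudio output than for the Sox output would put a number on the spectrogram difference described above.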
Possibly the model did not undergo much frequency-domain data augmentation during training. That would leave it over-sensitive to irrelevant spectral details, which greatly limits its practical use. Is there a newer version that addresses this issue?
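As an illustration of the kind of augmentation meant here, the following is a hypothetical sketch, not the project's training recipe. With probability `p`, it band-limits each training waveform at a random cutoff, so that resampler-dependent differences in the upper band stop being a useful cue. The function name, the default 6 kHz lower bound, and `p = 0.5` are all assumptions. Because removing everything above 4 kHz changed the predictions in the test above, the lower bound probably should not go that low.

```python
from typing import Optional

import numpy as np


def random_lowpass(
    x: np.ndarray,
    sr: int,
    rng: np.random.Generator,
    min_cutoff_hz: float = 6000.0,
    max_cutoff_hz: Optional[float] = None,
    p: float = 0.5,
) -> np.ndarray:
    """With probability p, zero all spectral content above a random cutoff.

    The cutoff is drawn uniformly from [min_cutoff_hz, max_cutoff_hz]
    (upper bound defaults to Nyquist, sr / 2), so the same utterance is
    seen with varying amounts of high-band content during training.
    """
    if rng.random() >= p:
        return x  # leave this example untouched
    upper = sr / 2 if max_cutoff_hz is None else max_cutoff_hz
    cutoff = rng.uniform(min_cutoff_hz, upper)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    spectrum[freqs > cutoff] = 0.0  # crude brick-wall low-pass
    return np.fft.irfft(spectrum, n=len(x))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    t = np.arange(sr) / sr
    clip = np.sin(2 * np.pi * 1000 * t) + 0.1 * np.sin(2 * np.pi * 7500 * t)
    augmented = random_lowpass(clip, sr, rng, p=1.0)
```

A brick-wall cut is the simplest choice. Drawing the filter shape as well as the cutoff at random would imitate differences between real resamplers more closely, at the cost of more code.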


mayqinxu commented Dec 19, 2024

Here's an example of resampling audio to 16 kHz with torchaudio and with Sox:

[Spectrogram images: happy_torchaudio, happy_sox]

The prediction results are:

torchaudio:
rtf_avg: 0.073: 100%|█████████████████████████████████| 1/1 [00:00<00:00, 3.72it/s]
[{'key': 'happy_16k', 'labels': ['生气/angry', '厌恶/disgusted', '恐惧/fearful', '开心/happy', '中立/neutral', '其他/other', '难过/sad', '吃惊/surprised', ''], 'scores': [0.00010870184632949531, 5.81611811867333e-06, 3.5695826227311045e-05, 0.29204338788986206, 2.16168409679085e-05, 1.1871205407576468e-11, 4.007801544503309e-05, 0.7077447175979614, 2.7539997192460586e-12]}]

Sox:
rtf_avg: 0.078: 100%|█████████████████████████████████| 1/1 [00:00<00:00, 3.51it/s]
[{'key': 'asta-happy_ref', 'labels': ['生气/angry', '厌恶/disgusted', '恐惧/fearful', '开心/happy', '中立/neutral', '其他/other', '难过/sad', '吃惊/surprised', ''], 'scores': [0.00015378545504063368, 5.935596163908485e-06, 3.130449113086797e-05, 0.7070081233978271, 1.4909569472365547e-05, 3.165902956459021e-11, 2.4581770048826e-05, 0.2927614450454712, 2.931226442126622e-12]}]
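The two outputs differ mainly in how the probability is split between 开心/happy and 吃惊/surprised, and those two scores are almost exactly swapped. The stdlib-only snippet below makes this explicit by selecting the top label from each result. It assumes only the list-of-dicts shape printed above (`key`, `labels`, `scores`), with the scores copied verbatim. `top_label` is a hypothetical helper, not part of funasr.

```python
# Label order and scores copied from the two funasr outputs above.
LABELS = ['生气/angry', '厌恶/disgusted', '恐惧/fearful', '开心/happy',
          '中立/neutral', '其他/other', '难过/sad', '吃惊/surprised', '']

RESULTS = {
    "torchaudio": [{'key': 'happy_16k', 'labels': LABELS, 'scores': [
        0.00010870184632949531, 5.81611811867333e-06, 3.5695826227311045e-05,
        0.29204338788986206, 2.16168409679085e-05, 1.1871205407576468e-11,
        4.007801544503309e-05, 0.7077447175979614, 2.7539997192460586e-12]}],
    "sox": [{'key': 'asta-happy_ref', 'labels': LABELS, 'scores': [
        0.00015378545504063368, 5.935596163908485e-06, 3.130449113086797e-05,
        0.7070081233978271, 1.4909569472365547e-05, 3.165902956459021e-11,
        2.4581770048826e-05, 0.2927614450454712, 2.931226442126622e-12]}],
}


def top_label(result):
    """Return (label, score) for the highest-scoring class of the first entry."""
    entry = result[0]
    scores = entry["scores"]
    best = max(range(len(scores)), key=scores.__getitem__)
    return entry["labels"][best], scores[best]


for name, result in RESULTS.items():
    label, score = top_label(result)
    print(f"{name}: {label} ({score:.3f})")
# torchaudio: 吃惊/surprised (0.708)
# sox: 开心/happy (0.707)
```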
