Bad performance when using for speech enhancement #33
Comments
This seems to be caused by the choice of loss function, i.e., SI-SDR. SI-SDR does not constrain the magnitude of the output waveform, which may cause the chopping effect. I think you can replace the SI-SDR loss with another option such as SNR or a waveform-domain L1 loss.
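To illustrate the point about magnitude, here is a minimal sketch (not code from this repository) comparing SI-SDR, SNR, and a waveform L1 error on a toy signal. The function names and the test signals are made up for illustration, mean removal is omitted for brevity, and plain Python lists are used so the snippet runs without dependencies. An actual training loss would be written on PyTorch tensors and negated.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def si_sdr_db(est, ref, eps=1e-8):
    # Project the estimate onto the reference first, so any global gain
    # applied to `est` cancels out of the ratio.
    alpha = _dot(est, ref) / (_dot(ref, ref) + eps)
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    return 10 * math.log10((_dot(target, target) + eps) / (_dot(residual, residual) + eps))

def snr_db(est, ref, eps=1e-8):
    # No projection: the error is measured against the reference itself,
    # so a wrong output level is penalized.
    err = [e - r for e, r in zip(est, ref)]
    return 10 * math.log10((_dot(ref, ref) + eps) / (_dot(err, err) + eps))

def l1_error(est, ref):
    # Mean absolute error on the waveform; also sensitive to level.
    return sum(abs(e - r) for e, r in zip(est, ref)) / len(ref)

# Hypothetical toy signals: a clean reference, a slightly noisy estimate,
# and the same estimate amplified by a factor of 5.
ref = [math.sin(0.05 * n) for n in range(800)]
est = [r + 0.1 * math.cos(0.31 * n) for n, r in enumerate(ref)]
loud = [5.0 * e for e in est]

print("SI-SDR:", si_sdr_db(est, ref), si_sdr_db(loud, ref))  # nearly identical
print("SNR:   ", snr_db(est, ref), snr_db(loud, ref))        # drops for `loud`
print("L1:    ", l1_error(est, ref), l1_error(loud, ref))    # grows for `loud`
```

Because SI-SDR gives the amplified estimate the same score, a model trained with it has no incentive to produce a correctly scaled output, whereas SNR and L1 both penalize the level mismatch.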
@Andong-Li-speech Hi, thanks for your suggestions! The result is still not very good after changing the loss function to an SNR loss, but it works much better than before. Are you also working on this task? If so, what kind of loss function are you using? Thanks a lot in advance!
@jkzhang7 Hi, did you manage to get better performance? I am facing the same problem now. Best wishes!
@LittleFlyingSheep Hi, have you solved this problem? I seem to have run into the same issue: the magnitude of the separated waveform is too large and it does not sound good. I would appreciate any advice. Thanks a lot!
@forestlee95 One workaround I use is to rescale the waveform manually: I take the max value of the noisy input and use it to rescale the output. This gives relatively good performance. It is only a stopgap, though. If you have any other methods, please let me know.
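One possible reading of this workaround, as a minimal sketch: scale the enhanced waveform so that its peak amplitude matches the peak of the noisy input. The helper name `match_peak` and the sample values are hypothetical, and the comment above does not spell out the exact formula, so treat this as an assumption rather than the commenter's actual code.

```python
def match_peak(enhanced, noisy, eps=1e-8):
    """Rescale `enhanced` so its peak absolute value matches that of `noisy`."""
    peak_in = max(abs(x) for x in noisy)
    peak_out = max(abs(x) for x in enhanced)
    gain = peak_in / (peak_out + eps)  # eps guards against an all-zero output
    return [gain * x for x in enhanced]

# Hypothetical values: the model output is far louder than the input.
noisy = [0.1, -0.5, 0.3]
enhanced = [2.0, -8.0, 4.0]
rescaled = match_peak(enhanced, noisy)
print(max(abs(x) for x in rescaled))  # close to 0.5, the peak of the noisy input
```

This only fixes the overall level after inference. It does not change what the model learns, so it complements a change of loss function rather than replacing it.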
@LittleFlyingSheep @jkzhang7 Hi, I am looking for speech enhancement results for Conv-TasNet on the VCTK dataset. Do you have any performance figures for it? Much appreciated.
Received.
Received.
Hi, very nice work. I noticed that some people are using Conv-TasNet for speech enhancement and getting good results, but I ran into some problems when using this code for speech enhancement... I am trying to separate clean speech and noise from noisy speech, using the VCTK dataset. The waveforms of the results look very weird...
When I changed the mask activation to sigmoid, the result was still not good.
I wonder if anyone has an idea how to solve this problem. Thanks in advance!