Very poor performance on my own wav file, is there anything wrong? #40

Yunlong-He · 2020-09-18T15:17:05Z

I ran a simple test with a wav file of a phone call, about 2.5 minutes long with only 2 speakers in total. With the pretrained model in this project, it returns 3 speakers, and many slices contain voices from both speakers. I know that uis-rnn doesn't support setting the number of speakers, but performance this poor seems wrong. Has anybody else run into this?

Thanks for any suggestions.

ShuningZhao · 2020-09-20T05:28:52Z

Is your wav file in English or Chinese? I had a look at the wav file in this repo, and it seems the models were trained on Mandarin, which would explain why the results were bad on my English wav files.

Yunlong-He · 2020-09-20T10:25:37Z

Thanks, Shuning. I verified on Mandarin wav files too, and I also checked the sample result provided in this project; it doesn't seem very good either. I use the following code to split the wav file by speaker:

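import os
from pydub import AudioSegment

sound = AudioSegment.from_file(wav_path)
# speakerSlice maps each speaker id to a list of {'start', 'stop'} dicts
for spk, timeDicts in speakerSlice.items():
    print('========= ' + str(spk) + ' =========')
    out_dir = os.path.join("wavs/rmd", str(spk))
    os.makedirs(out_dir, exist_ok=True)  # export fails if the directory is missing
    for index, timeDict in enumerate(timeDicts):
        # pydub slices by milliseconds, so start/stop must be in ms
        seg = sound[timeDict['start']:timeDict['stop']]
        seg.export(os.path.join(out_dir, "seg_%d.wav" % index), format="wav")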
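
Before listening to every exported segment, it may help to see how the speech time is distributed across the detected speakers. The following is a minimal sketch, not part of the project: `summarize_speaker_slices` and the `example` data are hypothetical, and it assumes `speakerSlice` has the shape implied by the loop above (speaker id mapped to a list of dicts with `start` and `stop` in milliseconds).

```python
def summarize_speaker_slices(speaker_slice):
    """Return {speaker: (segment_count, total_ms)} for a speakerSlice-style dict.

    Assumes each entry is a dict with 'start' and 'stop' in milliseconds.
    """
    summary = {}
    for spk, time_dicts in speaker_slice.items():
        total_ms = sum(t['stop'] - t['start'] for t in time_dicts)
        summary[spk] = (len(time_dicts), total_ms)
    return summary


# Hypothetical data for illustration only, not real diarization output.
example = {
    0: [{'start': 0, 'stop': 4000}, {'start': 9000, 'stop': 12000}],
    1: [{'start': 4000, 'stop': 9000}],
    2: [{'start': 12000, 'stop': 12300}],
}

for spk, (count, total_ms) in summarize_speaker_slices(example).items():
    print('speaker %s: %d segments, %.1f s' % (spk, count, total_ms / 1000.0))
```

If one detected speaker accounts for only a small fraction of the total time, that label may be a spurious extra cluster rather than a real third voice, though listening to its segments is the only way to confirm.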
