Can the language be detected separately for each VAD segment? #1211

eager7 · 2024-12-20T09:45:32Z

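When VAD is enabled, the audio is split into separate segments, but language detection is still based on the earlier segments. If the language switches often, the output ends up in only one language.
Could the language be re-detected on each segment, the tokens re-initialized, and the output generated after that?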
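The approach proposed above (re-detect the language on every VAD segment, then decode that segment with its own language) can be sketched as follows. This is a minimal, library-agnostic sketch under stated assumptions: `Segment`, `detect_per_segment` and the `detect` callable are hypothetical names, and falling back to the previous segment's language when the detector is unsure is an added assumption, not something faster-whisper is known to do.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical detector: takes a segment's audio samples, returns (language_code, probability).
Detector = Callable[[Sequence[float]], Tuple[str, float]]


@dataclass
class Segment:
    start: float              # segment start in seconds
    end: float                # segment end in seconds
    samples: Sequence[float]  # mono audio samples for this VAD segment


def detect_per_segment(
    segments: List[Segment],
    detect: Detector,
    fallback: str = "en",
    threshold: float = 0.5,
) -> List[Tuple[float, float, str]]:
    """Run language detection independently on every VAD segment."""
    results: List[Tuple[float, float, str]] = []
    previous = fallback
    for seg in segments:
        lang, prob = detect(seg.samples)
        # Assumption: a low-confidence guess (e.g. a very short or noisy segment)
        # keeps the previous segment's language instead of switching.
        if prob < threshold:
            lang = previous
        results.append((seg.start, seg.end, lang))
        previous = lang
    return results
```

Each `(start, end, language)` triple would then be decoded on its own with the language token fixed to that code. In faster-whisper this might mean wrapping `WhisperModel.detect_language` as `detect` (recent versions expose it; check the signature in your installed version) and calling `transcribe(..., language=lang)` per segment.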

Purfview · 2024-12-20T22:23:18Z

Possible, but not on every segment or every VAD segment. With the `multilingual=True` option the language is autodetected on every chunk, but nothing guarantees that a chunk contains only one language, so this option is based on wishful thinking.

If you want to try autodetection on every VAD segment, I implemented that idea in Faster-Whisper-XXL: it runs batched on unmerged VAD segments. Activate it with these options:
`--batched --unmerged --multilingual true`, and maybe add `--vad_method pyannote_onnx_v3 --verbose true` too.
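Why merging matters for per-chunk detection can be illustrated with a toy example. The sketch below assumes a simplified greedy rule that packs consecutive VAD spans into chunks of at most 30 s; it is not faster-whisper's or Faster-Whisper-XXL's actual merging code, and `merge_spans`, `languages_per_chunk` and the language labels are hypothetical, used only to show how one chunk can end up spanning two languages.

```python
from typing import Dict, List, Tuple

Span = Tuple[float, float]  # (start, end) of one VAD speech segment, in seconds


def merge_spans(spans: List[Span], max_len: float = 30.0) -> List[List[Span]]:
    """Greedily pack consecutive VAD spans into chunks covering at most max_len seconds.

    Simplified rule for illustration only. A single span longer than
    max_len still becomes its own chunk.
    """
    chunks: List[List[Span]] = []
    current: List[Span] = []
    for start, end in spans:
        # Close the current chunk if adding this span would stretch it past max_len.
        if current and end - current[0][0] > max_len:
            chunks.append(current)
            current = []
        current.append((start, end))
    if current:
        chunks.append(current)
    return chunks


def languages_per_chunk(chunks: List[List[Span]], labels: Dict[Span, str]) -> List[List[str]]:
    """List the distinct (known) languages spoken inside each chunk."""
    return [sorted({labels[span] for span in chunk}) for chunk in chunks]
```

With spans `(0, 5), (6, 12), (13, 20), (25, 40)` labelled zh, zh, en, en, the first merged chunk holds both zh and en, so a single detection for that chunk must mislabel part of it. Keeping every span as its own chunk (`[[s] for s in spans]`) gives each chunk one language, at the cost that very short segments may yield less reliable detection.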
