Can the language be detected per VAD segment? #1211

Open
eager7 opened this issue Dec 20, 2024 · 1 comment

Comments

@eager7

eager7 commented Dec 20, 2024

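When VAD is enabled, the audio is split into separate segments, but language detection is still based on the earlier segments. If the language switches back and forth a lot, the output ends up in only one language.
Could the language be re-detected on each segment, with the tokens re-initialized, before producing the output?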
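The requested per-segment flow can be sketched in Python as below. This is a minimal, hypothetical sketch, not faster-whisper's or Faster-Whisper-XXL's implementation: `Chunk`, `Result`, `transcribe_per_segment`, and the `detect_language` / `transcribe` hooks are illustrative names that would have to be backed by a real Whisper-style model. Falling back to the previous segment's language when detection confidence is below `min_probability` is an assumption of this sketch, added because short or mixed-language segments tend to give unreliable detections.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Chunk:
    """One VAD speech segment (hypothetical container)."""
    start: float  # seconds
    end: float  # seconds
    samples: Sequence[float]  # mono audio samples for this segment


@dataclass
class Result:
    start: float
    end: float
    language: str
    text: str


def transcribe_per_segment(
    chunks: List[Chunk],
    detect_language: Callable[[Sequence[float]], Tuple[str, float]],
    transcribe: Callable[[Sequence[float], str], str],
    fallback_language: str = "en",
    min_probability: float = 0.5,
) -> List[Result]:
    """Detect the language separately for every VAD segment, then decode it.

    detect_language and transcribe are hypothetical hooks; in practice they
    would wrap a Whisper-style model (not shown here).
    """
    results: List[Result] = []
    previous = fallback_language
    for chunk in chunks:
        language, probability = detect_language(chunk.samples)
        # Assumption: a low-confidence detection (short or mixed-language
        # segment) reuses the language of the previous segment.
        if probability < min_probability:
            language = previous
        # Each segment is decoded on its own with its own language, which is
        # what "re-initializing the tokens" amounts to in this sketch.
        text = transcribe(chunk.samples, language)
        results.append(Result(chunk.start, chunk.end, language, text))
        previous = language
    return results


if __name__ == "__main__":
    # Toy stand-ins: the sign of the samples plays the role of the language.
    def toy_detect(samples):
        total = sum(samples)
        if total > 0:
            return ("zh", 0.95)
        if total < 0:
            return ("en", 0.90)
        return ("fr", 0.20)  # low confidence

    def toy_transcribe(samples, language):
        return f"<{language} text>"

    chunks = [
        Chunk(0.0, 2.0, [0.1] * 4),
        Chunk(2.0, 3.5, [-0.1] * 4),
        Chunk(3.5, 4.0, [0.0] * 4),
    ]
    for r in transcribe_per_segment(chunks, toy_detect, toy_transcribe):
        print(f"{r.start:.1f}-{r.end:.1f} [{r.language}] {r.text}")
```

With the toy stand-ins, the first segment is tagged `zh`, the second `en`, and the third, whose detection is low-confidence, keeps `en` from the segment before it. Note that this still assumes each VAD segment holds a single language, which is not guaranteed.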

@Purfview
Contributor

Purfview commented Dec 20, 2024

It's possible, but not per segment or per VAD segment: with the multilingual=True option, the language is auto-detected on every chunk. However, a chunk is not guaranteed to contain only one language, so this option rests on wishful thinking.

If you want to try autodetection on every VAD segment, I implemented that idea in Faster-Whisper-XXL: it runs batched on unmerged VAD segments. Activate it with these options:
--batched --unmerged --multilingual true, and optionally add --vad_method pyannote_onnx_v3 --verbose true as well.
