谢谢作者分享的软件 #18

lin16303 · 2024-06-08T03:08:42Z

目前用以下设置，识别语音延迟估计还有3秒左右，希望可以继续优化，除了faster whisper，好像还有其他几个变种whisper速度更快一点。另用qwen1.5 14b翻译1秒左右@4090

python translator.py device --use_faster_whisper --model small.en --min_audio_length 0.3 --max_audio_length 3.0 --gpt_base_url http://192.168.0.8:5000/v1 --gpt_translation_prompt ="Translate from English to Chinese" --openai_api_key {your_openai_key}

lin16303 · 2024-06-08T03:37:05Z

还有个小bug，打开识别软件后，右下脚状态栏包括时钟会一直在闪

lin16303 · 2024-06-09T03:59:46Z

这个frame_duration 0.2设置更好一点，否则经常有skip。。。
python ./stream-translator-gpt/translator.py device --use_faster_whisper --model large --min_audio_length 0.3 --max_audio_length 5.0 --vad_threshold 0.15 --gpt_base_url http://192.168.0.78:11434/v1 --gpt_translation_prompt ="directly translate from English to Chinese, no annotations" --openai_api_key "~Abc7654321" --gpt_model "qwen:14b-chat-q4_0" --frame_duration 0.2

lin16303 · 2024-06-09T12:30:48Z

减小max_audio_length，可以缩短延迟，但是太短容易把一句话截断
--min_audio_length 0.2 --max_audio_length 2.5

lin16303 · 2024-06-14T23:28:47Z

是否可以再配合tts形成一个同传应用，--max_audio_length 2.0，这时候延迟已经接近youtube自动翻译字幕。配合edgetts，这个效果最稳定速度也最快或开源的chattts，GPT-SoVITS效果应该不错

ionic-bond · 2024-07-09T07:37:41Z

不好意思没看到邮件回复晚了

ionic-bond · 2024-07-09T07:47:24Z

目前用以下设置，识别语音延迟估计还有3秒左右，希望可以继续优化，除了faster whisper，好像还有其他几个变种whisper速度更快一点。另用qwen1.5 14b翻译1秒左右@4090

python translator.py device --use_faster_whisper --model small.en --min_audio_length 0.3 --max_audio_length 3.0 --gpt_base_url http://192.168.0.8:5000/v1 --gpt_translation_prompt ="Translate from English to Chinese" --openai_api_key {your_openai_key}

我输出了一下faster whisper的耗时，在2080ti上跑large平均来说每句在1秒以内（不过偶尔会出现一句3~4秒的）。
这里感觉优化空间不是很大，不过我会看看其他whisper，如果有更加稳定的可以试试引入

ionic-bond · 2024-07-09T07:49:04Z

还有个小bug，打开识别软件后，右下脚状态栏包括时钟会一直在闪

这种情况我没遇到过，方便说下什么系统吗？

ionic-bond · 2024-07-09T07:57:01Z

感谢关于参数的反馈，min_audio_length 当初随手写的3.0，这个确实应该调低点默认值以减少延迟，但 max_audio_length 我感觉默认值不能设太低，否则强制截断的情况会出现比较多。
frame_duration 这个参数感觉比较玄学，我会试试调高点看看效果怎样。

ionic-bond · 2024-07-09T07:59:23Z

edgetts

nice idea，我会调研一下你说的两个tts，如果比较轻量的话可以加进去

lin16303 · 2024-07-09T12:06:42Z

还有个小bug，打开识别软件后，右下脚状态栏包括时钟会一直在闪

这种情况我没遇到过，方便说下什么系统吗？

谢谢回复，我的系统是win11，右下角状态栏麦克风，qq图标那部分一直在闪。

lin16303 · 2024-07-09T12:30:44Z

edgetts

nice idea，我会调研一下你说的两个tts，如果比较轻量的话可以加进去

关于同声传译，我想到的问题是，tts会不会和原声母语混合输出，对识别造成干扰。想到的办法是tts是否可以单独走蓝牙通道，与原声输入输出分开，不知道技术上如何实现？https://www.douyin.com/video/7069616563321703711

无论如何，作者的这个软件是目前我尝试过所有开源的语音识别+翻译最佳解决方案了，再次表示感谢

ionic-bond · 2024-07-17T14:05:26Z

还有个小bug，打开识别软件后，右下脚状态栏包括时钟会一直在闪

这种情况我没遇到过，方便说下什么系统吗？

谢谢回复，我的系统是win11，右下角状态栏麦克风，qq图标那部分一直在闪。

我大概地搜索了一下，这个有点难判断是什么问题呢……

ionic-bond · 2024-07-17T14:11:07Z

edgetts

nice idea，我会调研一下你说的两个tts，如果比较轻量的话可以加进去

关于同声传译，我想到的问题是，tts会不会和原声母语混合输出，对识别造成干扰。想到的办法是tts是否可以单独走蓝牙通道，与原声输入输出分开，不知道技术上如何实现？https://www.douyin.com/video/7069616563321703711

无论如何，作者的这个软件是目前我尝试过所有开源的语音识别+翻译最佳解决方案了，再次表示感谢

如果走声卡输出的话确实会干扰输入，走显示器或者蓝牙感觉都可以考虑

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

谢谢作者分享的软件 #18

谢谢作者分享的软件 #18

lin16303 commented Jun 8, 2024

lin16303 commented Jun 8, 2024

lin16303 commented Jun 9, 2024

lin16303 commented Jun 9, 2024

lin16303 commented Jun 14, 2024

ionic-bond commented Jul 9, 2024

ionic-bond commented Jul 9, 2024

ionic-bond commented Jul 9, 2024

ionic-bond commented Jul 9, 2024

ionic-bond commented Jul 9, 2024

lin16303 commented Jul 9, 2024

lin16303 commented Jul 9, 2024

ionic-bond commented Jul 17, 2024

ionic-bond commented Jul 17, 2024

谢谢作者分享的软件 #18

谢谢作者分享的软件 #18

Comments

lin16303 commented Jun 8, 2024

lin16303 commented Jun 8, 2024

lin16303 commented Jun 9, 2024

lin16303 commented Jun 9, 2024

lin16303 commented Jun 14, 2024

ionic-bond commented Jul 9, 2024

ionic-bond commented Jul 9, 2024

ionic-bond commented Jul 9, 2024

ionic-bond commented Jul 9, 2024

ionic-bond commented Jul 9, 2024

lin16303 commented Jul 9, 2024

lin16303 commented Jul 9, 2024

ionic-bond commented Jul 17, 2024

ionic-bond commented Jul 17, 2024