- Every task runs locally without internet, ensuring maximum privacy.
- 2025/7/5
- Added a noise reduction model: MossFormerGAN_SE_16K
- 2025/6/11
- Added HumAware-VAD, NVIDIA-NeMo-VAD, TEN-VAD
- 2025/6/3
- Added Dolphin ASR model to support Asian languages.
- 2025/5/13
- Added Float16/32 ASR models to support CUDA/DirectML GPU usage. These models can achieve >99% GPU operator deployment.
- 2025/5/9
- Added an option to not use VAD (Voice Activity Detection), offering greater flexibility.
- Added a noise reduction model: MelBandRoformer.
- Added three Japanese anime fine-tuned Whisper models.
- Added ASR model: CrisperWhisper.
- Added English fine-tuned ASR model: Whisper-Large-v3.5-Distil.
- Added ASR model supporting Chinese (including some dialects): FireRedASR-AED-L.
- Removed the IPEX-LLM framework to enhance overall performance.
- Removed the LLM quantization options, standardizing on the Q4F32 format.
- Improved accuracy of FSMN-VAD.
- Improved recognition accuracy of Paraformer.
- Improved recognition accuracy of SenseVoice.
- Improved inference speed of the Whisper series by over 10%.
- Supported the following large language models (LLMs) with ONNX Runtime 100% GPU operator deployment:
- Qwen3-4B/8B
- InternLM3-8B
- Phi-4-mini-Instruct
- Gemma3-4B/12B-it
- Expanded hardware support:
- Intel OpenVINO
- NVIDIA CUDA GPU
- Windows DirectML GPU (supports integrated and discrete GPUs)
This project is built on the ONNX Runtime framework.

Denoiser Support:

VAD Support:
- FSMN
- Faster_Whisper-Silero
- Official-Silero
- HumAware
- NVIDIA-NeMo-VAD-v2.0
- TEN-VAD
- Pyannote-Segmentation-3.0
- You need to accept Pyannote's terms of use, download the Pyannote pytorch_model.bin file, and place it in the VAD/pyannote_segmentation folder.
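The Pyannote step above can be sketched in shell (the download path is illustrative; you still need to accept the terms on the model page first):

```shell
# Create the folder the app expects for the Pyannote VAD weights
mkdir -p VAD/pyannote_segmentation
# After accepting Pyannote's terms and downloading pytorch_model.bin,
# move it into place (source path is illustrative):
# mv ~/Downloads/pytorch_model.bin VAD/pyannote_segmentation/
```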
ASR Support:

LLM Support:
- Run the following command in your terminal to install the latest required Python packages:
- For Apple Silicon M-series chips, avoid installing onnxruntime-openvino, as it will cause errors.
conda install ffmpeg
pip install -r requirements.txt
- Download the required models from HuggingFace: Transcribe_and_Translate_Subtitles.
- Download the run.py script from this repository.
- Place it in the Transcribe_and_Translate_Subtitles folder.
- Place the videos you want to transcribe and translate in the following directory; the application will process them one by one:
Transcribe_and_Translate_Subtitles/Media
- Open your preferred terminal (PyCharm, CMD, PowerShell, etc.).
- Execute the following command to start the application:
python run.py
- On the first run, you might encounter a Silero-VAD error. Simply restart the application, and it should be resolved.
- On the first run, you might encounter a libc++1.so error. Run the following commands in the terminal, and they should resolve the issue.
sudo apt update
sudo apt install libc++1
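Putting the steps together, a minimal first-run sketch looks like this (the video filename is illustrative; commands that depend on your environment are shown as comments):

```shell
# One-time setup (run inside your own Python environment):
#   conda install ffmpeg
#   pip install -r requirements.txt
# Create the input folder the app scans (harmless if it already exists):
mkdir -p Transcribe_and_Translate_Subtitles/Media
# Copy in the videos to process (filename illustrative):
#   cp ~/Videos/episode01.mp4 Transcribe_and_Translate_Subtitles/Media/
# Launch from the project folder:
#   cd Transcribe_and_Translate_Subtitles && python run.py
```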
- This project currently supports:
- Intel-OpenVINO-CPU-GPU-NPU
- Windows-AMD-GPU
- NVIDIA-GPU
- Apple-CPU
- AMD-CPU
- The generated subtitles are saved in: Transcribe_and_Translate_Subtitles/Results/Subtitles
- Beam Search for ASR models.
- Seed-X-PPO-7B with Beam Search
- Belle-Whisper-ZH
- Remove FSMN-VAD, Qwen, Gemma, Phi, InternLM. Only Gemma3-it-4B and Seed-X-PPO-7B are provided.
- Upscale the Resolution of Video
- Denoiser-MossFormer2-48K
- AMD-ROCm Support
- Real-Time Translate & Transcribe Video Player
| OS | Backend | Denoiser | VAD | ASR | LLM | Real-Time Factor (test_video.mp4, 7602 seconds) |
|---|---|---|---|---|---|---|
| Ubuntu-24.04 | CPU i3-12300 | - | Silero | SenseVoiceSmall | - | 0.08 |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | Silero | SenseVoiceSmall | Qwen2.5-7B-Instruct | 0.50 |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | FSMN | SenseVoiceSmall | - | 0.054 |
| Ubuntu-24.04 | CPU i3-12300 | ZipEnhancer | FSMN | SenseVoiceSmall | - | 0.39 |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | Silero | Whisper-Large-V3 | - | 0.20 |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | FSMN | Whisper-Large-V3-Turbo | - | 0.148 |
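The Real-Time Factor (RTF) column above is processing time divided by media duration, so values below 1.0 mean faster than real time. A minimal sketch (the 608-second processing time is an illustrative figure, not a measurement from the table):

```python
def real_time_factor(processing_seconds: float, media_seconds: float) -> float:
    """RTF = processing time / media duration; values < 1.0 are faster than real time."""
    return processing_seconds / media_seconds

# e.g. the 7602-second test video processed in an assumed 608 seconds:
print(round(real_time_factor(608.0, 7602.0), 2))  # → 0.08
```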