- Every task runs locally without internet, ensuring maximum privacy.
- 2025/7/5
- Added a noise reduction model: MossFormerGAN_SE_16K
- 2025/6/11
- Added HumAware-VAD, NVIDIA-NeMo-VAD, TEN-VAD
- 2025/6/3
- Added Dolphin ASR model to support Asian languages.
- 2025/5/13
- Added Float16/32 ASR models to support CUDA/DirectML GPU usage. These models can achieve >99% GPU operator deployment.
- 2025/5/9
- Added an option to not use VAD (Voice Activity Detection), offering greater flexibility.
- Added a noise reduction model: MelBandRoformer.
- Added three Japanese anime fine-tuned Whisper models.
- Added ASR model: CrisperWhisper.
- Added English fine-tuned ASR model: Whisper-Large-v3.5-Distil.
- Added ASR model supporting Chinese (including some dialects): FireRedASR-AED-L.
- Removed the IPEX-LLM framework to enhance overall performance.
- Removed the LLM quantization options, standardizing on the Q4F32 format.
- Improved accuracy of FSMN-VAD.
- Improved recognition accuracy of Paraformer.
- Improved recognition accuracy of SenseVoice.
- Improved inference speed of the Whisper series by over 10%.
- Supported the following large language models (LLMs) with ONNX Runtime 100% GPU operator deployment:
- Qwen3-4B/8B
- InternLM3-8B
- Phi-4-mini-Instruct
- Gemma3-4B/12B-it
- Expanded hardware support:
- Intel OpenVINO
- NVIDIA CUDA GPU
- Windows DirectML GPU (supports integrated and discrete GPUs)
This project is built on the ONNX Runtime framework.

Denoiser Support:

VAD Support:
- FSMN
- Faster_Whisper-Silero
- Official-Silero
- HumAware
- NVIDIA-NeMo-VAD-v2.0
- TEN-VAD
- Pyannote-Segmentation-3.0
- You need to accept Pyannote's terms of use, download the Pyannote pytorch_model.bin file, and place it in the VAD/pyannote_segmentation folder.
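The Pyannote step above can be sketched in shell (the download path is illustrative; you still need to accept the terms on the model page first):

```shell
# Create the folder the app expects for the Pyannote VAD weights
mkdir -p VAD/pyannote_segmentation
# After accepting Pyannote's terms and downloading pytorch_model.bin,
# move it into place (source path is illustrative):
# mv ~/Downloads/pytorch_model.bin VAD/pyannote_segmentation/
```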
ASR Support:

LLM Support:
- Run the following command in your terminal to install the latest required Python packages:
- For Apple Silicon M-series chips, avoid installing onnxruntime-openvino, as it will cause errors.
conda install ffmpeg
pip install -r requirements.txt
- Download the required models from HuggingFace: Transcribe_and_Translate_Subtitles.
- Download the run.py script from this repository.
- Place it in the Transcribe_and_Translate_Subtitles folder.
- Place the videos you want to transcribe and translate in the following directory; the application will process them one by one:
Transcribe_and_Translate_Subtitles/Media
- Open your preferred terminal (PyCharm, CMD, PowerShell, etc.).
- Execute the following command to start the application:
python run.py
- On the first run, you might encounter a Silero-VAD error. Simply restart the application, and it should be resolved.
- On the first run, you might encounter a libc++1.so error. Run the following commands in the terminal, and they should resolve the issue.
sudo apt update
sudo apt install libc++1
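Putting the steps together, a minimal first-run sketch looks like this (the video filename is illustrative; commands that depend on your environment are shown as comments):

```shell
# One-time setup (run inside your own Python environment):
#   conda install ffmpeg
#   pip install -r requirements.txt
# Create the input folder the app scans (harmless if it already exists):
mkdir -p Transcribe_and_Translate_Subtitles/Media
# Copy in the videos to process (filename illustrative):
#   cp ~/Videos/episode01.mp4 Transcribe_and_Translate_Subtitles/Media/
# Launch from the project folder:
#   cd Transcribe_and_Translate_Subtitles && python run.py
```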
- This project currently supports:
- Intel-OpenVINO-CPU-GPU-NPU
- Windows-AMD-GPU
- NVIDIA-GPU
- Apple-CPU
- AMD-CPU
- The generated subtitles are saved in: Transcribe_and_Translate_Subtitles/Results/Subtitles
- Beam Search for ASR models.
- Seed-X-PPO-7B with Beam Search
- Belle-Whisper-ZH
- Remove FSMN-VAD, Qwen, Gemma, Phi, InternLM. Only Gemma3-it-4B and Seed-X-PPO-7B are provided.
- Upscale the Resolution of Video
- Denoiser-MossFormer2-48K
- AMD-ROCm Support
- Real-Time Translate & Transcribe Video Player
| OS | Backend | Denoiser | VAD | ASR | LLM | Real-Time Factor (test_video.mp4, 7602 seconds) |
|---|---|---|---|---|---|---|
| Ubuntu-24.04 | CPU i3-12300 | - | Silero | SenseVoiceSmall | - | 0.08 |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | Silero | SenseVoiceSmall | Qwen2.5-7B-Instruct | 0.50 |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | FSMN | SenseVoiceSmall | - | 0.054 |
| Ubuntu-24.04 | CPU i3-12300 | ZipEnhancer | FSMN | SenseVoiceSmall | - | 0.39 |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | Silero | Whisper-Large-V3 | - | 0.20 |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | FSMN | Whisper-Large-V3-Turbo | - | 0.148 |
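The Real-Time Factor (RTF) column above is processing time divided by media duration, so values below 1.0 mean faster than real time. A minimal sketch (the 608-second processing time is an illustrative figure, not a measurement from the table):

```python
def real_time_factor(processing_seconds: float, media_seconds: float) -> float:
    """RTF = processing time / media duration; values < 1.0 are faster than real time."""
    return processing_seconds / media_seconds

# e.g. the 7602-second test video processed in an assumed 608 seconds:
print(round(real_time_factor(608.0, 7602.0), 2))  # → 0.08
```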