
Transcribe and Translate Subtitles

🚨 Important Note

  • Every task runs locally without internet, ensuring maximum privacy.

Updates

  • 2025/7/5
    • Added a noise reduction model: MossFormerGAN_SE_16K
  • 2025/6/11
    • Added HumAware-VAD, NVIDIA-NeMo-VAD, TEN-VAD
  • 2025/6/3
    • Added Dolphin ASR model to support Asian languages.
  • 2025/5/13
    • Added Float16/32 ASR models to support CUDA/DirectML GPU usage. These models can achieve >99% GPU operator deployment.
  • 2025/5/9
    • Added an option to not use VAD (Voice Activity Detection), offering greater flexibility.
    • Added a noise reduction model: MelBandRoformer.
    • Added three Japanese anime fine-tuned Whisper models.
    • Added ASR model: CrisperWhisper.
    • Added English fine-tuned ASR model: Whisper-Large-v3.5-Distil.
    • Added ASR model supporting Chinese (including some dialects): FireRedASR-AED-L.
    • Removed the IPEX-LLM framework to enhance overall performance.
    • Cancelled LLM quantization options, standardizing on the Q4F32 format.
    • Improved accuracy of FSMN-VAD.
    • Improved recognition accuracy of Paraformer.
    • Improved recognition accuracy of SenseVoice.
    • Improved inference speed of the Whisper series by over 10%.
    • Supported the following large language models (LLMs) with ONNX Runtime 100% GPU operator deployment:
      • Qwen3-4B/8B
      • InternLM3-8B
      • Phi-4-mini-Instruct
      • Gemma3-4B/12B-it
    • Expanded hardware support:
      • Intel OpenVINO
      • NVIDIA CUDA GPU
      • Windows DirectML GPU (supports integrated and discrete GPUs)

✨ Features

This project is built on the ONNX Runtime framework.


📋 Setup Instructions

✅ Step 1: Install Dependencies

  • Run the following commands in your terminal to install the required Python packages:
  • On Apple Silicon (M-series) chips, do not install onnxruntime-openvino, as it will cause errors.
conda install ffmpeg

pip install -r requirements.txt
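After installation, you can quickly confirm that the key packages import cleanly before running the application. This helper is an illustration only (it is not part of the project), and the package names passed to it are examples:

```python
import importlib.util

def missing_packages(packages):
    """Return the packages from the list that are not importable."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

# Example: an empty list means everything checked is installed.
print(missing_packages(["onnxruntime"]))
```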

📥 Step 2: Download Necessary Models

🖥️ Step 3: Download and Place run.py

  • Download the run.py script from this repository.
  • Place it in the Transcribe_and_Translate_Subtitles folder.

📁 Step 4: Place Target Videos in the Media Folder

  • Place the videos you want to transcribe and translate in the following directory; the application will process them one by one:
Transcribe_and_Translate_Subtitles/Media
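To see which files in the Media folder would be picked up, a small listing helper can be sketched as follows. This is illustrative only: the set of video extensions is an assumption, not taken from run.py.

```python
from pathlib import Path

# Assumed extension list for illustration; run.py may accept more formats.
VIDEO_EXTS = {".mp4", ".mkv", ".avi", ".mov"}

def list_media(media_dir="Transcribe_and_Translate_Subtitles/Media"):
    """Return the video files in the Media folder, sorted by name."""
    root = Path(media_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.suffix.lower() in VIDEO_EXTS)
```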

🚀 Step 5: Run the Application

  • Open your preferred terminal (PyCharm, CMD, PowerShell, etc.).
  • Execute the following command to start the application:
python run.py
  • Once the application starts, a webpage will open in your browser.

🛠️ Step 6: Fix Error (if encountered)

  • On the first run, you might encounter a Silero-VAD error. Simply restart the application, and it should be resolved.
  • On the first run, you might encounter a libc++1.so error. Run the following commands in the terminal, and they should resolve the issue.
sudo apt update
sudo apt install libc++1

💻 Step 7: Device Support

  • This project currently supports:
    • Intel-OpenVINO-CPU-GPU-NPU
    • Windows-AMD-GPU
    • NVIDIA-GPU
    • Apple-CPU
    • AMD-CPU
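The device names above correspond to ONNX Runtime execution providers. As a rough sketch of how such a selection could work (the provider identifiers are real ONNX Runtime names, but the mapping itself is an assumption, not taken from run.py):

```python
# Hypothetical device-to-provider mapping for illustration.
PROVIDER_BY_DEVICE = {
    "intel-openvino": "OpenVINOExecutionProvider",
    "nvidia-gpu": "CUDAExecutionProvider",
    "windows-amd-gpu": "DmlExecutionProvider",
    "apple-cpu": "CPUExecutionProvider",
    "amd-cpu": "CPUExecutionProvider",
}

def pick_provider(device: str) -> str:
    """Fall back to plain CPU when the device is unrecognized."""
    return PROVIDER_BY_DEVICE.get(device.lower(), "CPUExecutionProvider")
```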

🎉 Enjoy the Application!

The generated subtitles are saved in: Transcribe_and_Translate_Subtitles/Results/Subtitles
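Subtitle files are commonly in SRT format, whose timestamps use the HH:MM:SS,mmm layout. If you post-process the results, a generic formatter (unrelated to run.py internals) looks like this:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
```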

📌 To-Do List


Performance

Test video: test_video.mp4 (7602 seconds)

| OS | Backend | Denoiser | VAD | ASR | LLM | Real-Time Factor |
|---|---|---|---|---|---|---|
| Ubuntu-24.04 | CPU (i3-12300) | - | Silero | SenseVoiceSmall | - | 0.08 |
| Ubuntu-24.04 | CPU (i3-12300) | GTCRN | Silero | SenseVoiceSmall | Qwen2.5-7B-Instruct | 0.50 |
| Ubuntu-24.04 | CPU (i3-12300) | GTCRN | FSMN | SenseVoiceSmall | - | 0.054 |
| Ubuntu-24.04 | CPU (i3-12300) | ZipEnhancer | FSMN | SenseVoiceSmall | - | 0.39 |
| Ubuntu-24.04 | CPU (i3-12300) | GTCRN | Silero | Whisper-Large-V3 | - | 0.20 |
| Ubuntu-24.04 | CPU (i3-12300) | GTCRN | FSMN | Whisper-Large-V3-Turbo | - | 0.148 |
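Real-Time Factor is processing time divided by audio duration, so values below 1.0 mean faster-than-real-time processing. For the 7602-second test video, an RTF of 0.08 corresponds to roughly 608 seconds of processing:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; < 1.0 is faster than real time."""
    return processing_seconds / audio_seconds

# First table row: 7602 s of audio processed in ~608 s gives RTF = 0.08.
print(round(real_time_factor(608.16, 7602.0), 3))
```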


About

Transcribe subtitles and translate them offline with ease.
