stream-translator-gpt

Command line utility to transcribe or translate audio from livestreams in real time. Uses yt-dlp to get livestream URLs from various services and Whisper / Faster-Whisper for transcription.

This fork optimizes the audio slicing logic using VAD (voice activity detection), adds GPT API / Gemini API translation to support target languages other than English, and supports input from audio devices.

Try it on Colab:

Prerequisites

Linux or Windows:

Python >= 3.8 (>= 3.10 recommended)
Install CUDA on your system.
Install cuDNN into your CUDA directory if you want to use Faster-Whisper.
Install PyTorch (with CUDA support) into your Python environment.
Create a Google API key if you want to use the Gemini API for translation (free: 15 requests / minute).
Create an OpenAI API key if you want to use the Whisper API for transcription or the GPT API for translation.

If you are on Windows, you also need to:

Install and add ffmpeg to your PATH.
Install yt-dlp and add it to your PATH.
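
The checklist above can be partly verified before installing. The following is a minimal sketch, not part of stream-translator-gpt: the function name `check_prerequisites` is hypothetical, and it only confirms that the tools are discoverable, not that CUDA or cuDNN are configured correctly. If PyTorch is present, `torch.cuda.is_available()` can be called separately to confirm GPU support.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite to True or False.

    Hypothetical helper for illustration; not shipped with the tool.
    """
    return {
        # The README asks for Python >= 3.8.
        "python>=3.8": sys.version_info >= (3, 8),
        # ffmpeg and yt-dlp must be discoverable on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "yt-dlp": shutil.which("yt-dlp") is not None,
        # Only checks that PyTorch is installed, without importing it.
        "torch": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```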

Installation

Install the release version from PyPI (recommended):

pip install stream-translator-gpt -U
stream-translator-gpt

or

Clone the master branch from GitHub:

git clone https://github.com/ionic-bond/stream-translator-gpt.git
pip install -r ./stream-translator-gpt/requirements.txt
python3 ./stream-translator-gpt/translator.py

Usage

Transcribe a livestream (uses Whisper by default):

stream-translator-gpt {URL} --model large --language {input_language}

Transcribe with Faster-Whisper:

stream-translator-gpt {URL} --model large --language {input_language} --use_faster_whisper

Transcribe with the Whisper API:

stream-translator-gpt {URL} --language {input_language} --use_whisper_api --openai_api_key {your_openai_key}

Translate to another language with Gemini:

stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key}

Translate to another language with GPT:

stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt "Translate from Japanese to Chinese" --openai_api_key {your_openai_key}

Use the Whisper API and Gemini together:

stream-translator-gpt {URL} --model large --language ja --use_whisper_api --openai_api_key {your_openai_key} --gpt_translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key}
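
When the tool is launched from a script rather than typed by hand, the commands above can be assembled programmatically. This is a minimal sketch: `build_command` is a hypothetical helper, and it uses only flags that appear in this README. Passing the result to `subprocess.run` as a list avoids shell-quoting problems with the translation prompt.

```python
import shlex


def build_command(source, language, model="large", translation_prompt=None,
                  google_api_key=None, openai_api_key=None,
                  use_faster_whisper=False):
    """Assemble a stream-translator-gpt argument list (hypothetical helper)."""
    cmd = ["stream-translator-gpt", source,
           "--model", model, "--language", language]
    if use_faster_whisper:
        cmd.append("--use_faster_whisper")
    if translation_prompt:
        cmd += ["--gpt_translation_prompt", translation_prompt]
    if google_api_key:
        cmd += ["--google_api_key", google_api_key]
    if openai_api_key:
        cmd += ["--openai_api_key", openai_api_key]
    return cmd


if __name__ == "__main__":
    cmd = build_command("{URL}", "ja",
                        translation_prompt="Translate from Japanese to Chinese",
                        google_api_key="{your_google_key}")
    # Print a copy-pasteable form; subprocess.run(cmd) would launch the tool.
    print(shlex.join(cmd))
```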

Use a local video/audio file as input:

stream-translator-gpt /path/to/file --model large --language {input_language}

Use the computer microphone as input:

stream-translator-gpt device --model large --language {input_language}

This uses the system's default audio device as input.

To use a different audio input device, run stream-translator-gpt device --print_all_devices to get the device index, then run the CLI with --device_index {index}.

To use the audio output of another program as input, you need to enable stereo mix.

Send results to Cqhttp:

stream-translator-gpt {URL} --model large --language {input_language} --cqhttp_url {your_cqhttp_url} --cqhttp_token {your_cqhttp_token}

Send results to Discord:

stream-translator-gpt {URL} --model large --language {input_language} --discord_webhook_url {your_discord_webhook_url}
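
To show what a Discord webhook receives, here is a minimal sketch of the HTTP request involved. It illustrates the webhook format in general and is not taken from this tool's source; `build_discord_request` is a hypothetical name. A webhook accepts a JSON body with a `content` field of at most 2000 characters.

```python
import json
import urllib.request


def build_discord_request(webhook_url, text):
    """Build (but do not send) a POST request carrying one message.

    Hypothetical helper for illustration only.
    """
    # Discord limits message content to 2000 characters, so truncate.
    payload = {"content": text[:2000]}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending would be: urllib.request.urlopen(build_discord_request(url, text))
```

Testing a webhook URL this way before a long stream is one way to confirm that the channel is reachable.
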

Save results to a .srt subtitle file:

stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key} --hide_transcribe_result --output_timestamps --output_file_path ./result.srt
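
The saved file can then be processed further, for example to retime or filter captions. The sketch below assumes the output follows the standard SubRip layout (an index line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then caption text, with blank lines between cues); the exact output of this tool has not been verified here, and `parse_srt` is a hypothetical helper.

```python
import re

# Matches a SubRip timing line such as "00:00:01,000 --> 00:00:03,500".
TIME_LINE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3}) --> (\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert the four captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, caption) tuples."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIME_LINE.search(line)
            if match:
                g = match.groups()
                caption = "\n".join(lines[i + 1:])
                cues.append((to_seconds(*g[:4]), to_seconds(*g[4:]), caption))
                break
    return cues
```

With this, `parse_srt(open("./result.srt", encoding="utf-8").read())` would yield one tuple per subtitle cue.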
