Transcribrr is a desktop tool that turns audio into text and then refines the output using OpenAI's GPT models. It works with audio or video files on your computer, YouTube videos via a provided URL, or recordings made directly in the app. While functional, this is a personal project that I work on in my free time and very much a work in progress.
- fast, accurate, local transcription with optional speaker detection (via the excellent whisperx library)
- GPT-4 for transcript processing & summarization
- Manageable transcription quality settings
- Preset prompt management for GPT processing
Before installing the application, ensure you have the following dependencies:
- Python 3.10
- Cuda 11.8 or higher (optional, though highly recommended, for hardware acceleration. Requires a supported Nvidia GPU.)
- ffmpeg
Clone the repository to your local machine:
git clone https://github.com/jbmiller10/transcribrr.git
cd transcribrr
python -m venv venv
.\venv\Scripts\activate
python -m venv venv
source venv/bin/activate
Install Torch w/ Cuda (optional, though highly recommended, for hardware acceleration. Requires an Nvidia GPU and cuda toolkit)
pip3 install torch~=2.0.0 torchaudio~=2.0.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
Run the main script to start the application:
python main.py
Before usage, configure the application with your Hugging Face Access Token (optional, required for speaker detection/diarization) and OpenAI API keys through the 'Settings' menu. ll You can also adjust transcription quality, GPT model selection, max tokens, temperature, speaker detection settings, and your preset GPT prompts.
To enable Speaker Detection, you will need a Huggingface access token (generate here) that you can set in the settings menu. Additionally, you will need to accept the usage terms for the following models while logged into your huggingface account: Segmentation and Speaker-Diarization.
- Choose the mode of transcription (File Upload or YouTube URL).
- If using File Upload, select your video/audio file using the "Open Audio/Video File" button.
- If using the YouTube URL mode, paste the YouTube link into the corresponding field.
- Click the "Start Transcription" button to begin processing.
- After transcription, you can process the text with GPT-4 using the "Process with GPT-4" button after setting your prompts.