# Whisper transcribe for VODs

CUDA is highly recommended for this: CPU inference is about 3x slower. Read more about the speed comparison here: https://github.com/guillaumekln/faster-whisper#benchmark

## Installation

1. Create a Python virtual environment:

   ```shell
   python -m venv venv

   source venv/bin/activate  # Linux/macOS, or...
   venv/Scripts/activate     # Windows
   ```

2. Install the pip packages:

   ```shell
   pip install -r requirements.txt
   ```

Head over to [pytorch.org](https://pytorch.org) and select:

- PyTorch Build: Stable
- Your OS: xxxx
- Package: Pip
- Language: Python
- Compute Platform: CUDA <latest version>

Then run the given command to install PyTorch.

3. Copy `SAMPLE_config.json` to `config.json` and change the API endpoints.

4. Make sure ffmpeg is installed.
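The config step above can be sanity-checked with a few lines of Python. This is a hypothetical sketch: the real keys come from `SAMPLE_config.json` in the repo, and the `api_endpoints` field name and URLs here are assumptions for illustration only.

```python
import json
from pathlib import Path

# Hypothetical sample config; the actual keys are defined by
# SAMPLE_config.json in the repository.
sample = {
    "api_endpoints": {
        "prod": "https://example.com/api",
        "dev": "http://localhost:8000/api",
    }
}
Path("config.json").write_text(json.dumps(sample, indent=2))

# Load it back the way the script would, and pick an endpoint.
config = json.loads(Path("config.json").read_text())
print(config["api_endpoints"]["prod"])
```

If loading fails or a key is missing, fix `config.json` before running the main script.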

## Run

Running the main script does most of what you need. Transcribed scripts are saved in `transcripts/`.

```shell
python main.py -e prod transcribe
```

Using the large Whisper model (the default) gives the best speech-to-text results and requires ~6 GB of GPU memory. Use `python main.py transcribe -h` to see all available models.

```text
usage: Wubbl0rz Archiv Transcribe [-h] [-c CONFIG] -e {prod,dev} [-o OUTPUT]
                                  {transcribe,post} ...

positional arguments:
  {transcribe,post}     Available commands
    transcribe          Run whisper to transcribe vods to text
    post                Post available transcriptions

options:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        Path to config.json
  -e {prod,dev}, --environment {prod,dev}
                        Target environment
  -o OUTPUT, --output OUTPUT
                        Output directory for transcripts
```