This script automates the process of downloading a video using yt-dlp, converting it to audio using ffmpeg, transcribing the audio into text with Faster-Whisper, and compiling the text into a final document.
Ensure you have the following installed:
- Install yt-dlp (for downloading YouTube videos):
  pip install yt-dlp
- Install ffmpeg (for audio conversion):
  - Windows: Download ffmpeg and add it to PATH.
  - Linux/macOS: Install via package manager:
    sudo apt install ffmpeg   # Debian-based (Ubuntu)
    brew install ffmpeg       # macOS (Homebrew)
- Install faster-whisper (for audio transcription):
  pip install faster-whisper
- Install additional dependencies:
  pip install torch numpy tqdm
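Before running the pipeline, it can help to confirm that the tools listed above are actually reachable. The following is a minimal sketch, not part of the original scripts: the function name `missing_prerequisites` is made up for illustration, and it assumes the faster-whisper package is importable as `faster_whisper`.

```python
import importlib.util
import shutil


def missing_prerequisites(tools=("yt-dlp", "ffmpeg"), modules=("faster_whisper",)):
    """Return the names of command-line tools and Python modules that cannot be found."""
    missing = []
    for tool in tools:
        # shutil.which searches PATH, much like the shell does when resolving a command.
        if shutil.which(tool) is None:
            missing.append(tool)
    for module in modules:
        # find_spec returns None when a top-level module is not importable.
        if importlib.util.find_spec(module) is None:
            missing.append(module)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("All prerequisites found." if not problems else f"Missing: {', '.join(problems)}")
```

An empty list means both command-line tools are on PATH and the Python package can be imported.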
You can download the video using two different methods:
If the video is public, you can download it directly without cookies:
yt-dlp -o "<OUTPUT_VIDEO_PATH>" "<YOUTUBE_VIDEO_URL>"
Example:
yt-dlp -o "C:\Users\mohan\OneDrive\Desktop\video.mp4" https://www.youtube.com/watch?v=Kbn2ab0-sGE
If the video requires login, you must pass your cookies file:
yt-dlp --cookies "<PATH_TO_COOKIES_FILE>" -o "<OUTPUT_VIDEO_PATH>" "<YOUTUBE_VIDEO_URL>"
Example:
yt-dlp --cookies "C:\Users\mohan\OneDrive\Desktop\cookies.txt" -o "C:\Users\mohan\OneDrive\Desktop\video.mp4" https://www.youtube.com/watch?v=Kbn2ab0-sGE
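If you prefer to drive the download from Python, the two variants above differ only in the `--cookies` option. This is a rough sketch under that assumption; `build_download_command` and `download_video` are hypothetical helper names, and only the `--cookies` and `-o` options shown above are used.

```python
import subprocess


def build_download_command(url, output_path, cookies_path=None):
    """Build the yt-dlp argument list; cookies are added only when a path is given."""
    command = ["yt-dlp"]
    if cookies_path:
        command += ["--cookies", cookies_path]
    command += ["-o", output_path, url]
    return command


def download_video(url, output_path, cookies_path=None):
    # check=True raises CalledProcessError if yt-dlp exits with a non-zero status.
    subprocess.run(build_download_command(url, output_path, cookies_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems with Windows paths that contain spaces or backslashes.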
If you need to download private videos (or any other video that requires you to be signed in), you must export cookies from your browser. Unlisted videos are reachable by anyone who has the link and normally do not need cookies. Follow these steps:
- Install the extension:
  - Chrome: Get cookies.txt extension
  - Firefox: Get cookies.txt extension
- Go to YouTube and log in to your account.
- Open the video you want to download.
- Click on the cookies.txt extension and download the cookies file.
- Save the cookies file, e.g., C:\Users\mohan\OneDrive\Desktop\cookies.txt.
- Use it with yt-dlp:
  yt-dlp --cookies "C:\Users\mohan\OneDrive\Desktop\cookies.txt" -o "C:\Users\mohan\OneDrive\Desktop\video.mp4" https://www.youtube.com/watch?v=Kbn2ab0-sGE
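An exported cookies file can go stale, so a quick sanity check before handing it to yt-dlp may save a failed download. The sketch below uses Python's standard `http.cookiejar` module and assumes the file is in the Netscape cookies.txt format; `count_cookies` is an illustrative name, and treating an expiry of 0 as a session cookie is an assumption about how some exporters write the file.

```python
import time
from http.cookiejar import MozillaCookieJar


def count_cookies(cookies_path, domain_suffix="youtube.com"):
    """Return (total, expired) counts for cookies whose domain ends with domain_suffix."""
    jar = MozillaCookieJar()
    # Keep every entry so expired ones can be counted instead of silently dropped.
    # load() raises http.cookiejar.LoadError if the file lacks the Netscape header.
    jar.load(cookies_path, ignore_discard=True, ignore_expires=True)
    now = time.time()
    total = expired = 0
    for cookie in jar:
        if not cookie.domain.endswith(domain_suffix):
            continue
        total += 1
        # Session cookies (no expiry, or 0 as some exporters write) are not counted as expired.
        if cookie.expires and cookie.expires < now:
            expired += 1
    return total, expired
```

If the total is zero or most entries are expired, log in again and re-export the file.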
Convert the downloaded file (<OUTPUT_VIDEO_PATH>) into an MP3 or WAV file.
- Convert to MP3:
  ffmpeg -i "<OUTPUT_VIDEO_PATH>" -vn -acodec libmp3lame -q:a 2 "<OUTPUT_AUDIO_PATH>.mp3"
- Convert to WAV (for better transcription accuracy):
  ffmpeg -i "<OUTPUT_VIDEO_PATH>" -vn -acodec pcm_s16le -ar 16000 -ac 1 "<OUTPUT_AUDIO_PATH>.wav"
Example:
ffmpeg -i "C:\Users\mohan\OneDrive\Desktop\video.mp4" -vn -acodec libmp3lame -q:a 2 "C:\Users\mohan\OneDrive\Desktop\audio.mp3"
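The two conversions above use the same ffmpeg invocation with different codec arguments, which makes them easy to wrap in one helper. This is only a sketch; `AUDIO_ARGS`, `build_ffmpeg_command`, and `extract_audio` are hypothetical names, and the flags are copied from the commands shown above.

```python
import subprocess

# Codec arguments taken from the MP3 and WAV commands in this section.
AUDIO_ARGS = {
    "mp3": ["-acodec", "libmp3lame", "-q:a", "2"],
    "wav": ["-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1"],
}


def build_ffmpeg_command(video_path, audio_base_path, fmt="mp3"):
    """Build the ffmpeg argument list for extracting audio as mp3 or wav."""
    if fmt not in AUDIO_ARGS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # -vn drops the video stream so only audio is written.
    return ["ffmpeg", "-i", video_path, "-vn", *AUDIO_ARGS[fmt], f"{audio_base_path}.{fmt}"]


def extract_audio(video_path, audio_base_path, fmt="mp3"):
    # check=True raises CalledProcessError if ffmpeg exits with a non-zero status.
    subprocess.run(build_ffmpeg_command(video_path, audio_base_path, fmt), check=True)
```

As in the commands above, the output path is given without an extension and the helper appends `.mp3` or `.wav`.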
Run Faster-Whisper to transcribe the audio file.
python faster-whisper/whisper.py --model <WHISPER_MODEL_SIZE> --device <DEVICE> --output_dir "<OUTPUT_TEXT_DIR>" "<OUTPUT_AUDIO_PATH>.mp3"
- <WHISPER_MODEL_SIZE>: Choose from tiny, base, small, medium, large-v2.
- <DEVICE>: Use cuda for GPU acceleration or cpu for standard processing.
Example:
python faster-whisper/whisper.py --model small --device cuda --output_dir "C:\Users\mohan\OneDrive\Desktop\output" "C:\Users\mohan\OneDrive\Desktop\audio.mp3"
The transcription results will be saved in the directory passed to --output_dir (the output/ folder in the example above).
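For readers who want to call the library directly rather than through the command-line script, the sketch below uses the `WhisperModel` class from the faster-whisper Python package. It is not the `whisper.py` script referenced above, whose internals are not shown here; the helper names, the timestamp format, and the `compute_type` choices are assumptions made for illustration.

```python
def format_timestamp(seconds):
    """Format a time offset in seconds as HH:MM:SS."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def format_segment(start, end, text):
    """Render one transcribed segment as a single line of text."""
    return f"[{format_timestamp(start)} -> {format_timestamp(end)}] {text.strip()}"


def transcribe(audio_path, model_size="small", device="cpu"):
    """Transcribe audio_path and return one formatted line per segment."""
    # Imported here so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    # Assumed defaults: int8 as a conservative CPU choice, float16 on CUDA GPUs.
    compute_type = "float16" if device == "cuda" else "int8"
    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    # transcribe() returns a lazy generator of segments plus an info object.
    segments, info = model.transcribe(audio_path)
    return [format_segment(s.start, s.end, s.text) for s in segments]
```

Because the segments are produced lazily, the actual transcription work happens while the list is being built, not when `model.transcribe` returns.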
Use convert.py to merge all transcribed text into a final document.
python convert.py "<OUTPUT_TEXT_DIR>"
Example:
python convert.py "C:\Users\mohan\OneDrive\Desktop\output"
This script will:
- Read all transcribed segments from output/.
- Merge them into a single .txt file.
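The contents of convert.py are not shown here, so the following is only a guess at what such a merge step could look like. It assumes the segments are stored as individual .txt files in the output directory, that sorting by file name gives the right order, and that the result is named final_transcript.txt; `merge_transcripts` is a hypothetical function name.

```python
from pathlib import Path


def merge_transcripts(text_dir, output_name="final_transcript.txt"):
    """Concatenate every .txt file in text_dir (sorted by name) into output_name."""
    directory = Path(text_dir)
    output_path = directory / output_name
    # Skip a previous merge result so it is not appended to itself on a re-run.
    parts = sorted(p for p in directory.glob("*.txt") if p.name != output_name)
    texts = [p.read_text(encoding="utf-8").strip() for p in parts]
    # Empty files are dropped; the remaining pieces are joined one per line.
    output_path.write_text("\n".join(t for t in texts if t) + "\n", encoding="utf-8")
    return output_path
```

If the segment files are numbered without zero padding (1.txt, 10.txt, 2.txt), name-based sorting would put them in the wrong order, so a numeric sort key would be needed in that case.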
The final transcribed text will be saved as:
<OUTPUT_TEXT_DIR>/final_transcript.txt
yt-dlp --cookies "C:\Users\mohan\OneDrive\Desktop\cookies.txt" -o "C:\Users\mohan\OneDrive\Desktop\video.mp4" https://www.youtube.com/watch?v=Kbn2ab0-sGE
ffmpeg -i "C:\Users\mohan\OneDrive\Desktop\video.mp4" -vn -acodec libmp3lame -q:a 2 "C:\Users\mohan\OneDrive\Desktop\audio.mp3"
python faster-whisper/whisper.py --model small --device cuda --output_dir "C:\Users\mohan\OneDrive\Desktop\output" "C:\Users\mohan\OneDrive\Desktop\audio.mp3"
python convert.py "C:\Users\mohan\OneDrive\Desktop\output"
If the video is public, remove the --cookies flag and run:
yt-dlp -o "C:\Users\mohan\OneDrive\Desktop\video.mp4" https://www.youtube.com/watch?v=Kbn2ab0-sGE
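Since each command in the full example depends on the output of the previous one, it is worth stopping at the first failure when chaining them. A minimal sketch of that idea follows; `run_steps` is a hypothetical helper, and the commands themselves are supplied by the caller as argument lists.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (name, command) pairs in order; return the first failing step's name, or None."""
    for name, command in steps:
        print(f"==> {name}", file=sys.stderr)
        try:
            result = subprocess.run(command)
        except FileNotFoundError:
            # The executable itself is missing (for example, ffmpeg not on PATH).
            return name
        # A non-zero exit code stops the chain, since later steps need earlier outputs.
        if result.returncode != 0:
            return name
    return None
```

Each entry would pair a label such as "download" or "convert" with the corresponding command from the example above, split into a list of arguments.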
- Replace placeholders (<...>) with actual file paths.
- Use the cookies method only if the video is private or otherwise requires sign-in.
- Make sure your cookies file is up to date for YouTube authentication.
- Use a larger Whisper model (medium, large-v2) for better accuracy, at the cost of slower transcription and higher memory use.
- If you don’t have a GPU, set --device cpu in the transcription step.
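Choosing between cuda and cpu can be automated. The sketch below relies on PyTorch's `torch.cuda.is_available()`, since torch is among the dependencies listed earlier. Treat it as a heuristic: faster-whisper itself runs on CTranslate2, so a GPU visible to PyTorch does not guarantee that the transcription backend can use it. `pick_device` is a hypothetical helper name.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query, so fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    # The result can be passed as the value of --device in the transcription step.
    print(pick_device())
```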