Transform your PDF documents into audiobooks effortlessly using advanced text extraction and Kokoro TTS technology. This fork/variation of Kokoro allows for longer file generation and better handling of extracted PDF text.
-
Audio Sample
Listen to a short sample of the generated audiobook:
demo.mp4
-
Intelligent PDF Text Extraction
- Skips headers, footers, and page numbers.
- Optionally splits based on Table of Contents (TOC) or extracts the entire document.
-
Kokoro TTS Integration
- Generate natural-sounding audiobooks with the Kokoro-82M model.
- Easily select or swap out different .pt voicepacks.
-
User-Friendly GUI
- Modern interface with ttkbootstrap (theme selector, scrolled logs, progress bars).
- Pause/resume and cancel your audiobook generation anytime.
-
Configurable for Low-VRAM Systems
- Choose the chunk size for text to accommodate limited GPU resources.
- Switch to CPU if no GPU is available.
- Python 3.8+
- FFmpeg (for audio-related tasks on some systems).
- Torch (PyTorch for the Kokoro TTS model).
- Other dependencies listed in requirements.txt.
-
Clone the Repository
git clone https://github.com/mateogon/pdf-narrator.git
cd pdf-narrator
-
Create and Activate a Virtual Environment
python -m venv venv
# On Linux/macOS:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
-
Install Python Dependencies
pip install --upgrade pip
pip install -r requirements.txt
-
Download Kokoro Model
- Go to the Kokoro-82M Hugging Face page.
- Download the model checkpoint: kokoro-v0_19.pth
- Place this file in the models/ directory (or a subdirectory) of your project.
Example:
mkdir -p models
mv /path/to/kokoro-v0_19.pth models/
-
Optional: Download Additional Voicepacks
- By default, .pt files (voicepacks) are in Kokoro/voices/.
- If you have custom voicepacks, place them in voices/your_custom_file.pt.
-
Install FFmpeg (if you need transcoding/combining WAV files)
- Ubuntu/Debian: sudo apt-get install ffmpeg
- macOS: brew install ffmpeg
- Windows: Download from the FFmpeg official site and follow the installation instructions.
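To confirm that the install worked before exporting .mp3 files, a small check like the one below can help. It is only a sketch: the helper names `find_ffmpeg` and `build_mp3_command` are hypothetical and not part of this project, and the command shown is a plain FFmpeg transcode rather than whatever the app runs internally.

```python
import shutil
from typing import List, Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def build_mp3_command(wav_path: str, mp3_path: str) -> List[str]:
    """Build an ffmpeg command line that transcodes a WAV file to MP3.

    -y overwrites the output file if it already exists; -i names the input file.
    """
    return ["ffmpeg", "-y", "-i", wav_path, mp3_path]
```

If `find_ffmpeg()` returns None, FFmpeg is either not installed or not on your PATH. The list returned by `build_mp3_command` can be passed to `subprocess.run` once FFmpeg is available.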
-
Launch the App
python main.py
-
Select a PDF
- Browse to choose your PDF file.
- Choose to extract by TOC-based chapters or by the entire book.
-
Configure Kokoro TTS Settings
- Select the .pth model (e.g., models/kokoro-v0_19.pth).
- Pick a .pt voicepack (e.g., voices/af_sarah.pt).
- Adjust chunk size if you have limited VRAM.
- Choose output audio format (.wav or .mp3).
-
Generate Audiobook
- Click Start Process.
- Track progress via logs, estimated time, and progress bars.
- Pause/Resume or Cancel at any point.
-
Enjoy Your Audiobook
- Open the output folder to find your generated .wav or .mp3 files.
- Built atop PyMuPDF for parsing text.
- Cleans up headers, footers, page numbers, and multi-hyphen lines.
- Chapters vs. Whole:
- If TOC is found, you can split into smaller .txt files.
- Otherwise, extract the entire text into one file.
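The extraction steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's actual code: it assumes the TOC arrives as (level, title, page) entries with 1-based page numbers, which is the shape PyMuPDF's `Document.get_toc()` returns, and it assumes each page's text has already been split into lines. The function names `chapter_page_ranges` and `strip_page_furniture` and the `min_share` threshold are hypothetical.

```python
import re
from collections import Counter
from typing import List, Sequence, Tuple


def chapter_page_ranges(
    toc: Sequence[Tuple[int, str, int]], page_count: int
) -> List[Tuple[str, int, int]]:
    """Turn top-level TOC entries into (title, first_page, last_page) ranges.

    Pages are 1-based and inclusive. Nested entries (level > 1) are ignored.
    """
    top = [(title, page) for level, title, page in toc if level == 1]
    ranges = []
    for i, (title, start) in enumerate(top):
        # A chapter ends on the page before the next chapter starts.
        end = top[i + 1][1] - 1 if i + 1 < len(top) else page_count
        ranges.append((title, start, max(start, end)))
    return ranges


def strip_page_furniture(
    pages: List[List[str]], min_share: float = 0.6
) -> List[List[str]]:
    """Drop bare page numbers and lines repeated on at least min_share of pages."""
    counts: Counter = Counter()
    for lines in pages:
        # Count each distinct non-empty line once per page.
        counts.update({line.strip() for line in lines if line.strip()})
    threshold = max(2, int(len(pages) * min_share))
    repeated = {line for line, n in counts.items() if n >= threshold}
    cleaned = []
    for lines in pages:
        kept = [
            line
            for line in lines
            if line.strip() not in repeated
            and not re.fullmatch(r"\s*\d+\s*", line)
        ]
        cleaned.append(kept)
    return cleaned
```

With a TOC, each range can be written to its own .txt file. Without one, the cleaned pages can be joined into a single file.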
- Text Normalization & Phonemization
- Built-in text normalization for years, times, currency, etc.
- Token-Based Splitting
- Splits text into chunks of fewer than 510 tokens to accommodate model constraints.
- Joins all chunked audio into a single final file.
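The split-then-join flow described above might look roughly like the following. It is a sketch built on assumptions: tokens are treated as an already-produced list of strings (the real tokenizer and phonemizer are not shown), chunks are cut at fixed positions rather than at sentence boundaries, and the audio chunks are assumed to be WAV files with identical format. `split_by_token_limit` and `join_wav_files` are hypothetical names.

```python
import wave
from typing import List, Optional, Tuple


def split_by_token_limit(tokens: List[str], max_tokens: int = 509) -> List[List[str]]:
    """Group tokens into consecutive chunks of at most max_tokens items.

    The default of 509 keeps every chunk under the 510-token limit.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


def join_wav_files(chunk_paths: List[str], output_path: str) -> None:
    """Concatenate WAV files that share the same format into one output file."""
    if not chunk_paths:
        raise ValueError("no chunks to join")
    expected: Optional[Tuple[int, int, int]] = None
    with wave.open(output_path, "wb") as out:
        for path in chunk_paths:
            with wave.open(path, "rb") as chunk:
                params = (
                    chunk.getnchannels(),
                    chunk.getsampwidth(),
                    chunk.getframerate(),
                )
                if expected is None:
                    # Copy channels, sample width, and frame rate from the first chunk.
                    expected = params
                    out.setnchannels(params[0])
                    out.setsampwidth(params[1])
                    out.setframerate(params[2])
                elif params != expected:
                    raise ValueError(f"{path} does not match the first chunk's format")
                out.writeframes(chunk.readframes(chunk.getnframes()))
```

Joining at the WAV level needs only the standard library. Producing .mp3 output would need a separate transcoding step.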
- Voicepacks (.pt)
- Each voicepack provides a reference embedding for a given voice.
- Chunk Size
- If you run out of GPU memory, lower your chunk size from the default (2500) to something smaller (e.g., 1000 or 500).
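To show what lowering the chunk size changes, here is a minimal sketch of size-limited chunking. It assumes the chunk size is measured in characters, which the text above does not state, and that chunks should break at sentence boundaries where possible. `chunk_text` is a hypothetical helper, not the project's function.

```python
import re
from typing import List


def chunk_text(text: str, chunk_size: int = 2500) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Sentences are kept whole where possible; a single sentence longer than
    chunk_size is cut at the limit.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split any single sentence that exceeds the limit.
        while len(sentence) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:chunk_size])
            sentence = sentence[chunk_size:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

A smaller `chunk_size` yields more, shorter chunks, so each generation call handles less text at once.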
- Device Selection
- Choose CUDA if you have a compatible GPU, or CPU for CPU-only systems.
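A fallback from CUDA to CPU can be sketched as follows. This is an assumption about how such a check could work, not the app's own logic. It relies only on `torch.cuda.is_available()`, and `resolve_device` is a hypothetical name.

```python
def resolve_device(requested: str = "cuda") -> str:
    """Return "cuda" only when it was requested and PyTorch reports a usable GPU.

    In every other case, including a missing PyTorch install, return "cpu".
    """
    if requested.lower() != "cuda":
        return "cpu"
    try:
        # Imported lazily so the check also works where PyTorch is not installed.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The returned string can be passed to `torch.device(...)` or to a model's `.to(...)` call.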
- PDF Layout
- Extraction can vary if the PDF has complex formatting or unusual text flow.
- TTS Quality
- The generated speech depends on the Kokoro model’s training and quality.
- Processing Time
- Long PDFs with complex text can take a while to extract and convert.
We welcome contributions!
- Fork, branch, and submit a pull request.
- Report bugs via Issues.
This project is released under the MIT License.
Enjoy converting your PDFs into immersive audiobooks powered by Kokoro TTS!