PDF Narrator (Kokoro Edition)

Turn PDF documents into audiobooks using PDF text extraction and Kokoro TTS. This variation of Kokoro supports generating longer audio files and handles extracted PDF text better.

Demo

Screenshot
Check out the GUI in the screenshot below:
Audio Sample
Listen to a short sample of the generated audiobook:

demo.mp4

Features

Intelligent PDF Text Extraction
- Skips headers, footers, and page numbers.
- Optionally splits based on Table of Contents (TOC) or extracts the entire document.
Kokoro TTS Integration
- Generate natural-sounding audiobooks with the Kokoro-82M model.
- Easily select or swap out different .pt voicepacks.
User-Friendly GUI
- Modern interface with ttkbootstrap (theme selector, scrolled logs, progress bars).
- Pause/resume and cancel your audiobook generation anytime.
Configurable for Low-VRAM Systems
- Choose the chunk size for text to accommodate limited GPU resources.
- Switch to CPU if no GPU is available.

Prerequisites

Python 3.8+
FFmpeg (needed only if you transcode or combine WAV files).
Torch (PyTorch for the Kokoro TTS model).
Other Dependencies listed in requirements.txt.

Installation

Clone the Repository

```
git clone https://github.com/mateogon/pdf-narrator.git
cd pdf-narrator
```

Create and Activate a Virtual Environment

```
python -m venv venv
# On Linux/macOS:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
```

Install Python Dependencies

```
pip install --upgrade pip
pip install -r requirements.txt
```

Download Kokoro Model
- Go to the Kokoro-82M Hugging Face page.
- Download the model checkpoint:
  kokoro-v0_19.pth
- Place this file in the models/ directory (or a subdirectory) of your project.
  Example:
```
mkdir -p models
mv /path/to/kokoro-v0_19.pth models/
```
Optional: Download Additional Voicepacks
- By default, .pt files (voicepacks) are in Kokoro/voices/.
- If you have custom voicepacks, place them in voices/your_custom_file.pt.
Install FFmpeg (if you need transcoding/combining WAV files)
- Ubuntu/Debian:
```
sudo apt-get install ffmpeg
```
- macOS:
```
brew install ffmpeg
```
- Windows: Download from the FFmpeg official site and follow the installation instructions.
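
If you want to check for FFmpeg from Python or convert a finished WAV to MP3 yourself, the sketch below shows one way to do it. It is not taken from this project's code: `build_ffmpeg_command` and `wav_to_mp3` are hypothetical helper names, and the encoder settings (`libmp3lame`, quality 2) are assumptions you may want to change.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(wav_path, mp3_path, quality=2):
    """Return the FFmpeg argument list for a WAV -> MP3 conversion."""
    return [
        "ffmpeg",
        "-y",                       # overwrite the output file if it exists
        "-i", str(wav_path),        # input file
        "-codec:a", "libmp3lame",   # MP3 encoder
        "-qscale:a", str(quality),  # VBR quality, lower means better
        str(mp3_path),
    ]


def wav_to_mp3(wav_path, mp3_path=None):
    """Convert a WAV file to MP3; raise if FFmpeg is not on PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("FFmpeg not found on PATH; install it first.")
    wav_path = Path(wav_path)
    mp3_path = Path(mp3_path) if mp3_path else wav_path.with_suffix(".mp3")
    subprocess.run(build_ffmpeg_command(wav_path, mp3_path), check=True)
    return mp3_path
```

Calling `wav_to_mp3("audiobooks/book.wav")` would write `audiobooks/book.mp3` next to the input (the path is only an example).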

Quick Start

Launch the App
```
python main.py
```
Select a PDF
- Browse to choose your PDF file.
- Choose to extract by TOC-based chapters or by the entire book.
Configure Kokoro TTS Settings
- Select the .pth model (e.g., models/kokoro-v0_19.pth).
- Pick a .pt voicepack (e.g., voices/af_sarah.pt).
- Adjust chunk size if you have limited VRAM.
- Choose output audio format (.wav or .mp3).
Generate Audiobook
- Click Start Process.
- Track progress via logs, estimated time, and progress bars.
- Pause/Resume or Cancel at any point.
Enjoy Your Audiobook
- Open the output folder to find your generated .wav or .mp3 files.

Technical Highlights

PDF Extraction

Built atop PyMuPDF for parsing text.
Cleans up headers, footers, page numbers, and multi-hyphen lines.
Chapters vs. Whole:
- If TOC is found, you can split into smaller .txt files.
- Otherwise, extract the entire text into one file.
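
The cleanup and chapter-splitting steps above can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual extraction code: `clean_pages`, `chapter_page_ranges`, the two regular expressions, and the 0.6 repeat threshold are all hypothetical. The TOC shape (`[level, title, page]` with 1-based pages) matches what PyMuPDF's `Document.get_toc()` returns.

```python
import re
from collections import Counter

# Assumed patterns: a bare page number ("12" or "Page 12") and a line made
# only of two or more hyphens.
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+\s*$", re.IGNORECASE)
HYPHEN_RULE = re.compile(r"^\s*-{2,}\s*$")


def clean_pages(pages, min_repeat=0.6):
    """Drop blank lines, page numbers, hyphen-only lines, and any line that
    is the first or last line on most pages (likely a header or footer)."""
    stripped = []
    for page in pages:
        lines = [ln.strip() for ln in page.splitlines()]
        stripped.append([ln for ln in lines
                         if ln
                         and not PAGE_NUMBER.match(ln)
                         and not HYPHEN_RULE.match(ln)])
    # Count how often each line appears at the top or bottom of a page.
    edge_counts = Counter()
    for lines in stripped:
        if lines:
            edge_counts.update({lines[0], lines[-1]})
    threshold = max(2, int(len(pages) * min_repeat))
    repeated = {ln for ln, n in edge_counts.items() if n >= threshold}
    # A repeated edge line is removed wherever it occurs on a page.
    return ["\n".join(ln for ln in lines if ln not in repeated)
            for lines in stripped]


def chapter_page_ranges(toc, page_count):
    """Map top-level TOC entries to (title, start, end) tuples, where start
    and end are 0-based page indexes and end is exclusive."""
    top = [(title, page - 1) for level, title, page in toc if level == 1]
    ranges = []
    for i, (title, start) in enumerate(top):
        end = top[i + 1][1] if i + 1 < len(top) else page_count
        ranges.append((title, start, end))
    return ranges
```

With PyMuPDF installed, the inputs could come from `doc = fitz.open(path)`, `doc.get_toc()`, `doc.page_count`, and `page.get_text()` for each page; each chapter range can then be cleaned and written to its own .txt file.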

Kokoro TTS

Text Normalization & Phonemization
- Built-in text normalization for years, times, currency, etc.
Token-Based Splitting
- Splits text into chunks of fewer than 510 tokens each to stay within the model's constraints.
- Joins all chunked audio into a single final file.
Voicepacks (.pt)
- Each voicepack provides a reference embedding for a given voice.
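
The split-then-join flow described above can be sketched as below. It is a minimal illustration, not the project's implementation: `pack_sentences` and `join_wavs` are hypothetical names, the token counter is passed in by the caller because the real tokenizer is not shown here, and joining is done with Python's standard `wave` module as a stand-in for whatever the project uses.

```python
import wave

MAX_TOKENS = 510  # limit quoted above; every chunk stays strictly below it


def pack_sentences(sentences, count_tokens, max_tokens=MAX_TOKENS):
    """Greedily group sentences into chunks of fewer than max_tokens tokens.

    count_tokens is any callable that returns the token count of a string.
    """
    chunks, current = [], []
    for sentence in sentences:
        if count_tokens(sentence) >= max_tokens:
            raise ValueError("sentence exceeds the token limit; split it further")
        candidate = " ".join(current + [sentence])
        if current and count_tokens(candidate) >= max_tokens:
            # Adding this sentence would reach the limit: close the chunk.
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def join_wavs(paths, out_path):
    """Concatenate WAV files that share channels, sample width, and rate."""
    if not paths:
        raise ValueError("no input files to join")
    params = None
    with wave.open(str(out_path), "wb") as out:
        for path in paths:
            with wave.open(str(path), "rb") as part:
                fmt = part.getparams()[:3]  # (nchannels, sampwidth, framerate)
                if params is None:
                    params = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != params:
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(part.readframes(part.getnframes()))
```

Each chunk returned by `pack_sentences` would be synthesized to its own audio segment, and `join_wavs` then writes the segments back to back into the final file.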

Low-VRAM/Speed Tips

Chunk Size
- If you run out of GPU memory, lower your chunk size from the default (2500) to something smaller (e.g., 1000 or 500).
Device Selection
- Choose CUDA if you have a compatible GPU, or CPU for CPU-only systems.
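
For illustration, device selection and chunking might look like the sketch below. The names `pick_device` and `split_text` are hypothetical, and treating the chunk size as a character count is an assumption, since the unit is not stated here. Only `torch.cuda.is_available()` is a real PyTorch call.

```python
def pick_device(prefer_gpu=True):
    """Return "cuda" when requested and usable, otherwise "cpu"."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch missing: nothing can run on a GPU anyway
    return "cuda" if torch.cuda.is_available() else "cpu"


def split_text(text, chunk_size=2500):
    """Split text into pieces of at most chunk_size characters (assumed
    unit), breaking after sentence-ending punctuation where possible."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Prefer the last sentence end inside the current window.
            cut = max(text.rfind(p, start, end) for p in (". ", "! ", "? "))
            if cut > start:
                end = cut + 1  # keep the punctuation mark in this chunk
        chunks.append(text[start:end].strip())
        start = end
    return [c for c in chunks if c]
```

If generation runs out of GPU memory, calling `split_text(text, chunk_size=1000)` (or 500) yields smaller pieces per synthesis call, and `pick_device(prefer_gpu=False)` forces CPU.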

Limitations

PDF Layout
- Extraction can vary if the PDF has complex formatting or unusual text flow.
TTS Quality
- The generated speech depends on the Kokoro model’s training and quality.
Processing Time
- Long PDFs with complex text can take a while to extract and convert.

Contributing

We welcome contributions!

Fork, branch, and submit a pull request.
Report bugs via Issues.

License

This project is released under the MIT License.

Enjoy converting your PDFs into immersive audiobooks powered by Kokoro TTS!
