LibriTTS-R Metadata Generator

Processes the LibriTTS-R speech synthesis dataset and generates SQL files that build a SQLite database of books, chapters, speakers, and transcriptions, along with MP3 versions of the audio.

Quick Start

./run.sh

What it does

Downloads LibriTTS-R dataset from OpenSLR
Parses transcription files and metadata
Generates SQLite database schema
Converts WAV files to MP3 format
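The WAV-to-MP3 step can be sketched as follows. This is a minimal illustration, not the code in generator.py: the function names are hypothetical, the 128k bitrate is taken from the Output section, and the libmp3lame encoder is an assumption about how FFmpeg is invoked (it requires an FFmpeg build that includes that encoder).

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(wav_path: Path, mp3_path: Path, bitrate: str = "128k") -> list[str]:
    """Build an FFmpeg command that encodes one WAV file as MP3 (hypothetical helper)."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", str(wav_path),       # input WAV file
        "-codec:a", "libmp3lame",  # MP3 encoder (assumed, not confirmed by the project)
        "-b:a", bitrate,           # target audio bitrate
        str(mp3_path),
    ]


def convert_wav_to_mp3(wav_path: Path, mp3_path: Path) -> None:
    """Run FFmpeg and raise CalledProcessError if the conversion fails."""
    mp3_path.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(wav_path, mp3_path), check=True)
```

Keeping the command construction separate from the subprocess call makes the arguments easy to inspect without having FFmpeg installed.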

Requirements

Python 3.10+
FFmpeg
wget
sqlite3 (for local testing)

Output

dist/01_schema.sql - Database schema and initial data (books, speakers, chapters)
dist/02_transcriptions_*.sql - Transcription data split into chunks of 650 records
dist/03_indexes.sql - Database indexes for performance optimization
dist/ - MP3 audio files (128 kbps)
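As a rough illustration of the 650-record chunking, the sketch below splits a list of records into fixed-size groups and derives one file name per group. The helper names and the zero-padded numbering in the file names are assumptions; the actual suffix used by generator.py is not documented here.

```python
from collections.abc import Iterator, Sequence

CHUNK_SIZE = 650  # records per transcription file, as listed above


def chunked(records: Sequence[str], size: int = CHUNK_SIZE) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def chunk_filenames(total_records: int, size: int = CHUNK_SIZE) -> list[str]:
    """Return one hypothetical file name per chunk."""
    count = -(-total_records // size)  # ceiling division
    return [f"02_transcriptions_{i:03d}.sql" for i in range(1, count + 1)]
```

With 1,500 records this gives three chunks of 650, 650, and 200 records.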

Local Database Testing

You can test the generated SQL files locally using sqlite3:

# Import all SQL files into a local database
cat dist/01_schema.sql dist/02_transcriptions_*.sql dist/03_indexes.sql | sqlite3 libritts-r.db

# Verify the data
sqlite3 libritts-r.db "SELECT COUNT(*) FROM transcriptions; SELECT COUNT(*) FROM books; SELECT COUNT(*) FROM chapters; SELECT COUNT(*) FROM speakers;"
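The same check can be done from Python with the standard sqlite3 module, which may be convenient in automated tests. This is a sketch under the assumption that the files in dist/ follow the names listed under Output; load_sql_files and count_rows are hypothetical helper names.

```python
import sqlite3
from pathlib import Path

TABLES = ("books", "chapters", "speakers", "transcriptions")


def load_sql_files(db_path: str, sql_dir: Path) -> sqlite3.Connection:
    """Execute the generated SQL files in schema, data, index order."""
    files = (
        [sql_dir / "01_schema.sql"]
        + sorted(sql_dir.glob("02_transcriptions_*.sql"))
        + [sql_dir / "03_indexes.sql"]
    )
    conn = sqlite3.connect(db_path)
    for sql_file in files:
        conn.executescript(sql_file.read_text(encoding="utf-8"))
    conn.commit()
    return conn


def count_rows(conn: sqlite3.Connection) -> dict[str, int]:
    """Return the row count of each expected table."""
    # Table names come from the fixed TABLES tuple, never from user input.
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in TABLES}
```

For example, count_rows(load_sql_files("libritts-r.db", Path("dist"))) returns a dictionary keyed by table name.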

Cloudflare Deployment

To deploy to Cloudflare D1 (database) and R2 (audio storage):

./publish.sh

This script will:

Create D1 database libritts-r if it doesn't exist
Execute all SQL files to populate the database with metadata
Create R2 bucket libritts-r if it doesn't exist
Upload all MP3 files to R2, preserving the directory structure

The deployment creates a complete cloud setup with:

Structured metadata in D1 (books, chapters, speakers, transcriptions)
Audio files in R2 (MP3 files organized by dataset subset and chapter)
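One way to think about "preserving the directory structure" is that each MP3 file's path relative to dist/ becomes its object key in the bucket. The sketch below only computes that mapping and does not upload anything; the helper names and the example path are hypothetical, and publish.sh may derive keys differently.

```python
from pathlib import Path


def object_key(mp3_path: Path, root: Path) -> str:
    """Return mp3_path relative to root, using forward slashes as separators."""
    return mp3_path.relative_to(root).as_posix()


def plan_uploads(root: Path) -> list[tuple[Path, str]]:
    """Pair every MP3 file under root with the object key it would be stored under."""
    return [(path, object_key(path, root)) for path in sorted(root.rglob("*.mp3"))]
```

Under this assumption, a file at dist/dev-clean/1272/sample.mp3 would map to the key dev-clean/1272/sample.mp3.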

License

This project is licensed under the MIT License. See the LICENSE file for details.
