TTSFM is a reverse-engineered API server that mirrors OpenAI's TTS service, providing a compatible interface for text-to-speech conversion with multiple voice options.

TTSFM

⚠️ Disclaimer
This project is for learning & testing purposes only. For production use, please use the official OpenAI TTS service.

🚨 IMPORTANT DEVELOPMENT NOTICE 🚨
⚠️ The v2 branch is currently under active development and is not recommended for production use.
📚 For stable documentation and usage, please refer to the v1 documentation.

English | 中文

🌟 Project Overview

TTSFM is an API server that is fully compatible with OpenAI's Text-to-Speech (TTS) API format.

🎮 Try it now: Official Demo

🏗️ Project Structure

ttsfm/
├── app.py               # Main Flask application
├── celery_worker.py     # Celery configuration and tasks
├── requirements.txt     # Python dependencies
├── static/              # Frontend resources
│   ├── index.html       # English interface
│   ├── index_zh.html    # Chinese interface
│   ├── script.js        # Frontend JavaScript
│   └── styles.css       # Frontend styles
├── voices/              # Voice samples
├── Dockerfile           # Docker configuration
├── docker-entrypoint.sh # Docker startup script
├── .env.example         # Environment variables template
├── .env                 # Environment variables
├── .gitignore           # Git ignore rules
├── LICENSE              # MIT License
├── README.md            # English documentation
├── README_CN.md         # Chinese documentation
├── test_api.py          # API test suite
├── test_queue.py        # Queue test suite
└── .github/             # GitHub workflows

🚀 Quick Start

System Requirements

  • Python 3.13 or higher
  • Redis server
  • Docker (optional)

Using Docker (Recommended)

# Pull the latest image
docker pull dbcccc/ttsfm:latest

# Run the container
docker run -d \
  --name ttsfm \
  -p 7000:7000 \
  -p 6379:6379 \
  -v $(pwd)/voices:/app/voices \
  dbcccc/ttsfm:latest

Manual Installation

  1. Clone the repository:
git clone https://github.com/dbccccccc/ttsfm.git
cd ttsfm
  2. Install dependencies:
pip install -r requirements.txt
  3. Start the Redis server:
# On Windows
redis-server

# On Linux (distributions that provide the `service` command, such as Debian or Ubuntu)
sudo service redis-server start

# On macOS (assuming Redis was installed with Homebrew)
brew services start redis
  4. Start the Celery worker:
celery -A celery_worker.celery worker --pool=solo -l info
  5. Start the server:
# Development (not recommended for production)
python app.py

# Production (recommended)
waitress-serve --host=0.0.0.0 --port=7000 app:app

Environment Variables

Copy .env.example to .env and modify as needed:

cp .env.example .env

🔧 Configuration

Server Configuration

  • HOST: Server host (default: 0.0.0.0)
  • PORT: Server port (default: 7000)
  • VERIFY_SSL: SSL verification (default: true)
  • MAX_QUEUE_SIZE: Maximum queue size (default: 100)
  • RATE_LIMIT_REQUESTS: Rate limit requests per window (default: 30)
  • RATE_LIMIT_WINDOW: Rate limit window in seconds (default: 60)
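
To make the two rate-limit settings concrete, here is a minimal sliding-window limiter sketch in Python. It is an illustration only: the class name `SlidingWindowLimiter` is hypothetical, and this README does not say which algorithm TTSFM actually applies. It only gives the defaults of 30 requests per 60-second window.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls within any `window` seconds.

    Hypothetical helper; not taken from the TTSFM source.
    """

    def __init__(self, max_requests=30, window=60.0):
        self.max_requests = max_requests  # mirrors RATE_LIMIT_REQUESTS
        self.window = window              # mirrors RATE_LIMIT_WINDOW (seconds)
        self._stamps = deque()            # timestamps of accepted requests, oldest first

    def allow(self, now):
        """Return True and record the request if it fits in the window ending at `now`."""
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_requests:
            return False
        self._stamps.append(now)
        return True
```

In real use, `now` would typically come from `time.monotonic()`; passing it in explicitly keeps the sketch easy to test.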

Celery Configuration

  • CELERY_BROKER_URL: Redis broker URL (default: redis://localhost:6379/0)
  • CELERY_RESULT_BACKEND: Redis result backend URL (default: redis://localhost:6379/0)
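
The settings above can be read from the process environment with their documented defaults. The following is a rough sketch, not the project's own loader: `load_settings` and `DEFAULTS` are hypothetical names, the value types are assumed from the defaults listed here, and the variables from `.env` are assumed to already be exported into the environment.

```python
import os

# Defaults as listed in this README; the Python types are an assumption.
DEFAULTS = {
    "HOST": "0.0.0.0",
    "PORT": 7000,
    "VERIFY_SSL": True,
    "MAX_QUEUE_SIZE": 100,
    "RATE_LIMIT_REQUESTS": 30,
    "RATE_LIMIT_WINDOW": 60,
    "CELERY_BROKER_URL": "redis://localhost:6379/0",
    "CELERY_RESULT_BACKEND": "redis://localhost:6379/0",
}


def load_settings(environ=None):
    """Merge environment variables over the documented defaults (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    settings = {}
    for name, default in DEFAULTS.items():
        raw = environ.get(name)
        if raw is None:
            settings[name] = default
        elif isinstance(default, bool):
            # Check bool before int, because bool is a subclass of int.
            settings[name] = raw.strip().lower() in ("1", "true", "yes", "on")
        elif isinstance(default, int):
            settings[name] = int(raw)
        else:
            settings[name] = raw
    return settings
```

For example, `load_settings({"PORT": "8080"})` overrides only the port and leaves every other value at its default.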

📚 API Documentation

Text-to-Speech

POST /v1/audio/speech

Request body:

{
  "input": "Hello, world!",
  "voice": "alloy",
  "response_format": "mp3",
  "instructions": "Speak in a cheerful tone"
}

Parameters

  • input (required): The text to convert to speech
  • voice (required): The voice to use. Supported voices: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse
  • response_format (optional): The format of the audio output. Default: mp3. Supported formats: mp3, opus, aac, flac, wav, pcm
  • instructions (optional): Additional instructions for voice modulation

Response

  • Success: Returns audio data with appropriate content type
  • Error: Returns JSON with error message and status code
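
As an illustration of the request described above, here is a small client sketch using only the Python standard library. The function names `build_speech_request` and `synthesize` are hypothetical, the base URL `http://localhost:7000` assumes a local server on the default port, and sending a `Content-Type: application/json` header is an assumption based on the JSON request body.

```python
import json
import urllib.request

# Voices and formats as listed in the parameter descriptions above.
VOICES = {"alloy", "ash", "ballad", "coral", "echo", "fable",
          "onyx", "nova", "sage", "shimmer", "verse"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_request(text, voice, response_format="mp3", instructions=None,
                         base_url="http://localhost:7000"):
    """Build (but do not send) a POST /v1/audio/speech request."""
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported format: {response_format!r}")
    payload = {"input": text, "voice": voice, "response_format": response_format}
    if instructions is not None:
        payload["instructions"] = instructions
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voice, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, voice, **kwargs)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

With a server running, `synthesize("Hello, world!", "alloy", "hello.mp3")` would save the audio to `hello.mp3`. Validating the voice and format on the client side avoids a round trip for requests that would be rejected anyway.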

Queue Status

GET /api/queue-size

Response:

{
  "queue_size": 5,
  "max_queue_size": 100
}
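
A client could use this response to decide whether to submit more work. The sketch below is one possible approach: `queue_has_room` and its `reserve` margin are hypothetical, and this README does not state how the server behaves when the queue is full.

```python
import json


def queue_has_room(body, reserve=0):
    """Return True if the queue reported by GET /api/queue-size can take another job.

    `body` is the raw JSON response text; `reserve` is a hypothetical safety
    margin of slots to leave free for other clients.
    """
    status = json.loads(body)
    free_slots = status["max_queue_size"] - status["queue_size"]
    return free_slots > reserve
```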

Voice Samples

GET /api/voice-sample/{voice}

Parameters

  • voice (required): The voice to get a sample for. Must be one of: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse

Response

  • Success: Returns MP3 audio sample
  • Error: Returns JSON with error message and status code
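
The two response cases can be told apart on the client side roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the base URL assumes a local server on the default port, and errors are assumed to arrive with a non-2xx status and a JSON body whose exact fields are not specified here.

```python
import json
from urllib.parse import quote


def voice_sample_url(voice, base_url="http://localhost:7000"):
    """Return the URL for GET /api/voice-sample/{voice}."""
    return f"{base_url.rstrip('/')}/api/voice-sample/{quote(voice, safe='')}"


def interpret_sample_response(status, body):
    """Return the MP3 bytes on success, or raise RuntimeError with the error detail.

    `status` is the HTTP status code and `body` the raw response bytes.
    """
    if 200 <= status < 300:
        return body
    try:
        detail = json.loads(body.decode("utf-8"))
    except ValueError:
        # Covers both invalid UTF-8 and invalid JSON.
        detail = {"error": "unparseable error body"}
    raise RuntimeError(f"voice sample request failed ({status}): {detail}")
```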

Version

GET /api/version

Response:

{
  "version": "v2.0.0-alpha1"
}
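
A client that needs to check which server version it is talking to could split the version string into parts. The sketch below assumes every version follows the `vMAJOR.MINOR.PATCH[-prerelease]` shape of the single example above, which this README does not guarantee; `parse_version` is a hypothetical helper.

```python
import json
import re

# Pattern inferred from the one documented example, "v2.0.0-alpha1".
_VERSION_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:-(.+))?$")


def parse_version(body):
    """Split a GET /api/version response into (major, minor, patch, prerelease)."""
    version = json.loads(body)["version"]
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch, prerelease = match.groups()
    return int(major), int(minor), int(patch), prerelease
```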

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenAI for the TTS API format
  • Flask for the web framework
  • Celery for task queue management
  • Waitress for the production WSGI server