A voice cloning and text-to-speech application that can generate speech in any voice.
- Frontend: React
- Backend: FastAPI
- Text-to-speech: Tortoise TTS
- Clone the repository:
git clone https://github.com/taeefnajib/vocazee.git
cd vocazee
- Build and start the containers:
docker compose up --build
Note: The first build will take some time as it downloads necessary AI models (>1GB). This is a one-time setup.
- Access the application:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
From the Web Interface:
- Go to http://localhost:3000
- Switch to the "Train Custom Voice" tab
- Enter a name for your voice
- Record a clear audio file of your voice reading the provided text
- Click "Train Voice"
- Wait for the training to complete (usually takes 1-2 minutes)
From Command Line (Advanced): Go to the server directory. Create and activate a virtual environment, then install dependencies by running pip install -r requirements.txt. Now follow these steps:

# 1. First, process your audio file
python generate_voice.py --input_file path/to/your/audio.wav --output_dir voices/your_voice_name

# 2. Generate voice embeddings
python save_embeddings.py --voice_dir voices/your_voice_name

# 3. Cache voice latents for faster generation
python cache_voice_latents.py --voice_dir voices/your_voice_name
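For scripting, the three command-line steps above can be wrapped in a small Python runner. This is a convenience sketch, not part of the repository; the script names and flags are taken verbatim from the steps above, and it must be run from the server directory:

```python
import subprocess

def build_pipeline(input_file: str, voice_name: str) -> list:
    """Build the three training commands, in order, for a new voice."""
    voice_dir = f"voices/{voice_name}"
    return [
        ["python", "generate_voice.py", "--input_file", input_file, "--output_dir", voice_dir],
        ["python", "save_embeddings.py", "--voice_dir", voice_dir],
        ["python", "cache_voice_latents.py", "--voice_dir", voice_dir],
    ]

def run_pipeline(input_file: str, voice_name: str) -> None:
    """Run the steps in order, stopping on the first failure."""
    for cmd in build_pipeline(input_file, voice_name):
        subprocess.run(cmd, check=True)

# Usage (from the server directory):
# run_pipeline("path/to/your/audio.wav", "your_voice_name")
```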
Tips for best results:
- Use high-quality audio with minimal background noise
- Record in a quiet environment
- Speak clearly and at a natural pace
- Aim for at least 120 seconds of audio
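The 120-second guideline can be checked before training using Python's standard wave module. A minimal sketch; the threshold mirrors the tip above (a recommendation, not something the app enforces), and the function names are illustrative:

```python
import wave

MIN_SECONDS = 120  # recommended minimum from the tips above

def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def long_enough(path: str, minimum: float = MIN_SECONDS) -> bool:
    """True if the recording meets the recommended minimum length."""
    return wav_duration_seconds(path) >= minimum
```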
From the Web Interface:
- Go to http://localhost:3000
- Select a trained voice from the dropdown
- Enter or paste the text you want to convert to speech
- Toggle "High Quality" if desired (slower but better quality)
- Click "Generate Speech"
- Once complete, use the audio player to listen or download the generated audio
Using the API directly:
curl -X POST http://localhost:8000/generate-speech \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text here",
    "voice_name": "your_voice_name",
    "high_quality": false
  }'
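The same call can be made from Python's standard library. A sketch assuming the JSON body shown in the curl example; the shape of the server's response is not documented here, so generate_speech simply returns the parsed JSON:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/generate-speech"

def build_request(text: str, voice_name: str, high_quality: bool = False) -> urllib.request.Request:
    """Build a POST request equivalent to the curl command above."""
    body = json.dumps(
        {"text": text, "voice_name": voice_name, "high_quality": high_quality}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate_speech(text: str, voice_name: str, high_quality: bool = False) -> dict:
    """Send the request to a running server and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(text, voice_name, high_quality)) as resp:
        return json.loads(resp.read())
```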
Each voice in the voices directory should have the following structure:
voices/
└── your_voice_name/
├── original.wav # Original audio file
    ├── chunks/            # Processed audio chunks
├── voice_latents.pth # Cached voice latents
└── embeddings.pt # Voice embeddings
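A quick sanity check of this layout can be scripted. A hedged sketch (missing_entries and EXPECTED are illustrative names, not part of the repository); it reports which expected entries are absent from a voice directory:

```python
from pathlib import Path

# Entries expected inside voices/<name>/, per the layout above
# (value indicates whether the entry is a directory)
EXPECTED = {
    "original.wav": False,       # original audio file
    "chunks": True,              # processed audio chunks
    "voice_latents.pth": False,  # cached voice latents
    "embeddings.pt": False,      # voice embeddings
}

def missing_entries(voice_dir: str) -> list:
    """Return the expected entries that are absent from a voice directory."""
    root = Path(voice_dir)
    missing = []
    for name, is_dir in EXPECTED.items():
        path = root / name
        if not (path.is_dir() if is_dir else path.is_file()):
            missing.append(name)
    return missing
```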
- POST /create-voice: Train a new voice
- GET /voices: List all available voices
- POST /generate-speech: Generate speech from text
- GET /audio/{generation_id}/{part}: Get generated audio file
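A thin client for the read-only endpoints above might look like the following sketch. The helper names are illustrative; the /voices response is assumed to be a JSON array, and the audio URL follows the path template in the list above:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"

def list_voices() -> list:
    """GET /voices -- list all trained voices (assumes a JSON array response)."""
    with urllib.request.urlopen(f"{BASE_URL}/voices") as resp:
        return json.loads(resp.read())

def audio_url(generation_id: str, part: int) -> str:
    """Build the URL for GET /audio/{generation_id}/{part}."""
    return f"{BASE_URL}/audio/{generation_id}/{part}"

def download_audio(generation_id: str, part: int, dest: str) -> None:
    """Save one part of a generated audio clip to a local file."""
    urllib.request.urlretrieve(audio_url(generation_id, part), dest)
```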
If the server is slow on first request:
- This is normal as models are being loaded
- Subsequent requests will be faster
If voice training fails:
- Ensure audio is clear and has minimal background noise
- Try recording a longer sample
- Check if the audio format is supported (WAV recommended)
If speech generation is stuck:
- Check server logs using docker logs vocazee-server-1
- Ensure the voice model exists and is properly trained
- Try with a shorter text first
MIT License