Speakoro

Kokoro library and CLI tool in Rust. Batteries included, just a single binary, no runtime dependencies¹.

🚨 This project is currently usable, but pretty barebones and far from finished. 🚨
Significant changes can happen.

Overview

Use Kokoro in your terminal with an everything-included binary, or easily embed it in your project as a library.

In short, this project embeds a Kokoro ONNX model file and several Kokoro voice files (currently not all of them) and runs the model with the ort crate. Because ort is statically linked, everything ends up in the final binary.

The CLI tool additionally uses Phonemoro as its phonemizer, which also embeds everything it needs, resulting in a fully functioning text-to-speech system within a single binary.

Features:

- easy to build and use
- no special runtime dependencies
- single binary with everything embedded
- portable
- doesn't use espeak, so none of the licensing issues
- suitable for mobile use²

Usage

Since this project is based on Kokoro, a model file (ONNX) and the voice files are needed. You can either download them manually, or enable the download-data feature to download them automatically during the build. Both ways are described below.

As a Library

Add speakoro to your project:

Easy Way (Recommended)
- Add speakoro directly to your project, with the download-data feature enabled:
```
$ cargo add --git https://github.com/lastleon/speakoro speakoro -F download-data
```

⚠️ Warning:

This automatically downloads the necessary files from Huggingface. If you don't want that, proceed with the Harder Way.

Harder Way
Use this only if you're uncomfortable downloading from the internet, or you want to use your own data.
- Clone this repository to a location outside your project and enter it:
```
$ git clone https://github.com/lastleon/speakoro && cd speakoro
```
- Create the onnx model and voice directories:
```
$ mkdir -p data/{onnx,voice}
```
- Download the desired model and the English voices from onnx-community/Kokoro-82M-v1.0-ONNX, place the model in data/onnx, and place the voices in data/voice (the directory created in the previous step).
- Back in your project, add speakoro as a dependency:
```
$ cargo add --path <path-to-the-cloned-speakoro-repo> speakoro
```

Set the SPEAKORO_MODEL_FILE environment variable to choose which model should be used (and downloaded, if download-data is enabled). You can either:

Set it within the .cargo/config.toml file in your project:

[env]
# See https://huggingface.co/onnx-community/Kokoro-82M-v1.0-ONNX/tree/main/onnx for a list of available options. Note that not all models work, you have to test that out.
# Recommendations: model.onnx, model_fp16.onnx, model_uint8.onnx
SPEAKORO_MODEL_FILE = "model_uint8.onnx"

Or set the variable during the build:

$ SPEAKORO_MODEL_FILE=model_uint8.onnx cargo build --release
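
Because the variable is read at build time, it has to be visible to the build script. The following is a minimal sketch of how a build script could resolve the model name, not speakoro's actual build.rs. The helper `resolve_model_file` is hypothetical, and the fallback to model_uint8.onnx is an assumption taken from the default mentioned in the CLI section.

```rust
use std::env;

/// Fallback model name. Assumption: mirrors the default named in the CLI section.
const DEFAULT_MODEL_FILE: &str = "model_uint8.onnx";

/// Hypothetical helper: pick the model file name from an optional
/// environment value, falling back to the default when unset or blank.
fn resolve_model_file(env_value: Option<&str>) -> String {
    match env_value.map(str::trim) {
        Some(name) if !name.is_empty() => name.to_string(),
        _ => DEFAULT_MODEL_FILE.to_string(),
    }
}

fn main() {
    // Ask Cargo to rerun the build script whenever the variable changes.
    println!("cargo:rerun-if-env-changed=SPEAKORO_MODEL_FILE");

    let value = env::var("SPEAKORO_MODEL_FILE").ok();
    let model = resolve_model_file(value.as_deref());

    // Surface the choice in the build log.
    println!("cargo:warning=using Kokoro model {model}");
}
```

Keeping the lookup in a small pure function makes the fallback easy to check without touching the process environment.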

Use the library like so:

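use speakoro::{Kokoro, KokoroVoice};
use anyhow::Result;

fn main() -> Result<()> {
    // Load the embedded model and voice data.
    let kokoro = Kokoro::new()?;
    // Synthesize audio from a phoneme string with the AF_BELLA voice.
    let audio = kokoro.phonemes2audio("həlˈO wˈɜɹld", KokoroVoice::AF_BELLA, 1f32)?;
    // Write the generated audio to a WAV file in the current directory.
    speakoro::utils::write_to_wav(audio, "audio.wav")?;

    Ok(())
}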

💡 Note:

To see an end-to-end example, go to the speakoro-cli crate. It utilizes the closely related Phonemoro project as the phonemizer.
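
If you want to hand the audio to something other than `speakoro::utils::write_to_wav`, you may need to encode it yourself. The sketch below shows one way to pack samples into a 16-bit PCM WAV buffer using only the standard library. It rests on assumptions this README does not confirm: that the audio is a sequence of mono `f32` samples in the range -1.0 to 1.0, and that the sample rate is 24 kHz. The function `encode_wav_pcm16` is hypothetical and not part of speakoro.

```rust
/// Hypothetical helper: encode mono f32 samples as a 16-bit PCM WAV file in memory.
fn encode_wav_pcm16(samples: &[f32], sample_rate: u32) -> Vec<u8> {
    let num_channels: u16 = 1;
    let bits_per_sample: u16 = 16;
    let block_align = num_channels * bits_per_sample / 8;
    let byte_rate = sample_rate * u32::from(block_align);
    let data_len = (samples.len() * usize::from(block_align)) as u32;

    let mut out = Vec::with_capacity(44 + data_len as usize);
    // RIFF container header.
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes());
    out.extend_from_slice(b"WAVE");
    // Format chunk: 16 bytes, format tag 1 (uncompressed PCM).
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes());
    out.extend_from_slice(&1u16.to_le_bytes());
    out.extend_from_slice(&num_channels.to_le_bytes());
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&bits_per_sample.to_le_bytes());
    // Data chunk: clamp each sample, scale to i16, store little-endian.
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for &sample in samples {
        let value = (sample.clamp(-1.0, 1.0) * f32::from(i16::MAX)) as i16;
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}

fn main() {
    // Four dummy samples stand in for real model output.
    let samples = [0.0f32, 0.5, -0.5, 1.0];
    // Assumption: 24 kHz sample rate.
    let wav = encode_wav_pcm16(&samples, 24_000);
    println!("encoded {} bytes", wav.len());
}
```

The resulting buffer can be written to disk with `std::fs::write` or passed to an audio playback library.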

As a CLI tool

This uses Phonemoro as the phonemizer.

⚠️ Warning:

Building the CLI tool requires downloading the necessary data for both speakoro and phonemoro.

Building phonemoro without downloading the data requires some setup in a directory outside this project, which makes it a fairly involved process. For that reason, speakoro-cli has no feature flag for building without downloading data: such a flag would only be meaningful if phonemoro were also built without downloading data, and that requires adding it as a dependency in a different way. An offline build is described at the end of this section.

The ONNX model and voice files are downloaded from Huggingface; the data phonemoro needs is downloaded from the phonemoro releases page.

Clone this repository:

$ git clone https://github.com/lastleon/speakoro

(Optional): Change the Kokoro model you want to use by setting the SPEAKORO_MODEL_FILE environment variable as described in Usage > As a Library. By default, model_uint8.onnx is used.

Build speakoro-cli:

$ cargo build -p speakoro-cli --release

Usage:

$ ./target/release/speakoro-cli --help
Usage: speakoro-cli [OPTIONS] <text>

Arguments:
  <text>  Pass the text that should be converted to speech. If the flag --phonemes is set, this will be interpreted as raw phonemes.

Options:
  -v, --voice <voice>  Set which voice should be used to generate audio. [default: af_bella] [possible values: af_heart, af_bella, af_nicole, af_aoede, bf_emma, bf_isabella, am_adam, am_fenrir, bm_daniel]
  -p, --phonemes       If set, the passed text will be interpreted as phonemes.
  -o, --out <out>      Set filepath to where the audio will be written to. Note that the output format is WAV. [default: audio.wav]
  -h, --help           Print help
  -V, --version        Print version
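
If you want to call the CLI from another Rust program, the options above map directly onto an argument list. This is a minimal sketch, assuming the binary was built in release mode and is run from the repository root. The helper `build_args` is hypothetical, and the voice list is copied from the help output, so it may go stale as voices are added.

```rust
use std::process::Command;

/// Voice names as listed in the `--help` output.
const VOICES: [&str; 9] = [
    "af_heart", "af_bella", "af_nicole", "af_aoede", "bf_emma",
    "bf_isabella", "am_adam", "am_fenrir", "bm_daniel",
];

/// Hypothetical helper: build the argument list for speakoro-cli,
/// rejecting voice names the help output does not list.
fn build_args(text: &str, voice: &str, out: &str, phonemes: bool) -> Result<Vec<String>, String> {
    if !VOICES.contains(&voice) {
        return Err(format!("unknown voice: {voice}"));
    }
    let mut args = vec![
        "--voice".to_string(),
        voice.to_string(),
        "--out".to_string(),
        out.to_string(),
    ];
    if phonemes {
        // Treat the text as raw phonemes instead of plain text.
        args.push("--phonemes".to_string());
    }
    args.push(text.to_string());
    Ok(args)
}

fn main() {
    let args = build_args("Hello world", "af_bella", "hello.wav", false)
        .expect("voice is in the known list");

    // Assumption: release build, invoked from the repository root.
    match Command::new("./target/release/speakoro-cli").args(&args).status() {
        Ok(status) => println!("speakoro-cli exited with {status}"),
        Err(err) => eprintln!("could not start speakoro-cli: {err}"),
    }
}
```

Checking the voice name up front gives a clearer error than waiting for the CLI to reject it.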

Offline Build:

- Clone this repository and add the necessary data as described in Usage > As a Library (Harder Way).
- Go to Phonemoro and follow its offline build instructions (Usage (lib) > Harder Way) to use it as a library, but don't add it to speakoro-cli yet.
- Go to speakoro-cli and replace the phonemoro dependency like so:

$ cargo rm phonemoro && cargo add --path <path-to-the-cloned-phonemoro-repo> phonemoro

- Remove the download-data feature from speakoro:

$ cargo rm speakoro && cargo add --path .. speakoro

- Optionally change the Kokoro model as described above, then build speakoro-cli:

$ cargo build -p speakoro-cli --release

Cross Compiling

TODO (main limitation is ort, which you might need to manually build)

Acknowledgements

hexgrad/Kokoro: The model this library is based on.
onnx-community/Kokoro-82M-v1.0-ONNX: The quantized models, converted to ONNX, that this library uses.
lucasjinreal/Kokoros: Another "Kokoro in Rust" project I recently found out about. It has more features and almost certainly better phonemization, since it uses espeak as a backend. However, it needs Python (and possibly PyTorch) for the installation, and it requires vendored espeak as well as Kokoro ONNX models and voice data in external directories.
So, if you need any of the additional features Kokoros provides, or better phonemization, use Kokoros. If you need a self-contained binary, want easier installation or usage as a library, or don't want to use espeak because of licensing issues, use speakoro.

Attribution

This project utilizes data from onnx-community/Kokoro-82M-v1.0-ONNX, licensed under the Apache License 2.0.

License

speakoro is licensed under the MIT License.

¹ Apart from the usual suspects, such as libc.so.
² Depending on the platform, you might need to build the onnx runtime yourself, though. Also yes, this is kind of fast enough to properly run on a phone! :)
