Phonemoro

Fast, low-latency and highly portable phonemizer.

Transcribes text to IPA. Created for use with Kokoro, but not limited to it. Suitable for edge devices. Easy to deploy: all data is statically embedded in the binary, so no dependencies or other files are needed.
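As a rough illustration of what "statically embedded" means, the sketch below compiles a tiny lexicon directly into the binary as a sorted `static` slice and looks words up with `binary_search_by`. It is only a std-only stand-in under stated assumptions: phonemoro itself uses a `phf_map` from the phf crate (see the Overview), and the two entries are simply the transcriptions from the usage example further down.

```rust
// Hypothetical, std-only stand-in for a compile-time lexicon (phonemoro uses phf).
// The table lives in the binary's read-only data; nothing is read from disk at runtime.
// Entries must stay sorted by word so that binary search works.
static LEXICON: &[(&str, &str)] = &[
    ("hello", "həlˈO"),
    ("world", "wˈɜɹld"),
];

/// Looks up a lowercase word in the embedded table.
fn lookup(word: &str) -> Option<&'static str> {
    LEXICON
        .binary_search_by(|&(w, _)| w.cmp(word))
        .ok()
        .map(|i| LEXICON[i].1)
}
```

With this table, `lookup("hello")` yields `Some("həlˈO")`, while an unknown word yields `None`, which is where a fallback would take over.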

Currently, only US English is supported.

🚨 WIP, so a lot can still change 🚨

⚠️ This project was renamed from phonemizer-rs to phonemoro. See #1 (comment) for further information.

Overview

This project started because I needed a phonemizer for use with Kokoro on my phone, and the alternatives were not a good fit in one way or another¹. As such, it needed to fulfill four key requirements:

be fast enough
have low enough latency
produce IPA phonemes that are compatible with Kokoro, i.e. do not sound weird
be easy to use and cross compile

With that in mind, this is how it works:

Tokenization: First, the input text is tokenized using Logos for easier preprocessing and phonemization logic.
Lookup: Then, the relevant words are looked up in the grapheme-to-phoneme datasets used by Misaki, the phonemizer behind Kokoro. The datasets are preprocessed and then statically embedded in the binary as a phf_map from the phf crate.
Fallback: If a word is not found during lookup, it is phonemized with a finite state transducer (FST) trained with Phonetisaurus on the same datasets. The phonemizations produced by the FST are not that great, but the FST is fast. phonetisaurus-g2p was created as an easy-to-use wrapper for this.
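The three steps above can be sketched in plain Rust roughly as follows. This is a simplified, hypothetical outline, not phonemoro's actual code: tokenization is reduced to splitting on whitespace instead of using Logos, the lexicons are small `HashMap`s instead of `phf_map`s, consulting the gold set before the silver set is an assumption based on the dataset names, and `fst_fallback` is a placeholder for the Phonetisaurus FST.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the Phonetisaurus FST fallback.
/// A real implementation would run the trained model; this one only
/// tags the word so the fallback path is visible in the output.
fn fst_fallback(word: &str) -> String {
    format!("<fst:{word}>")
}

/// Sketch of the tokenize -> lookup -> fallback pipeline.
fn phonemize(
    text: &str,
    gold: &HashMap<&str, &str>,
    silver: &HashMap<&str, &str>,
) -> String {
    text.split_whitespace() // simplified tokenization (no Logos)
        .map(|token| {
            let word = token.to_lowercase();
            gold.get(word.as_str())
                .or_else(|| silver.get(word.as_str())) // assumed priority: gold, then silver
                .map(|p| p.to_string())
                .unwrap_or_else(|| fst_fallback(&word)) // no lexicon hit: use the FST
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

For example, with "hello" in the gold map and "world" in the silver map (an arbitrary split for illustration), `phonemize("Hello world", &gold, &silver)` returns `"həlˈO wˈɜɹld"`, and any word missing from both maps goes through `fst_fallback`.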

Usage (lib)

This library requires data that needs to be prepared. You can either do that manually, or you can enable a feature and automatically download the prepared data from the releases page.

By default, automatically downloading the data is disabled.
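Cargo exposes every enabled feature to build scripts as an environment variable named `CARGO_FEATURE_<NAME>`, with the name upper-cased and dashes replaced by underscores, so `download-data` becomes `CARGO_FEATURE_DOWNLOAD_DATA`. The repository does contain a `build.rs`, but whether it uses exactly this check is an assumption. The sketch below only shows how a build script could decide whether to download the prepared data. `should_download_data` is a hypothetical helper, and the environment lookup is passed in as a closure so the logic can be tested outside of Cargo.

```rust
/// Variable Cargo sets for build scripts when the `download-data` feature is enabled.
const FEATURE_VAR: &str = "CARGO_FEATURE_DOWNLOAD_DATA";

/// Hypothetical helper: returns true only if the download-data feature is active.
/// In a real build.rs, `env` would be `|k| std::env::var(k).ok()`.
fn should_download_data(env: impl Fn(&str) -> Option<String>) -> bool {
    env(FEATURE_VAR).is_some()
}
```

Because the variable is absent unless the feature is requested, this check returns `false` by default, matching the opt-in behavior described above.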

Easy Way (Recommended)

Add this library to your crate, with the download-data feature enabled:

$ cargo add --git https://github.com/lastleon/phonemoro phonemoro -F download-data

⚠️ Warning: This downloads the release.zip file from the releases page on GitHub, unzips it, and moves the contents to the appropriate directory.

This only works from version 0.3.0 onwards. For now, you should always use the latest version of the library anyway.

Use the library like so:

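```rust
use phonemoro::en::phonemizer::EnPhonemizer;

fn main() {
    // Create a US English phonemizer backed by the data embedded in the binary.
    let phonemizer = EnPhonemizer::new().unwrap();

    // Convert text to Kokoro-compatible IPA phonemes.
    let result = phonemizer.phonemize("hello world").unwrap();
    assert_eq!(result, "həlˈO wˈɜɹld");
}
```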

Harder Way

Use this only if you are uncomfortable downloading data from the internet, or if you want to use your own data.

1. Clone this repository:

$ git clone https://github.com/lastleon/phonemoro

2. Prepare the data. Currently, only US English is supported, so these instructions focus on that. Go to the data-preparation directory and follow the instructions there. Then, copy the artifacts (model.fst, us_gold.json and us_silver.json) to src/en/data. Note that this requires additional dependencies and is currently only supported on Linux (and possibly macOS).

3. Now, go to your own crate and add phonemoro as a dependency:

$ cargo add --path <path-to-the-cloned-phonemoro-repo>

4. Use the library as shown in the previous section.
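Before building against your own data, it can help to confirm that all three artifacts (model.fst, us_gold.json and us_silver.json) actually ended up in src/en/data. The sketch below is a hypothetical helper, not part of phonemoro; it only uses `std::path` and reports which of the expected files are missing.

```rust
use std::path::Path;

/// The artifacts the data-preparation step is expected to produce.
const ARTIFACTS: [&str; 3] = ["model.fst", "us_gold.json", "us_silver.json"];

/// Hypothetical helper: returns the names of expected artifacts that are
/// not present as regular files in `data_dir` (e.g. `src/en/data`).
fn missing_artifacts(data_dir: &Path) -> Vec<&'static str> {
    ARTIFACTS
        .iter()
        .copied()
        .filter(|name| !data_dir.join(name).is_file())
        .collect()
}
```

Run from the root of the cloned repository, `missing_artifacts(Path::new("src/en/data"))` should return an empty vector once the data has been copied.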

Usage (cli)

1. Clone this repository:

$ git clone https://github.com/lastleon/phonemoro

2. Build the CLI tool:

Easy Way: Build the CLI tool with the download-data feature enabled:
```
$ cargo build -p phonemoro-cli --release -F download-data
```
⚠️ Warning: The same warnings as in Usage (lib) > Easy Way apply here.

Harder Way: Follow step 2 of Usage (lib) > Harder Way.

3. Use the tool:

```
$ ./target/release/phonemoro-cli --help
Usage: phonemoro-cli [OPTIONS] <text_or_file>

Arguments:
  <text_or_file>  Pass the path to the file that should be converted to phonemes. If the flag --text is set, this will be interpreted as raw text.

Options:
  -t, --text     If set, the passed text will be phonemized, instead of interpreted as a file path.
  -h, --help     Print help
  -V, --version  Print version
```
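According to the help text, the single positional argument is read as a file path unless `--text` is set. The sketch below shows that decision in isolation. It is an assumption about the behavior, not the CLI's actual source, and `resolve_input` is a hypothetical name; the file reader is injected so the logic can be exercised without touching the file system.

```rust
use std::io;

/// Hypothetical sketch: turns the <text_or_file> argument into the text to phonemize.
/// With --text, the argument is the text itself; otherwise it is treated as a file path.
fn resolve_input(
    text_or_file: &str,
    text_flag: bool,
    read_file: impl Fn(&str) -> io::Result<String>,
) -> io::Result<String> {
    if text_flag {
        Ok(text_or_file.to_string())
    } else {
        read_file(text_or_file)
    }
}
```

In a real binary, the reader would simply be `|p| std::fs::read_to_string(p)`, so a missing file surfaces as an `io::Error` instead of being phonemized as text.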

TODO

Acknowledgements

hexgrad/Misaki: Original (and reference) phonemizer for Kokoro, smarter than phonemoro.
Patchethium/Celosia: Another phonemizer in Rust, and a good choice. It inspired much of how this project works, but it does not use the Misaki datasets, and its phonetic alphabet is ARPAbet, which makes it incompatible with Kokoro. ARPAbet could theoretically be converted to IPA, but it is not as expressive as IPA (specifically, the stresses are missing), so this does not work well.

Attribution

This project utilizes data from hexgrad/Misaki, licensed under the Apache License 2.0.

Only a subset of the files from that project is used. The original LICENSE file is placed next to the original data when it is downloaded. The data is cleaned, processed, and transformed into a different format before being used for phonemization.

License

phonemoro is licensed under the MIT License.

Footnotes

¹ Also, I wanted to use this as a learning opportunity for Rust :)
