The Echogarden speech toolset supports all of Piper's voices, with many additional features and enhancements #674
rotemdan started this conversation in Show and tell
Hi, I'm the developer of Echogarden, a cross-platform speech toolset that runs on the Node.js runtime (GPL-3 licensed).
It works as a command-line application or as a Node.js library. It's very easy to install, thanks to `npm`, and requires no compilation (a quick usage sketch follows the feature list below).

Echogarden has supported, and kept up to date with, all of Piper's ONNX models (currently a total of 123) since it was first published in late April 2023. It doesn't actually rely on Piper itself: it's an independent implementation that uses a custom WebAssembly port of eSpeak-ng and the Node.js binding for the ONNX runtime (`onnxruntime-node`) to load the raw `.onnx` models directly.

It adds many features and enhancements, some of which are not directly available in Piper:
- Heteronym disambiguation for words like `read`, `present` and `live` (US English only at the moment, about 30 - 50 words included). The model is rule-based, and uses surrounding words to decide on the most likely pronunciation of the target word.
- Rate and pitch modification via `sonic` and `rubberband` (WASM ports), in addition to the native pitch and time shift parameters of the models themselves.

(Interesting fact: almost all these features have been available since about August 2023, so they can't really be called "new".)
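For anyone who'd like to try this quickly, here's a rough sketch of installation and basic command-line use. The option names `--speed` and `--pitch` are illustrative shorthand; check the options documentation for the exact current set:

```bash
# Install globally via npm (no compilation step needed)
npm install echogarden -g

# Basic synthesis. The sentence contains the heteronym "read",
# which the rule-based disambiguation model resolves from the
# surrounding words:
echogarden speak "I read that book last year." speech.mp3

# Adjust rate and pitch (applied via the sonic/rubberband WASM
# ports, or via the models' own native parameters):
echogarden speak "A bit faster and higher." speech.mp3 --speed=1.2 --pitch=1.1
```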
It also supports 14 other synthesis engines (offline and online), including, for example, Google Translate, Microsoft Edge, and ElevenLabs, as well as many methods for speech recognition, forced alignment, voice isolation, speech translation, etc. That includes a custom, ONNX-based implementation of Whisper that took a very large amount of effort to develop.
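These operations follow the same command-line pattern as synthesis. As a sketch (argument order and output formats are worth double-checking against the documentation):

```bash
# Speech recognition: transcribe an audio file to text
echogarden transcribe recording.mp3 transcript.txt

# Forced alignment: align an existing transcript to the audio,
# producing timed subtitles
echogarden align recording.mp3 transcript.txt subtitles.srt
```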
With all that, the project currently has very low usage: only 207 stars so far, after a year and 8 months, despite a 25,000-line codebase with 23 auxiliary repositories (it had about 70 - 80 stars a year after the initial release).
That's likely because I have never personally publicized or announced it anywhere - not on forums, or on other repositories - and apparently neither did its users (for all I know).
Based on its issue tracker, there's a small group of dedicated users who seem to use it for alignment or speech recognition, but there's almost no mention of it being used for speech synthesis - which was actually one of the first areas originally implemented.
The low usage and visibility have reached the point where I've started to become concerned that the software will become outdated (effectively "vintage") before a meaningful number of people use it at all. The models would be superseded by newer ones, and some of the development effort wouldn't end up making the contribution it was intended to.
I noticed it has never been mentioned in Piper's discussions or issues, which I find very odd. I refrained from doing so myself due to my general aversion to all kinds of "marketing"- or "promotion"-like activities. I thought the quality of the software could speak for itself, and that users would spread the word organically, but apparently that didn't happen.
So, if your usage of Piper is mostly about running command lines, and you don't really need it as a Python library or a direct C++ dependency, you may want to consider trying the `vits` engine in Echogarden, which includes all of Piper's voices and is significantly easier to install and run. If you're looking to use custom models, they can be supported, but there hasn't been any visible demand for that so far.
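For example, to select a particular Piper voice through the `vits` engine (the voice identifier below is just one example; the full list is in the documentation):

```bash
# Select the vits engine and one of the Piper-derived voices
# (the voice identifier shown is an example)
echogarden speak "Trying one of Piper's voices through Echogarden." out.mp3 --engine=vits --voice=en_US-amy-medium
```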