
Releases: mesolitica/malaya-speech

Version 1.4.0rc1

25 Mar 16:28
  1. Starting with Malaya-Boilerplate 0.0.24, if TensorFlow is not installed locally, it is replaced with Mock TensorFlow, https://malaya-speech.readthedocs.io/en/latest/mock-tensorflow.html. We are going to focus on PyTorch from now on.
  2. Added PyTorch RNNT using TorchAudio, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-pt.html, which beat Google ASR on the Malaya-Speech Malay test set, FLEURS Malay test set and Singlish test set. Requires TorchAudio.
  3. Added PyTorch multi-language RNNT using TorchAudio, so you can now predict multiple languages in one audio sample, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-pt-multilanguage.html, which beat Google ASR on the Malaya-Speech Malay test set, FLEURS Malay test set and Singlish test set. Requires TorchAudio.
  4. Added more ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/stt-ctc-huggingface.html
  5. Added finetuned Whisper models, trained on the Malaya-Speech Malay train set and the IMDA Singlish train set, https://malaya-speech.readthedocs.io/en/latest/stt-seq2seq-whisper.html
  6. Added HuggingFace ASR Seq2Seq models, https://malaya-speech.readthedocs.io/en/latest/stt-seq2seq-whisper.html
  7. Added Force Alignment using PyTorch RNNT, https://malaya-speech.readthedocs.io/en/latest/force-alignment-transducer-pt.html
  8. Added Force Alignment using HuggingFace ASR Seq2Seq models, https://malaya-speech.readthedocs.io/en/latest/force-alignment-seq2seq-huggingface.html
  9. Added orkid, bunga, jebat, tuah, male, female speakers for TTS VITS, https://malaya-speech.readthedocs.io/en/latest/tts-vits.html
  10. Added multispeaker TTS VITS, https://malaya-speech.readthedocs.io/en/latest/tts-vits-multispeaker.html
  11. Added is clean detection, very useful if you want very clean voice activities, https://malaya-speech.readthedocs.io/en/latest/load-is-clean.html
  12. Added speaker embedding models from NeMo, without needing to install NeMo, https://malaya-speech.readthedocs.io/en/latest/load-speaker-vector-nemo.html. These are the best in terms of EER score on the VoxCeleb2 test set.
  13. Added an interface to combine multiple diarization results into a single diarization result, https://malaya-speech.readthedocs.io/en/latest/combine-longer-speaker-diarization.html
  14. Added TorchAudio streaming interface, streaming VAD, https://malaya-speech.readthedocs.io/en/latest/long-audio-vad-torchaudio.html
  15. Added TorchAudio streaming interface, streaming ASR, https://malaya-speech.readthedocs.io/en/latest/long-audio-asr-torchaudio.html
  16. Added Enformer Streaming PyTorch RNNT, https://malaya-speech.readthedocs.io/en/latest/long-audio-asr-torchaudio.html
  17. Added TorchAudio streaming interface, streaming ASR and diarization on YouTube videos, https://malaya-speech.readthedocs.io/en/latest/youtube-asr-diarization-torchaudio.html
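
Item 1 describes swapping in a mock when TensorFlow is missing. The snippet below is a minimal sketch of that general fallback pattern, not Malaya-Boilerplate's actual implementation; `MockModule` and `import_or_mock` are hypothetical names introduced only for illustration.

```python
import importlib


class MockModule:
    """Hypothetical stand-in for a missing package.

    Creating it never fails; touching any attribute raises, so code that
    never uses the package keeps working.
    """

    def __init__(self, name):
        self._name = name

    def __getattr__(self, attr):
        # Only called for attributes that are not found normally.
        raise ModuleNotFoundError(
            f"{self._name} is not installed, cannot access `{attr}`"
        )


def import_or_mock(name):
    """Return the real module if it can be imported, else a MockModule."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return MockModule(name)


# Assumption: "tensorflow" may or may not be installed on this machine.
tf = import_or_mock("tensorflow")
```

With this shape, PyTorch-only code paths can import a package that mentions TensorFlow at module level, and the error surfaces only when a TensorFlow attribute is actually used.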

To install it,

pip3 install malaya-speech==1.4.0rc1
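
Because 1.4.0rc1 is a pre-release, pip only selects it when the version is pinned as above or when `--pre` is passed. A small sketch for confirming which version ended up installed, using only the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if the package is not installed.
print(installed_version("malaya-speech"))
```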

Version 1.3.0

18 Sep 06:44

Version 1.2.7

13 Jun 07:34
  1. Added Speech-to-Text HuggingFace using Mesolitica finetuned models, https://huggingface.co/mesolitica, https://malaya-speech.readthedocs.io/en/latest/stt-huggingface.html
  2. Added Force Alignment HuggingFace using Mesolitica finetuned models, https://huggingface.co/mesolitica, https://malaya-speech.readthedocs.io/en/latest/stt-huggingface.html
  3. Added Text-to-Speech LightSpeech, https://arxiv.org/abs/2102.04040, https://malaya-speech.readthedocs.io/en/latest/tts-lightspeech-model.html
  4. Transducer LM now supports multiple languages.

Version 1.2.6

06 May 16:54
  1. Use HuggingFace as backend repository.
  2. Added yasmin and osman speakers for TTS Tacotron2, https://malaya-speech.readthedocs.io/en/latest/tts-tacotron2-model.html
  3. Added yasmin and osman speakers for TTS FastSpeech2, https://malaya-speech.readthedocs.io/en/latest/tts-fastspeech2-model.html
  4. Added yasmin and osman speakers for TTS GlowTTS, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-model.html
  5. Use yasmin and osman speakers for long text TTS, https://malaya-speech.readthedocs.io/en/latest/tts-long-text.html

Version 1.2.5

20 Mar 10:16
  1. Use latest SpectralCluster==0.2.4 for diarization.
  2. Added Gradio interface for STT and TTS.

Version 1.2.4

01 Mar 04:56

Version 1.2.2

29 Dec 04:58

Version 1.2.1

02 Dec 12:51
  1. Added more KenLM models, including Malay + Singlish, https://malaya-speech.readthedocs.io/en/latest/ctc-language-model.html
  2. Improved ASR CTC models, Hubert-Conformer-Large achieved 12.8% WER-LM, 3.8% CER-LM, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html
  3. Added CTC Decoders interface for ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-ctc-decoders.html
  4. Added pyctcdecode interface for ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode.html
  5. Improved ASR RNNT models, large-conformer achieved 14.8% WER-LM, 5.9% CER-LM, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model.html
  6. Added KenLM support for ASR RNNT models, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm.html
  7. Added ASR RNNT for 2 mixed languages, Malay and Singlish, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm.html#
  8. Added ASR RNNT for 3 mixed languages, Malay, Singlish and Mandarin, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-3mixed.html
  9. Added GlowTTS Text-to-Speech, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-model.html
  10. Added GlowTTS Text-to-Speech Multispeakers, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-multispeaker-model.html
  11. Added HiFiGAN Vocoder, https://malaya-speech.readthedocs.io/en/latest/load-vocoder.html
  12. Added Universal HiFiGAN Vocoder, https://malaya-speech.readthedocs.io/en/latest/load-universal-hifigan.html
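
Items 2 and 5 report WER and CER. As a reminder of what those numbers measure, here is a minimal sketch of the standard definitions: edit distance divided by reference length, over words for WER and over characters for CER. The release notes do not state how the project's scores were computed (text normalization, and the exact meaning of the "-LM" suffix, presumably decoding with a language model, are assumptions), so this is illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute/insert/delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; the reference must contain at least one word."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; the reference must be non-empty."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Made-up example sentences, not taken from any test set.
print(wer("saya suka makan nasi", "saya suka makan nasik"))  # 1 of 4 words wrong
print(cer("saya suka makan nasi", "saya suka makan nasik"))  # 1 edit over 20 chars
```

Note that a single wrong character counts as a whole wrong word in WER, which is why CER is usually much lower than WER for the same model.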

Version 1.2

02 Oct 09:27
  1. Added HuBERT, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html, new SOTA on Malay CER.
  2. Improved Singlish TTS model, which now supports Universal MelGAN as vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-singlish.html
  3. Added Force Alignment module, so you can now generate a time alignment for your transcription, https://malaya-speech.readthedocs.io/en/latest/force-alignment.html
  4. Improved Mixed STT Transducer models, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-mixed.html
  5. Added new Mixed STT SOTA models, called conformer-stack-mixed, 2% better than other Mixed STT models, no paper produced, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-mixed.html#List-available-RNNT-model
  6. Added Singlish STT Transducer models, thanks to the Singapore National Speech Corpus for the dataset, https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-singlish.html

Version 1.1.1

29 Jun 10:45
  1. Improved Bahasa Speech-to-Text, Large Conformer beats Google Speech-to-Text accuracy.
  2. Improved Mixed (Malay and Singlish) Speech-to-Text.
  3. Added real-time Mixed (Malay and Singlish) Speech-to-Text documentation, https://malaya-speech.readthedocs.io/en/latest/realtime-asr-mixed.html