Releases: mesolitica/malaya-speech
Version 1.4.0rc1
- Starting with Malaya-Boilerplate 0.0.24, if Tensorflow is not installed locally, it is replaced with Mock Tensorflow, https://malaya-speech.readthedocs.io/en/latest/mock-tensorflow.html; we are going to focus on PyTorch from now on.
- Added PyTorch RNNT using TorchAudio, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-pt.html, beats Google ASR on Malaya-Speech Malay test set, FLEURS Malay test set and Singlish test set. Requires TorchAudio.
- Added PyTorch Multi-language RNNT using TorchAudio, now you can predict multiple languages in a single audio sample, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-pt-multilanguage.html, beats Google ASR on Malaya-Speech Malay test set, FLEURS Malay test set and Singlish test set. Requires TorchAudio.
- Added more ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/stt-ctc-huggingface.html
- Added Finetuned Whisper models, trained on Malaya-Speech Malay train set and IMDA Singlish train set, https://malaya-speech.readthedocs.io/en/latest/stt-seq2seq-whisper.html
- Added HuggingFace ASR Seq2Seq models, https://malaya-speech.readthedocs.io/en/latest/stt-seq2seq-whisper.html
- Added Force Alignment using PyTorch RNNT, https://malaya-speech.readthedocs.io/en/latest/force-alignment-transducer-pt.html
- Added Force Alignment using HuggingFace ASR Seq2Seq models https://malaya-speech.readthedocs.io/en/latest/force-alignment-seq2seq-huggingface.html
- Added orkid, bunga, jebat, tuah, male and female speakers for TTS VITS, https://malaya-speech.readthedocs.io/en/latest/tts-vits.html
- Added multispeaker TTS VITS, https://malaya-speech.readthedocs.io/en/latest/tts-vits-multispeaker.html
- Added is-clean detection, very useful if you want only very clean voice activity, https://malaya-speech.readthedocs.io/en/latest/load-is-clean.html
- Added speaker embedding models from Nemo, without needing to install Nemo, https://malaya-speech.readthedocs.io/en/latest/load-speaker-vector-nemo.html; these are the best in terms of EER score on the VoxCeleb2 test set.
- Added interface to combine multiple diarization results into a single diarization result, https://malaya-speech.readthedocs.io/en/latest/combine-longer-speaker-diarization.html
- Added TorchAudio streaming interface, streaming VAD, https://malaya-speech.readthedocs.io/en/latest/long-audio-vad-torchaudio.html
- Added TorchAudio streaming interface, streaming ASR, https://malaya-speech.readthedocs.io/en/latest/long-audio-asr-torchaudio.html
- Added Enformer Streaming PyTorch RNNT, https://malaya-speech.readthedocs.io/en/latest/long-audio-asr-torchaudio.html
- Added TorchAudio streaming interface, streaming ASR and diarization on Youtube videos, https://malaya-speech.readthedocs.io/en/latest/youtube-asr-diarization-torchaudio.html
To install it,
pip3 install malaya-speech==1.4.0rc1
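The Mock Tensorflow note above describes an optional-dependency fallback. As a rough illustration of that general pattern only, and not of how Malaya-Boilerplate actually implements it, the sketch below tries to import a package and substitutes a stand-in object when the package is missing. The helper name `import_or_mock` and the bogus package name are hypothetical.

```python
import importlib
from unittest import mock


def import_or_mock(name):
    """Return (module, True) if `name` is installed, else (stand-in, False).

    Hypothetical helper for illustration; not part of malaya-speech.
    """
    try:
        return importlib.import_module(name), True
    except ImportError:
        # MagicMock accepts any attribute access or call, so code that only
        # references the missing package does not fail at import time.
        return mock.MagicMock(name=name), False


# "json" ships with Python; the second name is deliberately not installable.
real, real_found = import_or_mock("json")
fake, fake_found = import_or_mock("package_that_is_not_installed_xyz")
print(real_found, fake_found)
```

A stand-in like this only keeps imports from failing; any feature that really needs the missing package still requires installing it.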
Version 1.3.0
- Added GPT2 LM combined with pyctcdecode, https://malaya-speech.readthedocs.io/en/latest/gpt2-lm.html
- Added Mask LM combined with pyctcdecode, https://malaya-speech.readthedocs.io/en/latest/masked-lm.html
- Added Transducer with GPT2 LM beam decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm-gpt2.html
- Added Transducer with Mask LM beam decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm-gpt2.html
- Added GPT2 LM CTC decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode-gpt2.html
- Added Mask LM CTC decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode-mlm.html
- Added Squeezeformer transducer models.
- Added End-to-End FastSpeech2 TTS models, which no longer require a vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-e2e-fastspeech2.html
- Added End-to-End VITS TTS models, which no longer require a vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-vits.html
- Added Neural Vocoder Super Resolution models, https://malaya-speech.readthedocs.io/en/latest/load-super-resolution-tfgan.html
- Added super resolution diffusion models, https://malaya-speech.readthedocs.io/en/latest/load-super-resolution-audio-diffusion.html
- Added HMM speaker diarization, https://malaya-speech.readthedocs.io/en/latest/load-diarization-clustering-hmm.html
Version 1.2.7
- Added Speech-to-Text HuggingFace using Mesolitica finetuned models, https://huggingface.co/mesolitica, https://malaya-speech.readthedocs.io/en/latest/stt-huggingface.html
- Added Force Alignment HuggingFace using Mesolitica finetuned models, https://huggingface.co/mesolitica, https://malaya-speech.readthedocs.io/en/latest/stt-huggingface.html
- Added Text-to-Speech LightSpeech, https://arxiv.org/abs/2102.04040, https://malaya-speech.readthedocs.io/en/latest/tts-lightspeech-model.html
- Transducer LM now supports multiple languages.
Version 1.2.6
- Use HuggingFace as backend repository.
- Added yasmin and osman speakers for TTS Tacotron2, https://malaya-speech.readthedocs.io/en/latest/tts-tacotron2-model.html
- Added yasmin and osman speakers for TTS FastSpeech2, https://malaya-speech.readthedocs.io/en/latest/tts-fastspeech2-model.html
- Added yasmin and osman speakers for TTS GlowTTS, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-model.html
- Use yasmin and osman speakers for long text TTS, https://malaya-speech.readthedocs.io/en/latest/tts-long-text.html
Version 1.2.5
- Use latest SpectralCluster==0.2.4 for diarization.
- Added Gradio interface for STT and TTS.
Version 1.2.4
- Added Malay-language pretrained BEST-RQ models, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/stt/best_rq
- Added BEST-RQ STT, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html#List-available-CTC-model
Version 1.2.2
- Added 3 mixed languages for CTC Hubert model, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-3mixed.html
Version 1.2.1
- Added more KenLM models, including Malay + Singlish, https://malaya-speech.readthedocs.io/en/latest/ctc-language-model.html
- Improved ASR CTC models, Hubert-Conformer-Large achieved 12.8% WER-LM, 3.8% CER-LM, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html
- Added CTC Decoders interface for ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-ctc-decoders.html
- Added pyctcdecode interface for ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode.html
- Improved ASR RNNT models, large-conformer achieved 14.8% WER-LM, 5.9% CER-LM, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model.html
- Added KenLM support for ASR RNNT models, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm.html
- Added ASR RNNT for 2 mixed languages, Malay and Singlish, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm.html#
- Added ASR RNNT for 3 mixed languages, Malay, Singlish and Mandarin, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-3mixed.html
- Added GlowTTS Text-to-Speech, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-model.html
- Added GlowTTS Text-to-Speech Multispeakers, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-multispeaker-model.html
- Added HiFiGAN Vocoder, https://malaya-speech.readthedocs.io/en/latest/load-vocoder.html
- Added Universal HiFiGAN Vocoder, https://malaya-speech.readthedocs.io/en/latest/load-universal-hifigan.html
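The WER and CER figures quoted in this list are, in standard usage, edit-distance rates over words and characters. The sketch below shows that standard definition only; it is not the project's evaluation script, the function names are hypothetical, and it applies no text normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# One dropped word out of four reference words.
print(wer("saya suka makan nasi", "saya suka nasi"))  # 0.25
# One substituted character out of four.
print(cer("nasi", "nasl"))  # 0.25
```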
Version 1.2
- Added HuBERT, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html, new SOTA on Malay CER.
- Improved Singlish TTS model, now supports Universal MelGAN as vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-singlish.html
- Added Force Alignment module, now you can generate time alignments for your transcription, https://malaya-speech.readthedocs.io/en/latest/force-alignment.html
- Improved Mixed STT Transducer models, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-mixed.html
- Added new Mixed STT SOTA models, called conformer-stack-mixed, 2% better than other Mixed STT models, no paper produced, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-mixed.html#List-available-RNNT-model
- Added Singlish STT Transducer models, thanks to Singapore National Speech Corpus for the dataset, https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-singlish.html
Version 1.1.1
- Improved Bahasa Speech-to-Text, Large Conformer beats Google Speech-to-Text accuracy.
- Improved Mixed (Malay and Singlish) Speech-to-Text.
- Added real-time Mixed (Malay and Singlish) Speech-to-Text documentation, https://malaya-speech.readthedocs.io/en/latest/realtime-asr-mixed.html