Releases: mesolitica/malaya-speech

Version 1.4.0rc1

25 Mar 16:28

huseinzol05

Starting with Malaya-Boilerplate 0.0.24, if TensorFlow is not installed locally it is replaced with Mock TensorFlow, and development focuses on PyTorch from now on, https://malaya-speech.readthedocs.io/en/latest/mock-tensorflow.html
Added PyTorch RNNT using TorchAudio, which beats Google ASR on the Malaya-Speech Malay test set, the FLEURS Malay test set and the Singlish test set. Requires TorchAudio, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-pt.html
Added PyTorch multi-language RNNT using TorchAudio, which can predict multiple languages in one audio sample and beats Google ASR on the Malaya-Speech Malay test set, the FLEURS Malay test set and the Singlish test set. Requires TorchAudio, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-pt-multilanguage.html
Added more ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/stt-ctc-huggingface.html
Added finetuned Whisper models, trained on the Malaya-Speech Malay train set and the IMDA Singlish train set, https://malaya-speech.readthedocs.io/en/latest/stt-seq2seq-whisper.html
Added HuggingFace ASR Seq2Seq models, https://malaya-speech.readthedocs.io/en/latest/stt-seq2seq-whisper.html
Added Force Alignment using PyTorch RNNT, https://malaya-speech.readthedocs.io/en/latest/force-alignment-transducer-pt.html
Added Force Alignment using HuggingFace ASR Seq2Seq models, https://malaya-speech.readthedocs.io/en/latest/force-alignment-seq2seq-huggingface.html
Added orkid, bunga, jebat, tuah, male and female speakers for TTS VITS, https://malaya-speech.readthedocs.io/en/latest/tts-vits.html
Added multispeaker TTS VITS, https://malaya-speech.readthedocs.io/en/latest/tts-vits-multispeaker.html
Added is-clean detection, useful when you want only very clean voice activity, https://malaya-speech.readthedocs.io/en/latest/load-is-clean.html
Added speaker embedding models from NeMo, usable without installing NeMo; they are the best in terms of EER score on the VoxCeleb2 test set, https://malaya-speech.readthedocs.io/en/latest/load-speaker-vector-nemo.html
Added an interface to combine multiple diarization results into a single diarization result, https://malaya-speech.readthedocs.io/en/latest/combine-longer-speaker-diarization.html
Added TorchAudio streaming interface, streaming VAD, https://malaya-speech.readthedocs.io/en/latest/long-audio-vad-torchaudio.html
Added TorchAudio streaming interface, streaming ASR, https://malaya-speech.readthedocs.io/en/latest/long-audio-asr-torchaudio.html
Added Emformer Streaming PyTorch RNNT, https://malaya-speech.readthedocs.io/en/latest/long-audio-asr-torchaudio.html
Added TorchAudio streaming interface, streaming ASR and diarization on YouTube videos, https://malaya-speech.readthedocs.io/en/latest/youtube-asr-diarization-torchaudio.html

To install it,

pip3 install malaya-speech==1.4.0rc1

Version 1.3.0

18 Sep 06:44

huseinzol05

Added GPT2 LM combined with pyctcdecode, https://malaya-speech.readthedocs.io/en/latest/gpt2-lm.html
Added Masked LM combined with pyctcdecode, https://malaya-speech.readthedocs.io/en/latest/masked-lm.html
Added Transducer with GPT2 LM beam decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm-gpt2.html
Added Transducer with Masked LM beam decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm-gpt2.html
Added GPT2 LM CTC decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode-gpt2.html
Added Masked LM CTC decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode-mlm.html
Added Squeezeformer transducer models.
Added End-to-End FastSpeech2 TTS models, which no longer require a vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-e2e-fastspeech2.html
Added End-to-End VITS TTS models, which no longer require a vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-vits.html
Added Neural Vocoder Super Resolution models, https://malaya-speech.readthedocs.io/en/latest/load-super-resolution-tfgan.html
Added super resolution diffusion models, https://malaya-speech.readthedocs.io/en/latest/load-super-resolution-audio-diffusion.html
Added HMM speaker diarization, https://malaya-speech.readthedocs.io/en/latest/load-diarization-clustering-hmm.html

Version 1.2.7

13 Jun 07:34

huseinzol05

Added HuggingFace Speech-to-Text using Mesolitica finetuned models, https://huggingface.co/mesolitica, https://malaya-speech.readthedocs.io/en/latest/stt-huggingface.html
Added HuggingFace Force Alignment using Mesolitica finetuned models, https://huggingface.co/mesolitica, https://malaya-speech.readthedocs.io/en/latest/stt-huggingface.html
Added LightSpeech Text-to-Speech, https://arxiv.org/abs/2102.04040, https://malaya-speech.readthedocs.io/en/latest/tts-lightspeech-model.html
Transducer LM now supports multiple languages.

Version 1.2.6

06 May 16:54

huseinzol05

Now uses HuggingFace as the backend repository.
Added yasmin and osman speakers for TTS Tacotron2, https://malaya-speech.readthedocs.io/en/latest/tts-tacotron2-model.html
Added yasmin and osman speakers for TTS FastSpeech2, https://malaya-speech.readthedocs.io/en/latest/tts-fastspeech2-model.html
Added yasmin and osman speakers for TTS GlowTTS, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-model.html
Long text TTS now uses the yasmin and osman speakers, https://malaya-speech.readthedocs.io/en/latest/tts-long-text.html

Version 1.2.5

20 Mar 10:16

huseinzol05

Use latest SpectralCluster==0.2.4 for diarization.
Added Gradio interface for STT and TTS.

Version 1.2.4

01 Mar 04:56

huseinzol05

Added pretrained Malay-language BEST-RQ models, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/stt/best_rq
Added BEST-RQ STT, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html#List-available-CTC-model

Version 1.2.2

29 Dec 04:58

huseinzol05

Added a CTC HuBERT model for 3 mixed languages, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-3mixed.html

Version 1.2.1

02 Dec 12:51

huseinzol05

Added more KenLM models, including Malay + Singlish, https://malaya-speech.readthedocs.io/en/latest/ctc-language-model.html
Improved ASR CTC models; Hubert-Conformer-Large achieves 12.8% WER-LM and 3.8% CER-LM, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html
Added CTC Decoders interface for ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-ctc-decoders.html
Added pyctcdecode interface for ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode.html
Improved ASR RNNT models; large-conformer achieves 14.8% WER-LM and 5.9% CER-LM, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model.html
Added KenLM support for ASR RNNT models, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm.html
Added ASR RNNT for 2 mixed languages, Malay and Singlish, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm.html#
Added ASR RNNT for 3 mixed languages, Malay, Singlish and Mandarin, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-3mixed.html
Added GlowTTS Text-to-Speech, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-model.html
Added GlowTTS Text-to-Speech Multispeakers, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-multispeaker-model.html
Added HiFiGAN Vocoder, https://malaya-speech.readthedocs.io/en/latest/load-vocoder.html
Added Universal HiFiGAN Vocoder, https://malaya-speech.readthedocs.io/en/latest/load-universal-hifigan.html
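
The WER and CER figures quoted in this release are word and character error rates. As a reminder of how such scores are commonly computed, here is a minimal sketch based on Levenshtein edit distance. It is not the project's evaluation script: the text normalization is an assumption (plain whitespace splitting, no casing or punctuation handling), and the "-LM" suffix presumably refers to decoding with a language model, which this sketch does not cover.

```python
def edit_distance(reference, hypothesis) -> int:
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_item == hyp_item else 1)
            deletion = previous[j] + 1       # reference item missing from hypothesis
            insertion = current[j - 1] + 1   # extra item in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One of four reference words is dropped, so WER is 1/4.
    print(wer("saya suka makan nasi", "saya suka nasi"))  # 0.25
    # One of four reference characters is substituted, so CER is 1/4.
    print(cer("nasi", "nasu"))  # 0.25
```

Under this definition, a 12.8% WER means roughly 13 word-level edits per 100 reference words.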

Version 1.2

02 Oct 09:27

huseinzol05

Added HuBERT, the new SOTA on Malay CER, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html
Improved the Singlish TTS model, which now supports Universal MelGAN as vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-singlish.html
Added Force Alignment module, so you can now generate a time alignment for your transcription, https://malaya-speech.readthedocs.io/en/latest/force-alignment.html
Improved Mixed STT Transducer models, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-mixed.html
Added new Mixed STT SOTA models, called conformer-stack-mixed, 2% better than other Mixed STT models (no paper produced), https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-mixed.html#List-available-RNNT-model
Added Singlish STT Transducer models, thanks to the Singapore National Speech Corpus for the dataset, https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-singlish.html

Version 1.1.1

29 Jun 10:45

huseinzol05

Improved Bahasa Speech-to-Text; the Large Conformer model beats Google Speech-to-Text in accuracy.
Improved Mixed (Malay and Singlish) Speech-to-Text.
Added real-time Mixed (Malay and Singlish) Speech-to-Text documentation, https://malaya-speech.readthedocs.io/en/latest/realtime-asr-mixed.html
