Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

Last update: Jan 05, 2023

Related tags

Text Data & NLP malaya-speech

Overview

Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

Documentation

Proper documentation is available at https://malaya-speech.readthedocs.io/

Installing from the PyPI

CPU version

$ pip install malaya-speech

GPU version

$ pip install malaya-speech[gpu]

Only Python 3.6.0 and above and Tensorflow 1.15.0 and above are supported.

We recommend to use virtualenv for development. All examples tested on Tensorflow version 1.15.4, 1.15.5, 2.4.1 and 2.5.

Features

Age Detection, detect age in speech using Finetuned Speaker Vector.
Speaker Diarization, diarizing speakers using Pretrained Speaker Vector.
Emotion Detection, detect emotions in speech using Finetuned Speaker Vector.
Force Alignment, generate a time-aligned transcription of an audio file using RNNT.
Gender Detection, detect genders in speech using Finetuned Speaker Vector.
Language Detection, detect hyperlocal languages in speech using Finetuned Speaker Vector.
Multispeaker Separation, Multispeaker separation using FastSep on 8k Wav.
Noise Reduction, reduce multilevel noises using STFT UNET.
Speaker Change, detect changing speakers using Finetuned Speaker Vector.
Speaker overlap, detect overlap speakers using Finetuned Speaker Vector.
Speaker Vector, calculate similarity between speakers using Pretrained Speaker Vector.
Speech Enhancement, enhance voice activities using Waveform UNET.
SpeechSplit Conversion, detailed speaking style conversion by disentangling speech into content, timbre, rhythm and pitch using PyWorld and PySPTK.
Speech-to-Text, End-to-End Speech to Text for Malay, Mixed (Malay, Singlish and Mandarin) and Singlish using RNNT and Wav2Vec2 CTC.
Super Resolution, Super Resolution 4x for Waveform.
Text-to-Speech, Text to Speech for Malay and Singlish using Tacotron2, FastSpeech2 and FastPitch.
Vocoder, convert Mel to Waveform using MelGAN, Multiband MelGAN and Universal MelGAN Vocoder.
Voice Activity Detection, detect voice activities using Finetuned Speaker Vector.
Voice Conversion, Many-to-One, One-to-Many, Many-to-Many, and Zero-shot Voice Conversion.
Hybrid 8-bit Quantization, provide hybrid 8-bit quantization for all models to reduce inference time up to 2x and model size up to 4x.

Pretrained Models

Malaya-Speech also released pretrained models, simply check at malaya-speech/pretrained-model

Wave UNET, Multi-Scale Neural Network for End-to-End Audio Source Separation, https://arxiv.org/abs/1806.03185
Wave ResNet UNET, added ResNet style into Wave UNET, no paper produced.
Wave ResNext UNET, added ResNext style into Wave UNET, no paper produced.
Deep Speaker, An End-to-End Neural Speaker Embedding System, https://arxiv.org/pdf/1705.02304.pdf
SpeakerNet, 1D Depth-wise Separable Convolutional Network for Text-Independent Speaker Recognition and Verification, https://arxiv.org/abs/2010.12653
VGGVox, a large-scale speaker identification dataset, https://arxiv.org/pdf/1706.08612.pdf
GhostVLAD, Utterance-level Aggregation For Speaker Recognition In The Wild, https://arxiv.org/abs/1902.10107
Conformer, Convolution-augmented Transformer for Speech Recognition, https://arxiv.org/abs/2005.08100
ALConformer, A lite Conformer, no paper produced.
Jasper, An End-to-End Convolutional Neural Acoustic Model, https://arxiv.org/abs/1904.03288
Tacotron2, Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, https://arxiv.org/abs/1712.05884
FastSpeech2, Fast and High-Quality End-to-End Text to Speech, https://arxiv.org/abs/2006.04558
MelGAN, Generative Adversarial Networks for Conditional Waveform Synthesis, https://arxiv.org/abs/1910.06711
Multi-band MelGAN, Faster Waveform Generation for High-Quality Text-to-Speech, https://arxiv.org/abs/2005.05106
SRGAN, Modified version of SRGAN to do 1D Convolution, Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, https://arxiv.org/abs/1609.04802
Speech Enhancement UNET, https://github.com/haoxiangsnr/Wave-U-Net-for-Speech-Enhancement
Speech Enhancement ResNet UNET, Added ResNet style into Speech Enhancement UNET, no paper produced.
Speech Enhancement ResNext UNET, Added ResNext style into Speech Enhancement UNET, no paper produced.
Universal MelGAN, Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains, https://arxiv.org/abs/2011.09631
FastVC, Faster and Accurate Voice Conversion using Transformer, no paper produced.
FastSep, Faster and Accurate Speech Separation using Transformer, no paper produced.
wav2vec 2.0, A Framework for Self-Supervised Learning of Speech Representations, https://arxiv.org/abs/2006.11477
FastSpeechSplit, Unsupervised Speech Decomposition Via Triple Information Bottleneck using Transformer, no paper produced.
Sepformer, Attention is All You Need in Speech Separation, https://arxiv.org/abs/2010.13154
FastSpeechSplit, Faster and Accurate Speech Split Conversion using Transformer, no paper produced.
HuBERT, Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units, https://arxiv.org/pdf/2106.07447v1.pdf
FastPitch, Parallel Text-to-speech with Pitch Prediction, https://arxiv.org/abs/2006.06873
GlowTTS, A Generative Flow for Text-to-Speech via Monotonic Alignment Search, https://arxiv.org/abs/2005.11129

References

If you use our software for research, please cite:

@misc{Malaya, Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow,
  author = {Husein, Zolkepli},
  title = {Malaya-Speech},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huseinzol05/malaya-speech}}
}

Acknowledgement

Thanks to KeyReply for sponsoring private cloud to train Malaya-Speech models, without it, this library will collapse entirely.

ExKaldi-RT: An Online Speech Recognition Extension Toolkit of Kaldi

ExKaldi-RT is an online ASR toolkit for Python language. It reads realtime streaming audio and do online feature extraction, probability computation, and online decoding.

31 Aug 16, 2021

IMS-Toucan is a toolkit to train state-of-the-art Speech Synthesis models

IMS-Toucan is a toolkit to train state-of-the-art Speech Synthesis models. Everything is pure Python and PyTorch based to keep it as simple and beginner-friendly, yet powerful as possible.

Digital Phonetics at the University of Stuttgart

247 Jan 5, 2023

Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

Pytorch-NLU，一个中文文本分类、序列标注工具包，支持中文长文本、短文本的多类、多标签分类任务，支持中文命名实体识别、词性标注、分词等序列标注任务。 Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

186 Dec 24, 2022

text to speech toolkit. 好用的中文语音合成工具箱，包含语音编码器、语音合成器、声码器和可视化模块。

ttskit Text To Speech Toolkit: 语音合成工具箱。安装 pip install -U ttskit 注意可能需另外安装的依赖包：torch，版本要求torch=1.6.0,=1.7.1，根据自己的实际环境安装合适cuda或cpu版本的torch。 ttskit的

483 Jan 4, 2023

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit.

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit. It provides easy-to-use, low-overhead, first-class Python wrappers for t

922 Dec 31, 2022

HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools

HuggingSound HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools. I have no intention of building a very complex tool here.

247 Dec 26, 2022

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

29 Oct 16, 2022

Tensorflow Implementation of A Generative Flow for Text-to-Speech via Monotonic Alignment Search

10 Oct 13, 2022

Tevatron is a simple and efficient toolkit for training and running dense retrievers with deep language models.

Tevatron Tevatron is a simple and efficient toolkit for training and running dense retrievers with deep language models. The toolkit has a modularized

193 Jan 4, 2023

Releases(1.3.0)

1.3.0(Sep 18, 2022)
Added GPT2 LM combined with pyctcdecoder, https://malaya-speech.readthedocs.io/en/latest/gpt2-lm.html

Added Mask LM combined with pyctcdecoder, https://malaya-speech.readthedocs.io/en/latest/masked-lm.html

Added Transducer with GPT2 LM beam decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm-gpt2.html

Added Transducer with Mask LM beam decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm-gpt2.html

Added GPT2 LM CTC decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode-gpt2.html

Added Mask LM CTC decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode-mlm.html

Added Squeezeformer transducer models.

Added End-to-End FastSpeech2 STT models, no longer required a vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-e2e-fastspeech2.html

Added End-to-End VITS STT models, no longer required a vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-vits.html

Added Neural Vocoder Super Resolution models, https://malaya-speech.readthedocs.io/en/latest/load-super-resolution-tfgan.html

Added super resolution diffusion models, https://malaya-speech.readthedocs.io/en/latest/load-super-resolution-audio-diffusion.html

Added HMM speaker diarization, https://malaya-speech.readthedocs.io/en/latest/load-diarization-clustering-hmm.html

Source code(tar.gz)
Source code(zip)
1.2.7(Jun 13, 2022)
Added Speech-to-Text HuggingFace using Mesolitica finetuned models, https://huggingface.co/mesolitica, https://malaya-speech.readthedocs.io/en/latest/stt-huggingface.html

Added Force Alignment HuggingFace using Mesolitica finetuned models, https://huggingface.co/mesolitica, https://malaya-speech.readthedocs.io/en/latest/stt-huggingface.html

Added Text-to-Speech LightSpeech, https://arxiv.org/abs/2102.04040, https://malaya-speech.readthedocs.io/en/latest/tts-lightspeech-model.html

Now Transducer LM support multi-languages.

Source code(tar.gz)
Source code(zip)
1.2.6(May 6, 2022)
Use HuggingFace as backend repository.

Added yasmin and osman speakers for TTS Tacotron2, https://malaya-speech.readthedocs.io/en/latest/tts-tacotron2-model.html

Added yasmin and osman speakers for TTS FastSpeech2, https://malaya-speech.readthedocs.io/en/latest/tts-fastspeech2-model.html

Added yasmin and osman speakers for TTS GlowTTS, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-model.html

Use yasmin and osman speakers for long text TTS, https://malaya-speech.readthedocs.io/en/latest/tts-long-text.html

Source code(tar.gz)
Source code(zip)
1.2.5(Mar 20, 2022)
Use latest SpectralCluster==0.2.4 for diarization.

Added Gradio interface for STT and TTS.

Source code(tar.gz)
Source code(zip)
1.2.4(Mar 1, 2022)
Added malay language pretrained BEST-RQ models, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/stt/best_rq

Added BEST-RQ STT, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html#List-available-CTC-model

Source code(tar.gz)
Source code(zip)
1.2.2(Dec 29, 2021)
Added 3 mixed languages for CTC Hubert model, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-3mixed.html

Source code(tar.gz)
Source code(zip)
1.2.1(Dec 2, 2021)
Added more KenLM models, included Malay + Singlish, https://malaya-speech.readthedocs.io/en/latest/ctc-language-model.html

Improved ASR CTC models, Hubert-Conformer-Large achieved 12.8% WER-LM, 3.8% CER-LM, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html

Added CTC Decoders interface for ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-ctc-decoders.html

Added pyctcdecode interface for ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode.html

Improved ASR RNNT models, large-conformer achieved 14.8% WER-LM, 5.9% CER-LM, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model.html

Added KenLM support for ASR RNNT models, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm.html

Added ASR RNNT for 2 mixed languages, Malay and Singlish, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm.html#

Added ASR RNNT for 3 mixed languages, Malay, Singlish and Mandarin, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-3mixed.html

Added GlowTTS Text-to-Speech, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-model.html

Added GlowTTS Text-to-Speech Multispeakers, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-multispeaker-model.html

Added HiFiGAN Vocoder, https://malaya-speech.readthedocs.io/en/latest/load-vocoder.html

Added Universal HiFiGAN Vocoder, https://malaya-speech.readthedocs.io/en/latest/load-universal-hifigan.html

Source code(tar.gz)
Source code(zip)
1.2(Oct 2, 2021)
Added HuBERT, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html, new SOTA on Malay CER.

Improved Singlish TTS model, now supported Universal MelGAN as vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-singlish.html

Added Force Alignment module, now you can generate a time-aligned for your transcription, https://malaya-speech.readthedocs.io/en/latest/force-alignment.html

Improved Mixed STT Transducer models, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-mixed.html

Add new Mixed STT SOTA models, called conformer-stack-mixed, 2% better than other Mixed STT models, no paper produced, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-mixed.html#List-available-RNNT-model

Add Singlish STT Transducer models, thanks to Singapore National Speech Corpus for the dataset, https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-singlish.html

Source code(tar.gz)
Source code(zip)
1.1.1(Jun 29, 2021)
Improved Bahasa Speech-to-Text, Large Conformer beat Google Speech-to-Text accuracy.

Improved Mixed (malay and singlish) Speech-to-Text.

Added real time Mixed (malay and singlish) Speech-to-Text documentation, https://malaya-speech.readthedocs.io/en/latest/realtime-asr-mixed.html

Source code(tar.gz)
Source code(zip)
1.1(Jun 1, 2021)
Added SpeechSplit Conversion.

Added Force Alignment using Transducer model.

Added Optimization docs.

Source code(tar.gz)
Source code(zip)
1.0(Apr 18, 2021)

Released V1.0!
Source code(tar.gz)
Source code(zip)

Owner

HUSEIN ZOLKEPLI

I really love to fart and korek hidung.

GitHub Repository https://malaya-speech.readthedocs.io/

ElasticBERT: A pre-trained model with multi-exit transformer architecture.

This repository contains finetuning code and checkpoints for ElasticBERT. Towards Efficient NLP: A Standard Evaluation and A Strong Baseli

48 Dec 14, 2022

pyupbit 라이브러리를 활용하여 upbit에서 비트코인을 자동매매하는 코드입니다. 조코딩 유튜브 채널에서 자세한 강의 영상을 보실 수 있습니다.

파이썬 비트코인 투자 자동화 강의 코드 by 유튜브 조코딩 채널 pyupbit 라이브러리를 활용하여 upbit 거래소에서 비트코인 자동매매를 하는 코드입니다. 파일 구성 test.py : 잔고 조회 (1강) backtest.py : 백테스팅 코드 (2강) bestK.p

186 Dec 29, 2022

Precision Medicine Knowledge Graph (PrimeKG)

PrimeKG Website | bioRxiv Paper | Harvard Dataverse Precision Medicine Knowledge Graph (PrimeKG) presents a holistic view of diseases. PrimeKG integra

103 Dec 10, 2022

Biterm Topic Model (BTM): modeling topics in short texts

Biterm Topic Model Bitermplus implements Biterm topic model for short texts introduced by Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. Actua

49 Dec 30, 2022

Tool to add main subject to items on Wikidata using a WMFs CirrusSearch for named entity recognition or a manually supplied list of QIDs

ItemSubjector Tool made to add main subject statements to items based on the title using a home-brewed CirrusSearch-based Named Entity Recognition alg

9 Nov 17, 2022

A Plover python dictionary allowing for consistent symbol input with specification of attachment and capitalisation in one stroke.

Emily's Symbol Dictionary Design This dictionary was created with the following goals in mind: Have a consistent method to type (pretty much) every sy

68 Jan 07, 2023

Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

Related tags

Overview

Documentation

Installing from the PyPI

Features

Pretrained Models

References

Acknowledgement

You might also like...

ExKaldi-RT: An Online Speech Recognition Extension Toolkit of Kaldi

IMS-Toucan is a toolkit to train state-of-the-art Speech Synthesis models

Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

text to speech toolkit. 好用的中文语音合成工具箱，包含语音编码器、语音合成器、声码器和可视化模块。

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit.

HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

Tensorflow Implementation of A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Tevatron is a simple and efficient toolkit for training and running dense retrievers with deep language models.

Releases(1.3.0)

1.3.0(Sep 18, 2022)

1.2.7(Jun 13, 2022)

1.2.6(May 6, 2022)

1.2.5(Mar 20, 2022)

1.2.4(Mar 1, 2022)

1.2.2(Dec 29, 2021)

1.2.1(Dec 2, 2021)

1.2(Oct 2, 2021)

1.1.1(Jun 29, 2021)

1.1(Jun 1, 2021)

1.0(Apr 18, 2021)

Owner

HUSEIN ZOLKEPLI

ElasticBERT: A pre-trained model with multi-exit transformer architecture.

pyupbit 라이브러리를 활용하여 upbit에서 비트코인을 자동매매하는 코드입니다. 조코딩 유튜브 채널에서 자세한 강의 영상을 보실 수 있습니다.

Precision Medicine Knowledge Graph (PrimeKG)

Biterm Topic Model (BTM): modeling topics in short texts

Tool to add main subject to items on Wikidata using a WMFs CirrusSearch for named entity recognition or a manually supplied list of QIDs

A Python script that compares files in directories

wxPython app for converting encodings, modifying and fixing SRT files

A Chinese to English Neural Model Translation Project

Basic Utilities for PyTorch Natural Language Processing (NLP)

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Fastseq 基于ONNXRUNTIME的文本生成加速框架

Python utility library for compositing PDF documents with reportlab.

A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models

Topic Inference with Zeroshot models

Blue Brain text mining toolbox for semantic search and structured information extraction

This repository structures data in title, summary, tags, sentiment given a fragment of a conversation

A BERT-based reverse-dictionary of Korean proverbs

SDL: Synthetic Document Layout dataset

A Plover python dictionary allowing for consistent symbol input with specification of attachment and capitalisation in one stroke.