Neural HMMs are all you need (for high-quality attention-free TTS)

Shivam Mehta, Éva Székely, Jonas Beskow, and Gustav Eje Henter

This is the official code repository for the paper "Neural HMMs are all you need (for high-quality attention-free TTS)". For audio examples, visit our demo page. A pre-trained model is also available.

Setup and training using LJ Speech

  1. Download and extract the LJ Speech dataset. Place it in the data folder so that the directory becomes data/LJSpeech-1.1. Otherwise, update the filelists in data/filelists accordingly.
  2. Clone this repository: git clone https://github.com/shivammehta007/Neural-HMM.git
    • If you are using a single GPU, check out the branch gradient_checkpointing; it helps fit a bigger batch size during training.
  3. Initialise the submodules: git submodule init; git submodule update
  4. Make sure you have Docker installed and running.
    • Using Docker is recommended: it manages the CUDA runtime libraries and the Python dependencies specified in the Dockerfile.
    • Alternatively, if you do not intend to use Docker, install the dependencies with pip: pip install -r requirements.txt
  5. Run bash start.sh; it will install all the dependencies and run the container.
  6. Check src/hparams.py for hyperparameters and set the GPUs.
    1. For multi-GPU training, set GPUs to [0, 1, ...].
    2. For CPU training (not recommended), set GPUs to an empty list [].
    3. Check the location of the transcriptions.
  7. Run python train.py to train the model.
    1. Checkpoints will be saved in hparams.checkpoint_dir.
    2. TensorBoard logs will be saved in hparams.tensorboard_log_dir.
  8. To resume training, run python train.py -c <CHECKPOINT_PATH>
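
If you keep LJ Speech somewhere other than data/LJSpeech-1.1, step 1 asks you to update the filelists. The sketch below does that rewrite, assuming each filelist line has the form <wav path>|<transcript>, as in the Tacotron 2 filelists this code is based on. The function names are hypothetical helpers, not part of this repository; check the actual line format in data/filelists before relying on it.

```python
from pathlib import Path


def rewrite_filelist_text(text: str, old_root: str, new_root: str) -> str:
    """Swap the dataset root of every audio path in a filelist.

    Assumes lines of the form '<wav path>|<transcript>'. Paths that do not
    start with old_root are left unchanged; blank lines are dropped.
    """
    old_prefix = old_root.rstrip("/") + "/"
    new_prefix = new_root.rstrip("/") + "/"
    out_lines = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first '|' only, so the transcript is kept intact.
        path, sep, transcript = line.partition("|")
        if path.startswith(old_prefix):
            path = new_prefix + path[len(old_prefix):]
        out_lines.append(path + sep + transcript)
    return "\n".join(out_lines) + "\n"


def rewrite_filelist(src: Path, dst: Path, old_root: str, new_root: str) -> None:
    """Read a filelist from src, rewrite its audio paths, and write it to dst."""
    text = src.read_text(encoding="utf-8")
    dst.write_text(rewrite_filelist_text(text, old_root, new_root), encoding="utf-8")
```

For example, rewrite_filelist(Path("data/filelists/train.txt"), Path("data/filelists/train_local.txt"), "data/LJSpeech-1.1", "/mnt/datasets/LJSpeech-1.1") writes a copy with the new root; the file names and target directory here are placeholders. Writing to a new file rather than editing in place keeps the original filelists available for comparison.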

Synthesis

  1. Download our pre-trained LJ Speech model. (This is the exact same model as system NH2 in the paper, but with training continued until reaching 200k updates total.)
  2. Download Nvidia's WaveGlow model.
  3. Run jupyter notebook and open synthesis.ipynb.

Miscellaneous

Mixed-precision training or full-precision training

  • In src/hparams.py, set hparams.precision to 16 for mixed precision or 32 for full precision.
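
To get a feel for what the 16 setting trades away, the sketch below rounds Python floats to IEEE 754 half precision (16-bit) and single precision (32-bit) using the standard struct module. It only illustrates the number formats. In mixed-precision training only some operations run in 16-bit, and this snippet does not reproduce what PyTorch Lightning actually does.

```python
import struct


def round_trip_fp16(x: float) -> float:
    """Return x after rounding it to IEEE 754 half precision (float16)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_trip_fp32(x: float) -> float:
    """Return x after rounding it to IEEE 754 single precision (float32)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    # float16 keeps roughly 3 significant decimal digits; float32 about 7.
    for value in (0.1, 1.0 / 3.0, 2048.5):
        print(f"{value!r}: fp16={round_trip_fp16(value)!r}, fp32={round_trip_fp32(value)!r}")
    try:
        round_trip_fp16(70000.0)  # above float16's largest finite value, 65504
    except (OverflowError, struct.error) as err:
        print("fp16 overflow:", err)
```

The limited range (largest finite value 65504) and coarse spacing of float16 are the reason mixed precision is opt-in here, while 32 remains the safer default when debugging numerical issues.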

Multi-GPU training or single-GPU training

  • Since the code uses PyTorch Lightning, providing more than one element in the list of GPUs enables multi-GPU training. Set hparams.gpus to, for example, [0, 1, 2] for multi-GPU training, or to a single-element list such as [0] for single-GPU training.
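
The rules for hparams.gpus described here and in setup step 6 can be written as a small check to run before launching train.py. This is a hypothetical helper that mirrors the README's description; it is not the logic PyTorch Lightning uses internally, and it does not verify that the listed devices exist.

```python
from typing import List


def training_mode(gpus: List[int]) -> str:
    """Classify a gpus list following the README's rules.

    [] -> CPU, one element -> single GPU, several elements -> multi-GPU.
    """
    if any(i < 0 for i in gpus):
        raise ValueError("GPU indices must be non-negative")
    if len(gpus) != len(set(gpus)):
        raise ValueError(f"duplicate GPU indices in {gpus}")
    if not gpus:
        return "cpu"
    if len(gpus) == 1:
        return f"single-gpu (cuda:{gpus[0]})"
    devices = ", ".join(f"cuda:{i}" for i in gpus)
    return f"multi-gpu ({len(gpus)} devices: {devices})"
```

For instance, training_mode([0, 1, 2]) reports three devices, while training_mode([]) reports CPU training, which the setup steps describe as not recommended.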

Known issues/warnings

PyTorch DataLoader

  • If you encounter the warning [W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool), note that this is a known issue in the PyTorch DataLoader.
  • It will be fixed once PyTorch releases a new Docker container image with an updated version of Torch. If you are not using Docker, you can get rid of the warning by upgrading to torch > 1.9.1.
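
To tell whether an installed Torch is old enough to produce this warning, you can compare its version against the 1.9.1 threshold stated above. The helper names below are hypothetical; the sketch only parses the numeric part of a version string such as "1.9.0+cu111" and does not detect whether you are running inside Docker.

```python
import re
from typing import Tuple


def parse_torch_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a string such as '1.9.0+cu111'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component (e.g. '1.10') is treated as 0.
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def expect_pthreadpool_warning(version: str) -> bool:
    """True if the version is not newer than 1.9.1, i.e. the warning may appear."""
    return parse_torch_version(version) <= (1, 9, 1)
```

In an environment with Torch installed, you would pass torch.__version__ to expect_pthreadpool_warning; a result of True means the warning is expected and harmless.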

Support

If you have any questions or comments, please open an issue on our GitHub repository.

Citation information

If you use or build on our method or code for your research, please cite our paper:

@article{mehta2021neural,
  title={Neural {HMM}s are all you need (for high-quality attention-free {TTS})},
  author={Mehta, Shivam and Sz{\'e}kely, {\'E}va and Beskow, Jonas and Henter, Gustav Eje},
  journal={arXiv preprint arXiv:2108.13320},
  year={2021}
}

Acknowledgements

The code implementation is based on Nvidia's implementation of Tacotron 2 and uses PyTorch Lightning for boilerplate-free code.

Owner: Shivam Mehta, PhD Student at KTH Royal Institute of Technology