PyTorch implementation of Tacotron speech synthesis model.

Last update: Dec 09, 2022

Overview

tacotron_pytorch

Inspired by keithito/tacotron. The speech quality is currently not as good as what keithito/tacotron can generate, but the model seems to be basically working. Generated speech examples from a model trained on the LJ Speech Dataset can be found here.

If you are comfortable working with TensorFlow, I'd recommend trying https://github.com/keithito/tacotron instead. The reason for rewriting it in PyTorch is that PyTorch is easier to debug and extend (multi-speaker architecture, etc.), at least for me.

Requirements

PyTorch
TensorFlow (only needed to run the training script; this dependency could be made optional, but for now it is required)

Installation

git clone --recursive https://github.com/r9y9/tacotron_pytorch
pip install -e . # or python setup.py develop

If you want to run the training script, then you need to install additional dependencies.

pip install -e ".[train]"

Training

The package relies on keithito/tacotron (added as a submodule) for text processing, audio preprocessing and audio reconstruction. Please follow the quick start section at https://github.com/keithito/tacotron and prepare your dataset accordingly.

Once your data is prepared and located in "~/tacotron/training" (the default location), you can train your model with:

python train.py
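
Before starting a long run, it may help to confirm that the preprocessed data is where the training script expects it. The following is a minimal sketch, not part of this repository: the helper name check_training_dir is hypothetical, and it assumes that preprocessing wrote a metadata file named train.txt with one utterance per line into the training directory. Adjust the file name if your layout differs.

```python
from pathlib import Path


def check_training_dir(data_root="~/tacotron/training", metadata_name="train.txt"):
    """Return the number of non-empty metadata lines found under data_root.

    Hypothetical helper. Assumption: the preprocessing step wrote a metadata
    file (train.txt by default here) listing one utterance per line.
    """
    # Expand "~" so the default path resolves to the user's home directory.
    root = Path(data_root).expanduser()
    if not root.is_dir():
        raise FileNotFoundError(f"Training directory not found: {root}")

    metadata = root / metadata_name
    if not metadata.is_file():
        raise FileNotFoundError(f"Metadata file not found: {metadata}")

    # Count utterances, ignoring blank lines.
    with metadata.open(encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())
```

Calling check_training_dir() with no arguments checks the default location and raises FileNotFoundError if the directory or the metadata file is missing.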

The alignment, predicted spectrogram, target spectrogram, predicted waveform and checkpoint (model and optimizer states) are saved every 1000 global steps in the checkpoints directory. Training progress can be monitored with:

tensorboard --logdir=log

Testing model

Open the notebook in the notebooks directory and set checkpoint_path to the path of your trained model checkpoint.

Owner

Ryuichi Yamamoto
