The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

Last update: Oct 28, 2022

Related tags

Overview

VAENAR-TTS

This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis".

Samples | Paper | Pretrained Models

Usage

0. Dataset

English: LJSpeech
Mandarin: DataBaker(标贝)

1. Environment setup

conda env create -f environment.yml
conda activate vaenartts-env

2. Data pre-processing

For English using LJSpeech:

CUDA_VISIBLE_DEVICES= python preprocess.py --dataset ljspeech --data_dir /path/to/extracted/LJSpeech-1.1 --save_dir ./ljspeech

For Mandarin using Databaker(标贝):

CUDA_VISIBLE_DEVICES= python preprocess.py --dataset databaker --data_dir /path/to/extracted/biaobei --save_dir ./databaker

3. Training

For English using LJSpeech:

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset ljspeech --log_dir ./lj-log_dir --test_dir ./lj-test_dir --data_dir ./ljspeech/tfrecords/ --model_dir ./lj-model_dir

For Mandarin using Databaker(标贝):

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset databaker --log_dir ./db-log_dir --test_dir ./db-test_dir --data_dir ./databaker/tfrecords/ --model_dir ./db-model_dir

4. Inference (synthesize speech for the whole test set)

For English using LJSpeech:

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset ljspeech --test_dir ./lj-test-2000 --data_dir ./ljspeech/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./lj-model_dir/ckpt-2000

For Mandarin using Databaker(标贝):

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset databaker --test_dir ./db-test-2000 --data_dir ./databaker/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./db-model_dir/ckpt-2000

The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

Related tags

Overview

VAENAR-TTS

Samples | Paper | Pretrained Models

Usage

0. Dataset

1. Environment setup

2. Data pre-processing

3. Training

4. Inference (synthesize speech for the whole test set)

Reference

Owner

THUHCSI

Rethinking the Truly Unsupervised Image-to-Image Translation - Official PyTorch Implementation (ICCV 2021)

GCRC: A Gaokao Chinese Reading Comprehension dataset for interpretable Evaluation

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Large-scale pretraining for dialogue

A relatively simple python program to generate one of those reddit text to speech videos dominating youtube.

This repository contains the code, models and datasets discussed in our paper "Few-Shot Question Answering by Pretraining Span Selection"

Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"

The swas programming language

Various capabilities for static malware analysis.

Sentiment Classification using WSD, Maximum Entropy & Naive Bayes Classifiers

This is the writeup of all the challenges from Advent-of-cyber-2019 of TryHackMe

AutoGluon: AutoML for Text, Image, and Tabular Data

InferSent sentence embeddings

BiQE: Code and dataset for the BiQE paper

LeBenchmark: a reproducible framework for assessing SSL from speech

Search for documents in a domain through Google. The objective is to extract metadata

Code for the paper "A Simple but Tough-to-Beat Baseline for Sentence Embeddings".

p-tuning for few-shot NLU task

jel - Japanese Entity Linker - is Bi-encoder based entity linker for japanese.

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants