Refactored version of FastSpeech2

Last update: May 26, 2022

Overview

FastSpeech2

This repository is a refactored version of ming024's FastSpeech2 implementation. I focused on restructuring the code to fit my own use cases and on parallelizing the pre-processing code. I also wrote an installation guide for the latest version of MFA (Montreal Forced Aligner).

Installation

Tested on Python 3.8 and Ubuntu 20.04.
- Note: installing MFA requires Miniconda.
- If you run MFA on Ubuntu 16.04 or an earlier version, you will hit a compile error.
On your system
- To install pyworld, run "sudo apt-get install python3.x-dev" (x is your Python minor version).
- To install sndfile, run "sudo apt-get install libsndfile-dev"
- To use MFA, run "sudo apt-get install libopenblas-base"
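
The prerequisites above can be checked before installing anything else. The following is a minimal sketch, not part of this repository: the function name `check_environment` is hypothetical, and it treats Python 3.8 (the tested version) as a lower bound, which is an assumption.

```python
import shutil
import sys


def check_environment(min_python=(3, 8), tools=("conda", "mfa")):
    """Return a dict mapping each check name to True or False.

    Assumption: 3.8 is only the version this README was tested on,
    so treating it as a minimum is a guess, not a documented requirement.
    """
    report = {"python": sys.version_info[:2] >= tuple(min_python)}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A "missing" entry for `mfa` is expected until the MFA installation step is done.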
Install requirements

# install pytorch_sound
pip install git+https://github.com/appleholic/pytorch_sound
# install this repository in editable mode
pip install -e .

Download datasets

VCTK
- Download the dataset from https://datashare.is.ed.ac.uk/handle/10283/2651
- Move the archive to "./data" and extract it there.
  - If you want to store the dataset in another directory, you must change the paths in the configuration files.
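
If the dataset lives somewhere other than "./data", the paths in the JSON configuration files need to be updated. Here is a minimal sketch that rewrites a path prefix in every string value of a config, whatever the keys are called. The helper names are hypothetical and the key layout of the configs is not assumed; check the actual files to see which prefix they use.

```python
import json


def replace_path_prefix(node, old_prefix, new_prefix):
    """Recursively replace old_prefix at the start of every string value."""
    if isinstance(node, dict):
        return {k: replace_path_prefix(v, old_prefix, new_prefix) for k, v in node.items()}
    if isinstance(node, list):
        return [replace_path_prefix(v, old_prefix, new_prefix) for v in node]
    if isinstance(node, str) and node.startswith(old_prefix):
        return new_prefix + node[len(old_prefix):]
    return node  # numbers, booleans and None are left untouched


def rewrite_config(src_path, dst_path, old_prefix, new_prefix):
    """Load a JSON config, rewrite its path prefixes, and save a copy."""
    with open(src_path, encoding="utf-8") as f:
        config = json.load(f)
    updated = replace_path_prefix(config, old_prefix, new_prefix)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(updated, f, indent=2, ensure_ascii=False)
    return updated
```

Include the trailing slash in the prefix (for example "data/") so that unrelated strings that merely begin with the same letters are not rewritten.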
LibriTTS
- To be updated

Install MFA
- Follow the guide on the MFA installation website.
- Additional setup commands
  - mfa thirdparty download
  - mfa download acoustic english
Pre-trained checkpoint
- VCTK, 400k steps : Google Drive Link

Preprocess (VCTK case)

Prepare MFA

python fastspeech2/scripts/prepare_align.py configs/vctk_prepare_align.json

Run MFA for making alignments

# Set the number of threads for MFA with the option at the end of the command: "-j [number of threads]"
mfa align data/fastspeech2/vctk lexicons/librispeech-lexicon.txt english data/fastspeech2/vctk-pre -j 24
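
Rather than hard-coding "-j 24", the thread count can be derived from the machine. The sketch below only builds the argument list in the same positional order as the command above; the function name `build_mfa_align_command` is hypothetical, and using every available core is an assumption that may not suit a shared machine.

```python
import os


def build_mfa_align_command(corpus_dir, lexicon_path, acoustic_model, output_dir, jobs=None):
    """Build the `mfa align` argument list used in this README."""
    if jobs is None:
        # os.cpu_count() can return None, so fall back to a single job.
        jobs = os.cpu_count() or 1
    return [
        "mfa", "align",
        corpus_dir, lexicon_path, acoustic_model, output_dir,
        "-j", str(jobs),
    ]


if __name__ == "__main__":
    cmd = build_mfa_align_command(
        "data/fastspeech2/vctk",
        "lexicons/librispeech-lexicon.txt",
        "english",
        "data/fastspeech2/vctk-pre",
    )
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once MFA is installed.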

Feature preprocessing

python fastspeech2/scripts/preprocess.py configs/vctk_preprocess.json

Train

Multi-speaker fastspeech2

python fastspeech2/scripts/train.py configs/fastspeech2_vctk_tts.json

To change the training parameters of FastSpeech2, check the training code and set the corresponding options in the configuration file.
- train code : fastspeech2/scripts/train.py
- config : configs/fastspeech2_vctk_tts.json

Fastspeech2 with reference encoder (To be updated)

Synthesize

Multi-speaker model

In code

from fastspeech2.inference import Inferencer
from speech_interface.interfaces.hifi_gan import InterfaceHifiGAN

# arguments (define chk_path, lexicon_path and device before this call)
# chk_path: str, lexicon_path: str, device: str = 'cuda'
inferencer = Inferencer(chk_path=chk_path, lexicon_path=lexicon_path, device=device)

# initialize hifigan
interface = InterfaceHifiGAN(model_name='hifi_gan_v1_universal', device='cuda')

# arguments
# text: str, speaker: int = 0, pitch_control: float = 1., energy_control: float = 1., duration_control: float = 1.
txt = 'Hello, I am a programmer.'
mel_spectrogram = inferencer.tts(txt, speaker=0)

# Reconstruct the waveform from the mel spectrogram with HiFi-GAN
pred_wav = interface.decode(mel_spectrogram.transpose(1, 2)).squeeze()

# To listen to the result in a Jupyter notebook
from IPython.display import Audio
Audio(pred_wav.cpu().numpy(), rate=22050)
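
Outside a notebook, the waveform can be written to disk instead. This is a minimal sketch using only the standard library; `save_wav` is a hypothetical helper, the 22050 Hz rate is taken from the example above, and it assumes `pred_wav` is a 1-D float waveform roughly in [-1.0, 1.0].

```python
import struct
import wave


def save_wav(samples, path, sample_rate=22050):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    pcm = []
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range values
        pcm.append(int(round(s * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes per sample = 16 bits
        wav_file.setframerate(sample_rate)
        # WAV stores samples as little-endian signed shorts.
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm)
```

With the variables from the example above, a call would look like `save_wav(pred_wav.cpu().numpy().tolist(), "output.wav")`.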

From the command line

python fastspeech2/scripts/synthesize.py [TEXT] [OUTPUT PATH] [CHECKPOINT PATH] [LEXICON PATH] [[DEVICE]] [[SPEAKER]]

Reference encoder (not updated)

Reference

ming024/FastSpeech2

Owner

ILJI CHOI
