Include MelGAN, HifiGAN and Multiband-HifiGAN, maybe NHV in the future.

Last update: Dec 16, 2022

Related tags

Text Data & NLP FastVocoder

Overview

Fast (GAN Based Neural) Vocoder

Chinese README

Todo

Submit demo
Support NHV

Discription

Include MelGAN, HifiGAN and Multiband-HifiGAN, maybe include NHV in the future. Developed on BiaoBei dataset, you can modify conf and hparams.py to fit your own dataset and model.

Usage

Prepare data
- write path of wav data in a file, for example: cd dataset && python3 biaobei.py
- bash preprocess.sh <wav path file> <path to save processed data> dataset/audio dataset/mel
- for example: bash preprocess.sh dataset/BZNSYP.txt processed dataset/audio dataset/mel

Train

command:

bash train.sh \
    <GPU ids> \
    /path/to/audio/train \
    /path/to/audio/valid \
    /path/to/mel/train \
    /path/to/mel/valid \
    <model name> \
    <if multi band> \
    <if use scheduler> \
    <path to configuration file>

for example:

bash train.sh \
0 \
dataset/audio/train \
dataset/audio/valid \
dataset/mel/train \
dataset/mel/valid \
hifigan \
0 0 0 \
conf/hifigan/light.yaml

Train from checkpoint

command:

bash train.sh \
    <GPU ids> \
    /path/to/audio/train \
    /path/to/audio/valid \
    /path/to/mel/train \
    /path/to/mel/valid \
    <model name> \
    <if multi band> \
    <if use scheduler> \
    <path to configuration file> \
    /path/to/checkpoint \
    <step of checkpoint>

Synthesize

command:

bash synthesize.sh \
    /path/to/checkpoint \
    /path/to/mel \
    /path/for/saving/wav \
    <model name> \
    /path/to/configuration/file

Acknowledgments

Comments

why set the L=30 ?

hello，I have some question， in the paper ，the shape of basis matrix is [32, 256] , but in the code ,the shape is [30, 256] . And according to the function "overlap_and_add" , output_size = (frames - 1) * frame_step + frame_length, if the L=30, I think it cannot match the real wave length ? for example, hop_len=256, mel.shape=[80, 140] , theoretically the output wave length is 140*256=35840. according to the code, the output wave length is 33600.

Thanks in advance.

opened by yingfenging 3
Link to Basis-MelGAN paper?

Hi Zhengxi, congrats on your paper's acceptance on Interspeech 2021!

I got pretty interested in your paper while reading the abstract of Basis-MelGAN on the README, but I could not find any link to the paper. Though the Interspeech conference is only 2 months away, don't you have any plans on publishing the paper on arXiv in near future?

opened by seungwonpark 2
Random start index in WeightDataset

At this line: https://github.com/xcmyz/FastVocoder/blob/a9af370be896b1096e746ce6489fb16fef8ca585/data/dataset.py#L97

If the input mel size smaller than fix-length, the random raise issue, I have try except to pass these short audios, but I just wonder it is handle in collate.

More than that, the segment size as I found in hifigan is 32, but in basic-melgan it (fix-length) is set to 140. Are there any difference between the 140 for biaobei and the one for LJspeech

opened by v-nhandt21 0
can basis-melgan be used as unversial vocoder?

I tried it for a single speaker dataset, rtf surprises me. Have you ever use basis-melgan for a multi-speaker dataset, or is it suitable for unseen speaker tts synthesis?

opened by mayfool 0
Shape mismatch error on new dataset
Hi, thanks for your work!

The frame rate of my dataset is 22050, and hop size of text2mel model is 256. I have changed hparams.py accordingly, but training results in an expcetion: (preprocessing was fine, anyway)

File "/home/user/speechlab/FastVocoder-main/model/loss/loss.py", line 23, in forward assert est_source_sub_band.size(1) == wav_sub_band.size(1)

I figured out that model inference still uses hop-size of 240. So how to make your code fully compatible with other datasets? it seems that the codes are somehow hardcoded for Biaobei dataset.
opened by tekinek 1
Multiband Architecture

Hi author, I have found the notes as "the generated audio has interference at a specific frequency" in this repo. I have encountered with the straight line at a specific frequency when developing similar multiband architecture, and I wonder if such phenomenon is the one you mentioned? And do you have some advice or solutions? Thanks.
help wanted

opened by Rongjiehuang 6

Releases(v1.0)

v1.0(Jun 24, 2021)

Source code(tar.gz)
Source code(zip)
basis.melgan.pt(53.36 MB)

Owner

Zhengxi Liu (刘正曦)

Interested in high performance neural vocoder and expressive TTS acoustic model. Member of DeepMist and developed MistGPU.

GitHub Repository

This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems

Proteno This is the data release associated with the corresponding NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deploymen

37 Dec 04, 2022

Document processing using transformers

Doc Transformers Document processing using transformers. This is still in developmental phase, currently supports only extraction of form data i.e (ke

13 Dec 21, 2022

Neural network models for joint POS tagging and dependency parsing (CoNLL 2017-2018)

Neural Network Models for Joint POS Tagging and Dependency Parsing Implementations of joint models for POS tagging and dependency parsing, as describe

152 Sep 02, 2022

⚖️ A Statutory Article Retrieval Dataset in French.

A Statutory Article Retrieval Dataset in French This repository contains the Belgian Statutory Article Retrieval Dataset (BSARD), as well as the code

19 Nov 17, 2022

Input english text, then translate it between languages n times using the Deep Translator Python Library.

mass-translator About Input english text, then translate it between languages n times using the Deep Translator Python Library. How to Use Install dep

2 Mar 04, 2022

translate using your voice

speech-to-text-translator Usage translate using your voice description this project makes translating a word easy, all you have to do is speak and...

1 Oct 18, 2021

ZUNIT - Toward Zero-Shot Unsupervised Image-to-Image Translation

ZUNIT Dependencies you can install all the dependencies by pip install -r requirements.txt Datasets Download CUB dataset. Unzip the birds.zip at ./da

9 Jun 24, 2022

PRAnCER is a web platform that enables the rapid annotation of medical terms within clinical notes.

PRAnCER (Platform enabling Rapid Annotation for Clinical Entity Recognition) is a web platform that enables the rapid annotation of medical terms within clinical notes. A user can highlight spans of

39 Nov 14, 2022

A Flask Sentiment Analysis API, with visual implementation

The Sentiment Analysis Api was created using python flask module,it allows users to parse a text or sentence throught the (?text) arguement, then view the sentiment analysis of that sentence. It can

10 Jul 17, 2022

A method for cleaning and classifying text using transformers.

NLP Translation and Classification The repository contains a method for classifying and cleaning text using NLP transformers. Overview The input data

0 Nov 15, 2022

Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation This repository is the pytorch implementation of our paper: Hierarchical Cr

44 Jan 06, 2023

Basic Utilities for PyTorch Natural Language Processing (NLP)

Basic Utilities for PyTorch Natural Language Processing (NLP) PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. tor

2.1k Jan 01, 2023

Code release for "COTR: Correspondence Transformer for Matching Across Images"

COTR: Correspondence Transformer for Matching Across Images This repository contains the inference code for COTR. We plan to release the training code

358 Dec 24, 2022

Pretty-doc - Composable text objects with python

pretty-doc from __future__ import annotations from dataclasses import dataclass

2 Jan 17, 2022

Multilingual finetuning of Machine Translation model on low-resource languages. Project for Deep Natural Language Processing course.

Low-resource-Machine-Translation This repository contains the code for the project relative to the course Deep Natural Language Processing. The goal o

3 Jun 22, 2022

Include MelGAN, HifiGAN and Multiband-HifiGAN, maybe NHV in the future.

Related tags

Overview

Fast (GAN Based Neural) Vocoder

Todo

Discription

Usage

Acknowledgments

Comments

why set the L=30 ?

Link to Basis-MelGAN paper?

Random start index in WeightDataset

can basis-melgan be used as unversial vocoder?

Shape mismatch error on new dataset

Multiband Architecture