Official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Last update: Dec 07, 2022

Overview

One-Shot Voice Conversion with Weight Adaptive Instance Normalization

By Shengjie Huang, Yanyan Xu*, Dengfeng Ke*, Mingjie Chen, Thomas Hain.

This repo is the official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Audio samples are available here.

Dependencies

python 3.6.0
pytorch 1.4.0
pyyaml 5.4.1
numpy 1.19.5
librosa 0.8.0
soundfile 0.10.2
tensorboardX 2.1
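
Before running anything, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch, not part of this repo: the helper names installed_version and find_mismatches are hypothetical, and the mapping from package names to import names (pytorch to torch, pyyaml to yaml) is an assumption based on how those packages are normally imported.

```python
import importlib

# Versions pinned in the list above, keyed by the module name used at import
# time (assumption: pytorch -> torch, pyyaml -> yaml).
PINNED = {
    "torch": "1.4.0",
    "yaml": "5.4.1",
    "numpy": "1.19.5",
    "librosa": "0.8.0",
    "soundfile": "0.10.2",
    "tensorboardX": "2.1",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:  # any import failure counts as "not available"
        return None
    return str(getattr(module, "__version__", "unknown"))


def find_mismatches(pinned, lookup=installed_version):
    """Return {module: (wanted, found)} for every module whose version differs."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        # Local build tags such as "1.4.0+cu100" are treated as matching "1.4.0".
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling find_mismatches(PINNED) returns an empty dict when every listed package is importable at the pinned version; otherwise each entry shows the expected and the detected version.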

Preprocess

What you need to prepare before running this project, and how to prepare it

We use ParallelWaveGAN as our vocoder and VCTK as our dataset.
To run this project, first install ParallelWaveGAN by following the instructions in the ParallelWaveGAN project.
Then prepare all the mel-spectrogram data in the same way ParallelWaveGAN does.
Prepare the speaker_used.json file yourself, following the examples in ./data/80_train_speaker_used.json and ./data/fine_tune_speaker_used.json.
Prepare the feats.scp file by running ./convert_decode/convert_mel/get_scp.py.

We assume that your prepared mel-spectrograms are organized in a directory tree like this:

├── p225
│   ├── p225_001-feats.npy
│   ├── p225_004-feats.npy
│   ├── p225_005-feats.npy
│   ......
├── p226
│   ├── p226_001-feats.npy
│   ├── p226_003-feats.npy
│   ├── p226_004-feats.npy
│   ......
├── p227
│   ......
├── p228
│   ......
│   ...
│   ...
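
To make the layout above concrete, here is a minimal sketch of walking such a tree and listing every feature file. It is not the repo's get_scp.py: the helper names build_scp_lines and write_scp are hypothetical, and the "<utterance_id> <path>" line format is an assumed Kaldi-style convention, so check get_scp.py for the format the project actually expects.

```python
from pathlib import Path

FEATS_SUFFIX = "-feats.npy"  # file naming taken from the tree above


def build_scp_lines(root):
    """Return one "<utterance_id> <path>" line per feature file under root.

    Hypothetical helper: the "<id> <path>" layout is an assumed Kaldi-style
    convention, not necessarily what get_scp.py writes.
    """
    lines = []
    for speaker_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for feats in sorted(speaker_dir.glob("*" + FEATS_SUFFIX)):
            utt_id = feats.name[: -len(FEATS_SUFFIX)]  # e.g. "p225_001"
            lines.append(f"{utt_id} {feats}")
    return lines


def write_scp(root, scp_path):
    """Write the lines from build_scp_lines to scp_path and return their count."""
    lines = build_scp_lines(root)
    Path(scp_path).write_text("\n".join(lines) + "\n")
    return len(lines)
```

Speakers and utterances are visited in sorted order so that repeated runs over the same tree produce the same file.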

Training

Run the pretraining stage with bash run_main.sh. We use 80 speakers from the VCTK dataset, with all utterances for each speaker.

Fine Tuning

Run the fine-tuning stage with bash run_fine_tune.sh. We use 10 other speakers from the VCTK dataset, with only 1 utterance per speaker.
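
The one-utterance-per-speaker setup can be illustrated with a short sketch that picks a single feature file for each fine-tuning speaker from the directory layout described in the Preprocess section. The function pick_one_shot_utterances is hypothetical, and taking the first file in sorted order is an assumption: the source does not say which utterance is used, and the actual selection is defined by ./data/fine_tune_speaker_used.json.

```python
from pathlib import Path


def pick_one_shot_utterances(root, speakers):
    """Map each speaker to a single feature file found under root/<speaker>/.

    Assumption: the first file in sorted order is used; the source does not
    say which utterance is chosen for each fine-tuning speaker.
    """
    chosen = {}
    for speaker in speakers:
        files = sorted((Path(root) / speaker).glob("*-feats.npy"))
        if not files:
            raise FileNotFoundError(f"no feature files found for speaker {speaker}")
        chosen[speaker] = files[0].name
    return chosen
```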

Inference

$ cd convert_decode/convert_mel
$ bash run_convert.sh

We generate one-shot voice conversion utterances between the 10 one-shot speakers, using their other, unseen utterances to perform the conversion.
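
As a rough illustration of the scale of this evaluation, the sketch below enumerates every ordered source/target pair among 10 speakers. Whether run_convert.sh covers all ordered pairs is an assumption, the function conversion_pairs is hypothetical, and the speaker IDs are placeholders.

```python
from itertools import permutations


def conversion_pairs(speakers):
    """Return every ordered (source, target) pair of distinct speakers."""
    return list(permutations(speakers, 2))


# Placeholder IDs; the actual fine-tuning speakers are listed in
# ./data/fine_tune_speaker_used.json.
speakers = [f"speaker_{i:02d}" for i in range(10)]
pairs = conversion_pairs(speakers)
print(len(pairs))  # 10 * 9 = 90 ordered pairs
```

Ordered pairs are used because converting speaker A to speaker B is a different task from converting B to A.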
